At the end of last year, I got a bit more serious about monitoring my own systems, and the effort was well worth it. Still, there were parts of the system I had no metrics or information about. For example, some services running on my systems get restarted from time to time, which is OK, but I'd still like to know when and how often that happens, so I can judge whether it matches my expectations. Beyond that, services can crash - and that, too, happens from time to t...