Not all graphs are linear


Chris Josephes makes an excellent point about disk usage alerts:

[T]he monitoring system should compare the standard deviation for the file-system percentage over the past 24 hours, and compare it to the standard deviation for the past hour.

It isn’t (just) about what percentage of the file-system is full or free, but how fast it’s filling up. A disk that has sat at 20% full for a year, then creeps up to 50% full in 24 hours: is that something worth pointing out?
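As a rough sketch of the comparison described in the quote, the check could look something like this in Python. The function name, the `ratio` of 2.0 and the `floor` value are all assumptions made for illustration, not anything taken from Chris's post or from a particular monitoring tool.

```python
from statistics import pstdev


def fill_rate_is_unusual(day_samples, hour_samples, ratio=2.0, floor=0.1):
    """Return True when disk usage varied far more in the last hour
    than it did over the last 24 hours.

    day_samples / hour_samples: percent-full readings (hypothetical inputs).
    ratio, floor: illustrative tuning knobs, not recommended values.
    """
    if len(day_samples) < 2 or len(hour_samples) < 2:
        return False  # not enough data to say anything
    # A disk that never changes has a standard deviation of 0, so use a
    # small floor to avoid alerting on tiny amounts of noise.
    day_sd = max(pstdev(day_samples), floor)
    hour_sd = pstdev(hour_samples)
    return hour_sd > ratio * day_sd


# One reading every 5 minutes: flat at 20%, then climbing 1% per reading.
day = [20.0] * 276 + [20.0 + i for i in range(12)]
print(fill_rate_is_unusual(day, day[-12:]))
```

Note that nothing here looks at how full the disk actually is. It only asks whether the recent behaviour is out of character, so it would sit alongside a normal percentage threshold rather than replace it.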

So what else, aside from disk, might you want more intelligent monitoring on?

Bandwidth, perhaps? Most busy websites have a daily or weekly bandwidth cycle that tends to look a bit like a rollercoaster ride. But if we’re more than 10% over the typical peak for the current hour, is that something worth pointing out?
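A minimal sketch of that "10% over the typical peak for this hour" check, assuming we already keep a history of past peaks keyed by hour of day. The `history` layout, the function name and the use of the median as "typical" are assumptions for the sake of the example.

```python
from statistics import median


def over_typical_peak(history, hour, current_mbps, margin=0.10):
    """Return True when current bandwidth exceeds the typical peak
    for this hour by more than `margin` (10% by default).

    history: dict mapping hour of day (0-23) to a list of past peak
             readings in Mbit/s (a hypothetical storage format).
    """
    past_peaks = history.get(hour, [])
    if not past_peaks:
        return False  # no baseline for this hour yet
    # "Typical" is taken to be the median, so that one freak day
    # does not drag the baseline upwards.
    typical = median(past_peaks)
    return current_mbps > typical * (1 + margin)


history = {14: [80.0, 90.0, 100.0]}  # made-up peaks for 2pm
print(over_typical_peak(history, 14, 100.0))
```

A site with a strong weekly cycle would probably want to key the history by hour of the week instead, so that a Sunday afternoon is not compared against a Monday afternoon.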

What about memory/RAM? Say you install or upgrade an app that introduces a memory leak. Wouldn’t it be nice if you got told about a sudden change in the memory footprint of a server before you get some very weird “out of memory” errors?
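One way to spot that kind of sudden change is to ask how far the recent average has moved from the usual one, measured in standard deviations of the baseline. This is only a sketch: the function name, the threshold of 3 and the `min_sd` floor are assumptions, and a slow leak would need a longer comparison window than a sudden jump.

```python
from statistics import mean, pstdev


def memory_footprint_shifted(baseline_mb, recent_mb, threshold=3.0, min_sd=1.0):
    """Return True when recent memory use sits well above the baseline.

    baseline_mb: readings from before the change (e.g. last week), in MB.
    recent_mb:   readings from the period being checked, in MB.
    threshold, min_sd: illustrative values, not recommendations.
    """
    if len(baseline_mb) < 2 or not recent_mb:
        return False  # not enough data to compare
    # Floor the spread so a perfectly flat baseline does not divide by zero.
    spread = max(pstdev(baseline_mb), min_sd)
    shift = (mean(recent_mb) - mean(baseline_mb)) / spread
    return shift > threshold


before_upgrade = [1000, 1010, 990, 1005, 995]  # made-up readings in MB
after_upgrade = [1100, 1120, 1150]
print(memory_footprint_shifted(before_upgrade, after_upgrade))
```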

Sure, this isn’t going to work all the time, or perhaps even often. But some metrics do tend to be fairly stable or predictable/cyclic on some machines. It really ought to be possible to identify these automatically. You might not want a page/sms/alert/call at 3am about it, but by definition this is about getting warnings sooner.
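As for picking out the stable or cyclic metrics automatically, a crude first pass might look like the sketch below. It calls a metric "stable" when its spread is small relative to its mean, and "cyclic" when it correlates strongly with itself one period earlier. The function names, the 24-sample period and both limits are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def autocorrelation(values, lag):
    """Correlation of a series with a copy of itself shifted by `lag`."""
    m = mean(values)
    den = sum((v - m) ** 2 for v in values)
    if den == 0:
        return 0.0
    num = sum((values[i] - m) * (values[i + lag] - m)
              for i in range(len(values) - lag))
    return num / den


def classify_metric(values, period=24, cv_limit=0.05, cyclic_limit=0.7):
    """Label a series as 'stable', 'cyclic' or 'unpredictable'.

    period: samples per cycle (24 assumes hourly readings and a daily cycle).
    cv_limit, cyclic_limit: illustrative thresholds only.
    """
    m = mean(values)
    # Stable: standard deviation is a small fraction of the mean.
    if m and pstdev(values) / abs(m) < cv_limit:
        return "stable"
    # Cyclic: needs at least two full periods to compare like with like.
    if len(values) >= 2 * period and autocorrelation(values, period) > cyclic_limit:
        return "cyclic"
    return "unpredictable"


quiet_then_busy = [10.0] * 12 + [100.0] * 12  # one made-up day, hourly
print(classify_metric(quiet_then_busy * 7))
```

Anything that comes back as "unpredictable" would simply be left on ordinary fixed thresholds.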

We’re curious about what else might benefit from more intelligent monitoring like this. Suggestions, ideas or pointers would be most welcome in the comments, or drop us an email.


About this Entry

This page contains a single entry by snork published on May 10, 2008 9:49 PM.
