Logging

So, we're revamping "what to log and when to log it" at work, and that has me thinking. One of the consultants we've brought on spurred us to rewrite our policy, with a very simple remark:

"You not only need to log what went wrong, but tell the story of how the user got there - what they were doing - when the problem happened."

That sums it up, alright. The question is how to best do this.

I recently pushed for us to implement a quick and dirty solution to cause the log levels to become dynamic; they can be easily changed at runtime now. This makes very fine-grained logging much more useful; previously, it would overwhelm the logs, and restarting our system to change the logging level wasn't a trivial task. Now, we can turn on all the details for two or three minutes, cause the error again, and set the logging level to be much less verbose immediately afterwards, without a server restart.

With that ability under our belts, the next step was AOP - aspect oriented programming - with Spring. Whenever a user enters a method, AOP can silently log the entry, and the parameters to that method. Whenever a database query is fired, AOP can log it as well, with the parameters. Both of these went into the TRACE level initially.

We've added by hand code for DEBUG and WARN messages. Anything that throws an exception bubbles up - it's not recaught - and the user will see an error message, while the logs will see an ERROR.

The interesting thing for me was seeing that DEBUG and TRACE are slightly interchangeable, depending on whether you're doing development or production support. If you're doing production support, *where* the user was is at least as important as the parameters that method was passed; data that traces through the app becomes more important than normal. If you're developing - which is what log4j's log levels seem primarily aimed at - trace data is much less useful, as you generally already know exactly where program flow is taking you.