Four Logging Best Practices for Production Applications

BONUS: We have discussed this topic with an expert in the PHP community in our podcast:

Logging is an essential part of just about any PHP-based application; whether in a script or a larger application. However, how little is too little and how much is too much to log?

If we don’t log enough information, when something goes wrong, as it invariably does, then we won’t have enough information available to determine what went wrong so that we can fix the problem. However, if we have too much information, then we’ll be unable to filter out the white noise.

In this post, I’m going to step through four best practices for logging in production, practices that you can apply to ensure you have the right amount of information as and when you need it.

Know The Right Log Level to Use

If you’ve spent any amount of time around logging libraries, you’ll know that they support several different logging levels. Most of them follow, to one extent or another, RFC 5424. And, if you’ve had a chance to review PSR-3: Logger Interface, then you’ll know that there are eight levels.

However, you may not know when to use each level. If so, here’s a broad overview of when to use each level.

Emergency: This level is for when the system is unusable.
Alert: This level is for when action must be taken immediately.
Critical: This level is for reporting conditions that critically affect the application.
Error: This level is for problems that prevent your application from continuing normally. These types of problems can include the database being down or inaccessible.
Warning: Use this level for alerts about something going wrong, but the application can continue functioning. This might include socket timeouts or retries, and invalid logins attempts.
Notice: Use this level for alerts about issues that start to indicate something is going wrong with your application, yet which aren’t too significant.
Info: Use this level for general messages, but not for debugging information.
Debug: Use this level for tracing information.

Know What To Write In a Log Message

The previous section gave a broad overview of when to use each log level. However, what do you write in a log message?

You don’t want the message to be too extensive. You don’t want to be too brief, either.

Splunk has an excellent set of suggestions. Of those suggestions, I’ve found the following five to be the most pertinent:

Create events that humans can read: Make sure that whatever it is that you’re logging, that it’s intelligible. If the information needs to be stored in a binary format, because it’s too extensive, then ensure that tools are available that can decode that format.
Use timestamps: It’s essential to know when an event occurred, especially as you’re going to be reading many log entries at a time to track down the source of the error. Make sure that log messages can be read in the correct order so that they don’t exacerbate the problem.
Use unique identifiers (IDs): Unique identifiers can help information be filtered better than timestamps alone. They can help users filter by an application component, node, section, or other attributes.
Use developer-friendly formats: There are a plethora of formats that developers are commonly familiar with, that they know how to parse, and are well suited to storing information in a structured way.
Identify the source: Where did the error originate? Ensure that you identify the source of the error, such as the class, function, and file.

Centralize Logging

No matter the size of your application, logs should be stored in a centrally available location. If not, then you run the risk of not having all the relevant information at hand when something goes wrong.

If this seems strange, consider the following scenario. Your application is quite successful so has been scaled horizontally, with multiple nodes available to handle requests which sit behind a load balancer.

If the code is written such that log entries are written to the node that the request is handled by, what happens when one or nodes become unavailable or inaccessible? Alternatively, when an error occurs:

How long will it take to retrieve all of the log information from each node in the cluster?
How will you know on which node the error occurred?

To avoid situations such as these, I strongly advocate for centralized logging. You can use a dedicated, internal, logging server or a commercial logging solution, such as Loggly, Papertrail, or Amazon Elasticsearch Service.

It should be relatively trivial to either log to a centralized location, or to the operating system’s log service and configure it to forward the log messages out to a central location. This should happen regardless of the logging solution, the logging library (such as Monolog), and whether you’re deploying to a Linux or Windows server.

For example, you could log to Rsyslog on Linux which forwards those log messages to a central log server. On Windows, you could configure it to forward log messages to a central collector service.

Regardless of whether the central server is using an open-source tool such as ELK or a commercial service such as Loggly, the application’s logs can be centrally accessed and backed up, in case of some form of a natural disaster.

Understand the Difference Between Domain Messages and Framework Messages

When you’re logging, make sure that you distinguish between these two types of messages. Domain messages relate to the events within the business logic of your application. Framework (and library) messages are those that relate to libraries, frameworks, and other low-level code which your application uses.

Here’s an example of a domain message. It shows a domain “event” that was triggered within the application.

$logger->info("User $userId with $email registered for newsletter $name");

And here’s an example of a framework message. It shows the matched route when the given URL was requested, and which controller was dispatched to handle it.

$logger->info("Route $foo matched URL $url: Executing controller $controllerClass");

With that said, in production it might make most sense to have two logger services:

One for domain messages and one for framework messages, which would log up to INFO level; and
One for framework/library messages, which would log up to either WARNING or ERROR.

Here’s a small, for example purposes only, snippet on how to configure two different Monolog logger instances, one for domain messages, and one for framework messages.

// Create the logger for domain messages $domainLogger = new Logger('domain_logger');

// Create the logger for framework messages $frameworkLogger = new Logger('framework_logger');

// Now add some handlers, which log up to a certain log-level $domainLogger->pushHandler(new StreamHandler(__DIR__.'/domain.log', Logger::INFO)); $frameworkLogger->pushHandler(new StreamHandler(__DIR__.'/framework.log', Logger::ERROR));

Conclusion

And that’s four logging best practices that you can use when developing your PHP applications. From the right log level to use to centralizing logging, each should help you store more meaningful information in the right way.

Benjamin 04.02.2020