Efficient collection of profiling data

One of the first challenges when regularly collecting profiling data from PHP requests is efficiency. A central requirement we faced for application performance monitoring is minimal overhead.

There are two dimensions to this question: one is transmitting and storing the data, the other is the optimal sampling rate. This blog post discusses the first; the second is left for another post.

With the requirement of minimal overhead in mind, transmitting profiling data directly to the Profiler REST API was ruled out immediately.

Direct transmission without degrading user performance might work with UDP. However, xhprof profile traces can easily amount to hundreds of kilobytes, far too big for UDP packets. Sending data to a remote server via UDP will also very likely lead to a significant number of lost packets.
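To make the size problem concrete, here is a small back-of-the-envelope sketch in Go. The 300 KB profile size is an assumed example value for "hundreds of kilobytes", and the 1500-byte MTU is the common Ethernet default rather than something measured on our systems.

```go
package main

const (
	ipv4Header = 20 // bytes, IPv4 header without options
	udpHeader  = 8  // bytes

	// Largest payload a single UDP datagram can carry over IPv4.
	maxUDPPayload = 65535 - ipv4Header - udpHeader

	// Largest payload that avoids IP fragmentation on a typical Ethernet link.
	ethernetMTU         = 1500
	unfragmentedPayload = ethernetMTU - ipv4Header - udpHeader
)

// datagramsNeeded returns how many datagrams of the given payload size
// are required to carry size bytes (ceiling division).
func datagramsNeeded(size, payload int) int {
	return (size + payload - 1) / payload
}
```

With these numbers, a 300 KB trace needs at least 5 maximum-size datagrams, or 209 datagrams if fragmentation is to be avoided, and the application would have to split and reassemble them itself. A single lost datagram makes the whole profile unusable.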

Sending via HTTP (TCP) would degrade performance significantly. By default, I/O is blocking in PHP. Sending several KB to another server around the world can easily translate into poor performance for your end users. Non-blocking I/O, however, is not trivial to use in PHP and is useless in our case: sending the profiling data is the last operation in the request, with no other work to run in parallel.

We needed a way to decouple collecting the profiling data from sending it to the central platform, and decided to implement a daemon that acts as a proxy between all your application's PHP requests and the Profiler REST API. It is written in Go to allow for very high concurrency and efficient asynchronous I/O.

The daemon accepts data over two different interfaces: UDP for the permanent request performance measurements and a Unix socket for profiling data. Because the daemon runs on the local machine, the socket timeouts can be set to just a few milliseconds, so PHP requests are not blocked longer than necessary.
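The two-interface setup can be sketched roughly as follows. This is not the daemon's actual code: the names `Message`, `Listen` and `Send`, the one-second read deadline and the "one connection per profile" framing are all assumptions made for illustration. `Send` stands in for the PHP side and shows where the short timeout applies.

```go
package main

import (
	"io"
	"net"
	"os"
	"time"
)

// Message is one payload received from a PHP request (hypothetical type).
type Message struct {
	Kind string // "measurement" (UDP) or "profile" (Unix socket)
	Data []byte
}

// Listen opens both endpoints and forwards every payload to out.
// It returns the bound UDP address and a function that closes both listeners.
func Listen(udpAddr, socketPath string, out chan<- Message) (string, func(), error) {
	udp, err := net.ListenPacket("udp", udpAddr)
	if err != nil {
		return "", nil, err
	}
	os.Remove(socketPath) // clear a stale socket file left by a previous run
	unix, err := net.Listen("unix", socketPath)
	if err != nil {
		udp.Close()
		return "", nil, err
	}

	go func() { // small measurements: one datagram each
		buf := make([]byte, 65535)
		for {
			n, _, err := udp.ReadFrom(buf)
			if err != nil {
				return // listener closed
			}
			out <- Message{Kind: "measurement", Data: append([]byte(nil), buf[:n]...)}
		}
	}()

	go func() { // large profiles: one connection each, read until EOF
		for {
			conn, err := unix.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				c.SetReadDeadline(time.Now().Add(time.Second))
				if data, err := io.ReadAll(c); err == nil {
					out <- Message{Kind: "profile", Data: data}
				}
			}(conn)
		}
	}()

	stop := func() { udp.Close(); unix.Close() }
	return udp.LocalAddr().String(), stop, nil
}

// Send plays the client role (PHP in the real setup): it dials the daemon,
// writes the payload, and gives up once timeoutMs milliseconds have elapsed.
func Send(network, address string, payload []byte, timeoutMs int) error {
	timeout := time.Duration(timeoutMs) * time.Millisecond
	conn, err := net.DialTimeout(network, address, timeout)
	if err != nil {
		return err
	}
	defer conn.Close()
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err = conn.Write(payload)
	return err
}
```

If the daemon is down or slow, `Send` returns an error after the timeout and the PHP request simply moves on without the profile, which is the trade-off we want: losing a sample is cheaper than delaying a response.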

Efficiency when transmitting and storing data is also important. The daemon aggregates the performance data minute by minute, avoiding a constant stream of one dedicated HTTP request for every request your application serves. It also discards profiles that don’t add value for your analysis, for example profile traces with average wall times for the high-traffic front page collected at intervals of a few seconds.
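A minimal sketch of minute-by-minute aggregation could look like this. The `Measurement` fields, the choice of statistics (request count, total and maximum wall time) and the grouping by endpoint are assumptions for illustration, not the daemon's real data model.

```go
package main

import "sync"

// Measurement is one request's timing as reported by PHP (hypothetical format).
type Measurement struct {
	Endpoint   string
	UnixTime   int64 // seconds since the epoch
	WallMicros int64 // wall time in microseconds
}

// Summary is the aggregate for one endpoint during one minute.
type Summary struct {
	Endpoint    string
	Minute      int64 // UnixTime / 60
	Requests    int64
	TotalMicros int64
	MaxMicros   int64
}

type bucketKey struct {
	endpoint string
	minute   int64
}

// Aggregator collects measurements into per-minute, per-endpoint buckets.
type Aggregator struct {
	mu      sync.Mutex
	buckets map[bucketKey]*Summary
}

func NewAggregator() *Aggregator {
	return &Aggregator{buckets: make(map[bucketKey]*Summary)}
}

// Add folds one measurement into the bucket for its minute and endpoint.
func (a *Aggregator) Add(m Measurement) {
	k := bucketKey{endpoint: m.Endpoint, minute: m.UnixTime / 60}
	a.mu.Lock()
	defer a.mu.Unlock()
	s, ok := a.buckets[k]
	if !ok {
		s = &Summary{Endpoint: k.endpoint, Minute: k.minute}
		a.buckets[k] = s
	}
	s.Requests++
	s.TotalMicros += m.WallMicros
	if m.WallMicros > s.MaxMicros {
		s.MaxMicros = m.WallMicros
	}
}

// Flush removes and returns every bucket whose minute ended before nowUnix.
// The caller would send the result upstream in a single HTTP request.
func (a *Aggregator) Flush(nowUnix int64) []Summary {
	current := nowUnix / 60
	a.mu.Lock()
	defer a.mu.Unlock()
	var done []Summary
	for k, s := range a.buckets {
		if k.minute < current {
			done = append(done, *s)
			delete(a.buckets, k)
		}
	}
	return done
}
```

Calling `Flush` once a minute turns thousands of individual measurements into a handful of summaries, so the number of upstream requests depends on the number of endpoints rather than on traffic.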

Aggregating and discarding data in the daemon serves two goals:

Not transmitting huge amounts of traffic, potentially costing money.
Preventing the daemons from accidentally launching a distributed denial of service attack on the Profiler REST API when traffic starts to increase.
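Discarding redundant profiles could work along these lines. The rule used here is purely an assumed example, not the daemon's actual logic: a profile is kept if it is the slowest seen so far for its endpoint, or if a minimum interval has passed since the last kept profile for that endpoint. `ProfileFilter` and its fields are hypothetical names.

```go
package main

// ProfileFilter decides which profiles are worth forwarding (hypothetical rule).
// It is not safe for concurrent use; a real daemon would guard it with a mutex.
type ProfileFilter struct {
	minIntervalSec int64
	lastKept       map[string]int64 // endpoint -> Unix time of last kept profile
	slowest        map[string]int64 // endpoint -> slowest kept wall time (microseconds)
}

func NewProfileFilter(minIntervalSec int64) *ProfileFilter {
	return &ProfileFilter{
		minIntervalSec: minIntervalSec,
		lastKept:       make(map[string]int64),
		slowest:        make(map[string]int64),
	}
}

// Keep reports whether a profile should be forwarded to the REST API.
func (f *ProfileFilter) Keep(endpoint string, unixTime, wallMicros int64) bool {
	last, seen := f.lastKept[endpoint]
	slower := wallMicros > f.slowest[endpoint]
	due := !seen || unixTime-last >= f.minIntervalSec
	if !slower && !due {
		return false // an average profile arriving too soon adds no value
	}
	f.lastKept[endpoint] = unixTime
	if slower {
		f.slowest[endpoint] = wallMicros
	}
	return true
}
```

With a 60-second interval, a busy front page produces at most one average profile per minute plus any new outliers, no matter how many requests it serves. A production rule would also need to reset the "slowest" value over time, which this sketch leaves out.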

The current process works, but there is still a lot of room for improvement. One of the tasks during the closed beta will be to add more logic for filtering and aggregation. Our early access customers already show us very different usage patterns that help us improve the situation. We are preparing to add more beta testers in a few days to get more feedback on the process.

Toby 03.07.2014