How does the PHP Realpath Cache work and how to configure it?

The realpath cache in PHP is often overlooked and its exact workings are a bit of a mystery to many developers, fueled by a lot of explanations on the web that are just plain wrong.

How exactly does the realpath cache work, and at which level of PHP does it operate? There has been some buzz around the realpath cache in recent weeks, so it is a hot topic to look into.

First, the PHP 7.2 changelog contains a small note about the realpath cache size, which raises two questions: what was the previous default in pre-7.2 deployments, and should you change it?

realpath_cache_size: Set to 4096k by default

The previous default was just 16k, a 256x increase is significant. This change was also backported to PHP 7.0 and PHP 7.1 patch releases in January.

Second, the blog post “Is it all Opcache’s fault?” on the Engineering blog identifies the realpath cache as a potential problem during symlink deployments. Knowing about this issue is important when building your own deploy strategy.

Time to dive into the realpath cache, how it works and how to configure it once and for all.

If you are fluent in C, you will find all the answers in the Zend/zend_virtual_cwd.h and Zend/zend_virtual_cwd.c files in php-src. In this blog post I try to explain them in plain (technical) English.

First myth buster: the realpath cache is not used only by the PHP realpath function. It is triggered by many of the filesystem functions when using the file:// stream wrapper, the most important ones being fopen, file_get_contents, is_file, is_dir, require, require_once, include and include_once.
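A minimal sketch of this, assuming a writable system temp directory: it reads a throwaway file with file_get_contents(), checks realpath_cache_get() for an entry, and then requires the same file. The file comes from tempnam() and exists only for the illustration.

```php
<?php
// Throwaway PHP file; tempnam() returns an already-resolved absolute path.
$file = tempnam(sys_get_temp_dir(), 'rpc');
file_put_contents($file, '<?php return 42;');

// Start from an empty realpath cache in this process.
clearstatcache(true);

// A plain read through the file:// wrapper resolves the path...
$contents = file_get_contents($file);

// ...and leaves an entry keyed by the path we passed in.
$cachedAfterRead = isset(realpath_cache_get()[$file]);

// include/require resolve paths through the same cache.
$value = require $file;

printf("cached after file_get_contents: %s, require returned %d\n",
    $cachedAfterRead ? 'yes' : 'no', $value);

unlink($file);
```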

Whenever these functions are called, PHP looks up the path in the realpath cache. An entry, as returned by realpath_cache_get(), looks like this:

    '/var/www/wordpress/wp-includes/class-walker-category.php' =>
      array (size=4)
        'key' => int 6963295217931825180
        'is_dir' => boolean false
        'realpath' => string '/var/www/wordpress/wp-includes/class-walker-category.php' (length=56)
        'expires' => int 1507388147

You can see this cache helps with finding out the realpath, directory or file status of a file even when the accessed file is not a symlink (meaning filename and realpath are the same). It makes sense to always store entries, even if the file is not a symlink, because we can avoid a file I/O call this way.
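To see the difference between a regular file and a symlink, the following sketch resolves both and compares the 'realpath' fields of their cache entries. It assumes a POSIX system where the PHP process may create symlinks; the rpc-link-* name is made up for the example.

```php
<?php
// Assumes a POSIX system where the PHP process may create symlinks.
$dir    = realpath(sys_get_temp_dir());
$target = tempnam($dir, 'rpc');                  // a regular file
$link   = $dir . '/rpc-link-' . uniqid();        // made-up name for the symlink
symlink($target, $link);

clearstatcache(true);
realpath($target);  // entry whose realpath equals its key
realpath($link);    // entry whose realpath points at the target

$cache   = realpath_cache_get();
$plain   = $cache[$target];
$viaLink = $cache[$link];

printf("%s\n  realpath: %s, is_dir: %s\n", $target, $plain['realpath'], var_export($plain['is_dir'], true));
printf("%s\n  realpath: %s\n", $link, $viaLink['realpath']);

unlink($link);
unlink($target);
```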

Configuration of Realpath Cache

So how can you configure the realpath cache? Two options are available:

  1. How long is an entry stored in the cache? (realpath_cache_ttl)
  2. How much can be stored in the cache? This is a maximum number of bytes, not a maximum number of entries. (realpath_cache_size)
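Both options are PHP_INI_SYSTEM settings, so they are changed in php.ini (or with -d on the command line) rather than with ini_set(). A small sketch that reports the current values; ini_shorthand_to_bytes() is a hypothetical helper for the K/M/G shorthand (on PHP 8.2+ the built-in ini_parse_quantity() does the same job).

```php
<?php
// Hypothetical helper: turn php.ini shorthand such as "16K" or "4096K" into bytes.
function ini_shorthand_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value;                       // "4096K" -> 4096
    switch (strtolower(substr($value, -1))) {
        case 'g':
            return $number * 1024 * 1024 * 1024;
        case 'm':
            return $number * 1024 * 1024;
        case 'k':
            return $number * 1024;
        default:
            return $number;                       // plain byte count
    }
}

// Both settings are PHP_INI_SYSTEM: change them in php.ini (or with -d on the CLI).
$size = (string) ini_get('realpath_cache_size');
$ttl  = (string) ini_get('realpath_cache_ttl');

printf("realpath_cache_size = %s (%d bytes)\n", $size, ini_shorthand_to_bytes($size));
printf("realpath_cache_ttl  = %s seconds\n", $ttl);
```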

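The time to live defaults to 120 seconds, so any change to a symlink might be invisible to a PHP process for up to that long. A cache hit does not extend the time to live: an entry expires realpath_cache_ttl seconds after it was created, no matter how often it is used. Setting realpath_cache_ttl to just a few seconds therefore makes little sense, even on servers under constant load, because every path would simply be resolved again every few seconds. I think the default is OK and doesn't need to change.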
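The "hits don't extend the TTL" behaviour can be checked from a script. This sketch, again using a throwaway tempnam() file, records an entry's 'expires' value, waits one second, resolves the same path again and compares:

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'rpc');
$ttl  = (int) ini_get('realpath_cache_ttl');

clearstatcache(true);
realpath($file);                                  // miss: entry is created
$first = realpath_cache_get()[$file]['expires'];

sleep(1);
realpath($file);                                  // hit: entry is reused as-is
$second = realpath_cache_get()[$file]['expires'];

printf("ttl=%ds, expires unchanged after hit: %s\n", $ttl, $first === $second ? 'yes' : 'no');

unlink($file);
```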

The cache size in bytes can be confusing, because the realpath cache is stored at the process level, not in shared memory like Opcache.

A simple calculation explains this: are you using PHP-FPM or Apache/mod_php with 100 workers? Then a realpath cache of 4096K (4 MB) can add up to 100 × 4 MB = 400 MB of cache memory in total. The memory is not pre-allocated, though: a PHP process does not automatically reserve 4 MB, it only uses the memory it needs, up to the maximum size.
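The same worst-case arithmetic as a tiny sketch; the function name is hypothetical, and the worker count would come from your own setup (for example the pm.max_children value of a PHP-FPM pool).

```php
<?php
// Upper bound only: each process allocates its realpath cache on demand,
// so real usage is usually far lower than this.
function worst_case_realpath_cache_bytes(int $workers, int $cacheSizeBytes): int
{
    return $workers * $cacheSizeBytes;
}

$workers    = 100;          // e.g. a PHP-FPM pool's pm.max_children
$perProcess = 4096 * 1024;  // realpath_cache_size = 4096K

$total = worst_case_realpath_cache_bytes($workers, $perProcess);
printf("%d workers x %d bytes = %d bytes (%d MB)\n",
    $workers, $perProcess, $total, intdiv($total, 1024 * 1024));
```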

Be careful: because the cache does not use PHP's memory manager API, the realpath cache memory does not show up in memory_get_peak_usage(). You can see the current size of the realpath cache of the currently executing process with realpath_cache_size().
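A minimal sketch contrasting the two counters, assuming a writable temp directory. The memory_get_usage() delta is printed only for comparison and is not expected to track the cache.

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'rpc');

clearstatcache(true);                 // empty this process's realpath cache
$cacheBefore = realpath_cache_size(); // bytes currently held by the cache
$memBefore   = memory_get_usage();

realpath($file);                      // adds an entry for the file (and its parent directories)

$cacheAfter = realpath_cache_size();
$memAfter   = memory_get_usage();

printf("realpath_cache_size(): %d -> %d bytes\n", $cacheBefore, $cacheAfter);
// Any change here comes from other allocations (e.g. the returned string);
// the cache itself is allocated outside PHP's memory manager.
printf("memory_get_usage() delta: %d bytes\n", $memAfter - $memBefore);

unlink($file);
```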

Also, all realpath cache related functions only work on the current process: realpath_cache_get(), realpath_cache_size(), and even clearstatcache() when called with its first argument ($clear_realpath_cache) set to true.
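A short sketch of both clearing modes. Run inside one PHP-FPM worker, it would leave the caches of all other workers untouched.

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'rpc');

// Forget one path: the second argument names the entry to drop.
realpath($file);
clearstatcache(true, $file);
$entryGone = !isset(realpath_cache_get()[$file]);

// Forget everything this process has cached.
realpath($file);
clearstatcache(true);
$sizeAfterFullClear = realpath_cache_size();

printf("single entry removed: %s, size after full clear: %d\n",
    $entryGone ? 'yes' : 'no', $sizeAfterFullClear);

unlink($file);
```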

How many items can the cache hold? That depends on the length of the paths being cached. If the realpath differs from the path, the entry stores two path strings, otherwise just one.

For our WordPress entry example above, that is the metadata size (56 bytes) plus the string length of 56 bytes, or 112 bytes in total. This allows the cache to hold around 37,000 items with a cache size of 4096K, but fewer than 150 with a cache size of 16K.
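The same arithmetic as a small sketch. It follows the post's own model (about 56 bytes of metadata plus the stored path strings); the constant and function names are made up for illustration, and real allocations may differ slightly, so read the results as estimates.

```php
<?php
// Assumption taken from the text: ~56 bytes of metadata per cache entry.
const REALPATH_ENTRY_OVERHEAD = 56; // hypothetical constant name

function estimated_entry_bytes(string $path, string $realpath): int
{
    $bytes = REALPATH_ENTRY_OVERHEAD + strlen($path);
    if ($realpath !== $path) {
        $bytes += strlen($realpath); // a differing realpath is stored as a second string
    }
    return $bytes;
}

function estimated_capacity(int $cacheSizeBytes, int $entryBytes): int
{
    return intdiv($cacheSizeBytes, $entryBytes);
}

$path  = '/var/www/wordpress/wp-includes/class-walker-category.php';
$entry = estimated_entry_bytes($path, $path);     // 56 + 56

printf("entry: %d bytes\n", $entry);
printf("4096K: ~%d entries\n", estimated_capacity(4096 * 1024, $entry));
printf("16K:   ~%d entries\n", estimated_capacity(16 * 1024, $entry));
```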

If we use symlinks (as certain deployment strategies do, more on that in a follow-up post), each entry stores both strings: 56 + 56 + 56 = 168 bytes, assuming the path is roughly as long as the realpath. The cache then holds about a third fewer items, around 25,000.

So 4096K is usually more than enough, even for applications built on frameworks with a lot of files, like Symfony, Zend Framework, Laravel, Magento and so on. 16K, however, is much too small.

Rule of Thumb: always have a realpath cache large enough to hold entries for all of your files. If you use symlink deployments, make it double or triple that size.
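One way to apply this rule of thumb is to walk the application tree and add up estimated entry sizes. The sketch below does that under stated assumptions: the 56-byte metadata figure from above, one path string per entry, one entry per file and per directory (directories are cached too), and a caller-chosen multiplier for symlink deployments. suggest_realpath_cache_bytes() is a hypothetical helper, not part of PHP, and it counts every file, not only PHP files, so it errs on the generous side.

```php
<?php
// Hypothetical helper; see the assumptions in the paragraph above.
function suggest_realpath_cache_bytes(string $root, int $multiplier = 1): int
{
    $real = realpath($root);
    if ($real === false) {
        throw new InvalidArgumentException("Not found: $root");
    }
    $bytes = 56 + strlen($real); // the root directory itself
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($real, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // yield directories as well as files
    );
    foreach ($items as $info) {
        $bytes += 56 + strlen($info->getPathname());
    }
    return $bytes * $multiplier;
}

// Demo on a tiny throwaway tree; point it at your application root instead.
$demo = sys_get_temp_dir() . '/rpc-demo-' . uniqid();
mkdir($demo . '/sub', 0777, true);
touch($demo . '/a.php');
touch($demo . '/b.php');
touch($demo . '/sub/c.php');
$root = realpath($demo);

$plain     = suggest_realpath_cache_bytes($demo);
$symlinked = suggest_realpath_cache_bytes($demo, 3);
printf("plain deploy: %d bytes, symlink deploy (x3): %d bytes\n", $plain, $symlinked);

unlink($demo . '/sub/c.php');
unlink($demo . '/b.php');
unlink($demo . '/a.php');
rmdir($demo . '/sub');
rmdir($demo);
```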

Next week I will write about realpath cache pitfalls.

Benjamin 11.10.2017