Benjamin, 09.05.2016

PHP Session Garbage Collection: The unknown performance bottleneck

Here is one performance setting in your PHP configuration that you probably haven't thought much about before: how often does PHP perform random garbage collection of outdated session data in your application? Did you know that, because of its shared-nothing architecture (there is no long-running process that could do the job), PHP piggybacks the cleanup of old session data on requests, giving every session_start() call a small random chance of triggering it? This operation is not necessarily cheap.

By default, this happens on average once every 100 requests, because of the php.ini settings session.gc_probability=1 and session.gc_divisor=100. But you cannot just look at the defaults of these settings to see what is going on in your application. Some distributions change the cleanup mechanism: Debian/Ubuntu flavored PHP packages, for example, set session.gc_probability=0 and use a cronjob to clean up old session data. Frameworks, however, usually overwrite all of this anyway, either by resetting the INI settings themselves (Symfony) or by implementing their own session handling entirely (Laravel).
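To see which values actually apply, a small sketch like the following reads the effective INI settings and converts them into the chance that a single session_start() call runs garbage collection. The helper name sessionGcChance() is hypothetical. Run it through the same SAPI as your application (for example behind PHP-FPM), because Debian/Ubuntu ship separate php.ini files for CLI and FPM, and keep in mind that a framework may still override these values at runtime.

```php
<?php
// Hypothetical helper: chance that one session_start() call triggers GC,
// which PHP derives from session.gc_probability / session.gc_divisor.
function sessionGcChance(int $probability, int $divisor): float
{
    // Guard against a zero divisor and treat a non-positive probability as "never".
    if ($divisor <= 0 || $probability <= 0) {
        return 0.0;
    }
    // A probability larger than the divisor cannot exceed "every request".
    return min(1.0, $probability / $divisor);
}

// Read the values this SAPI actually uses (php.ini, distro overrides, ini_set()).
$probability = (int) ini_get('session.gc_probability');
$divisor = (int) ini_get('session.gc_divisor');

printf(
    "gc_probability=%d gc_divisor=%d => %.2f%% of session_start() calls run GC\n",
    $probability,
    $divisor,
    sessionGcChance($probability, $divisor) * 100
);
```

With the PHP defaults of 1 and 100 this reports 1.00%; on a Debian/Ubuntu package with gc_probability=0 it reports 0.00%.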

My general advice is to avoid PHP's random garbage collection altogether and offload the cleanup either to background jobs or to the storage system itself. Treat this as general advice only: the right implementation depends heavily on the individual storage system and the type of session save handler.
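For plain PHP sessions without a framework, a minimal sketch of this offloading could look like the script below. It assumes PHP 7.1 or later (where session_gc() was added), that session.gc_probability is set to 0 in the php.ini of your web SAPI, and that cron runs the script as the same user as the web server with the same session.save_handler and session.save_path. The file name session-gc.php is hypothetical.

```php
<?php
// session-gc.php (hypothetical name): run periodically from cron, as the web
// server user, with the same session save handler and save path as the web SAPI.
// Requires PHP 7.1+, which added session_gc().

// Make sure session_start() below does not also roll the GC dice.
ini_set('session.gc_probability', '0');

// session_gc() needs an active session to access the session storage.
session_start();

// Run garbage collection now; returns the number of deleted sessions or false.
$deleted = session_gc();

// Remove the throwaway session that was created just for this run.
session_destroy();

if ($deleted === false) {
    fwrite(STDERR, "session GC failed\n");
    exit(1);
}

echo "Deleted {$deleted} expired session(s)\n";
```

Cleanup then costs one cron run at a fixed interval instead of a random user request.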

Example: Laravel Framework

Let's look at the Laravel framework as an example, because a lot of the performance advice I find for it is about changing the session driver. By default, Laravel uses file-based sessions with a garbage collection lottery that triggers on average every 50th request.
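Laravel decides on every request whether to run garbage collection by drawing a random number against the lottery array in config/session.php, whose default of [2, 100] gives the 1-in-50 ratio. A minimal sketch of that check follows; the function name is hypothetical and it only approximates what Laravel's session middleware does.

```php
<?php
// Hypothetical helper approximating a Laravel-style session lottery.
// [2, 100] means: on average 2 out of every 100 requests run session GC.
function hitsSessionLottery(array $lottery): bool
{
    list($wins, $outOf) = $lottery;
    // random_int() is inclusive on both ends, so this is true
    // with probability $wins / $outOf.
    return random_int(1, $outOf) <= $wins;
}

// Rough simulation: count how many of 10,000 requests would run GC.
$runs = 0;
for ($i = 0; $i < 10000; $i++) {
    if (hitsSessionLottery([2, 100])) {
        $runs++;
    }
}
echo "GC ran on {$runs} of 10000 simulated requests\n";
```

The simulated count varies from run to run but clusters around 200, that is 2% of requests.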

When Laravel triggers session garbage collection, the following code is called for file-based sessions:

public function gc($lifetime)
{
    $files = Finder::create()
                ->in($this->path)
                ->files()
                ->ignoreDotFiles(true)
                ->date('<= now - '.$lifetime.' seconds');
    foreach ($files as $file) {
        $this->files->delete($file->getRealPath());
    }
}

If you know a little about the Symfony Finder component, you know that it is comparatively slow compared to using Linux commands directly, such as the special-purpose shell script sessionclean that Debian/Ubuntu ship with.
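For comparison, the same mtime-based cleanup can be written with PHP's built-in SPL iterators instead of Finder. This is a hypothetical helper, neither Laravel's code nor the sessionclean script, and whether it is measurably faster on your filesystem is something to measure rather than assume.

```php
<?php
// Hypothetical helper: delete file-based sessions whose modification time is
// older than $lifetime seconds, using only SPL (no Symfony Finder).
function gcSessionFiles(string $path, int $lifetime, ?int $now = null): int
{
    $cutoff = ($now ?? time()) - $lifetime;
    $deleted = 0;

    foreach (new FilesystemIterator($path, FilesystemIterator::SKIP_DOTS) as $file) {
        // Skip directories and hidden files such as .gitignore,
        // similar to ignoreDotFiles(true) in the Finder version.
        if (!$file->isFile() || $file->getFilename()[0] === '.') {
            continue;
        }
        // Same comparison as Finder's "<= now - lifetime seconds" date filter.
        if ($file->getMTime() <= $cutoff && unlink($file->getPathname())) {
            $deleted++;
        }
    }

    return $deleted;
}
```

Like the Finder version, it still has to stat every file in the directory, so its cost grows with the number of active sessions.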

The problem: with more active users, and therefore more active sessions, session cleanup gets slower because it has to iterate over more files. And with that much traffic, a lottery that fires on average every 50th request can trigger cleanup multiple times per second. If you measure performance using percentiles (which you should), the 99th percentile will always be dominated by requests that cleaned up sessions: 1 in 50 is 2% of all requests, more than the 1% of requests that lie above the 99th percentile.
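As a rough illustration (the traffic figure is an assumption, not a measurement), take an application serving r = 100 requests per second with Laravel's default lottery [a, b] = [2, 100]. The expected number of cleanup runs is:

$$
\text{GC runs per second} = r \cdot \frac{a}{b} = 100 \cdot \frac{2}{100} = 2
$$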

Take a look at this Tideways trace of our own status page (powered by Cachet), where Laravel's garbage collection is triggered to clean up a folder of 10,000 file-based sessions:

As you can see, this takes the majority of the whole request (a whopping 723 ms of 837 ms, roughly 86%!). Even with fewer active sessions, this is still a very high performance penalty, given that it happens on average every 50th request.

Offload Session Garbage Collection

Instead of slowing down a random sample of your users' requests with session garbage collection, offload it to a cronjob that runs once at regular intervals.

For Laravel, this means changing the lottery configuration in config/session.php so that no request ever wins it (0 out of 100):

<?php

return [
    // ...
    'lottery' => [0, 100],
];

You then need a cronjob that calls ./artisan session:gc, backed by the following command:

<?php
// app/Console/Commands/SessionGcCommand.php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Arr;

class SessionGcCommand extends Command
{
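    protected $signature = 'session:gc';

    protected $description = 'Delete expired sessions using the configured session handler';

    public function handle()
    {
        // Resolve the session manager, which proxies to the configured driver
        $session = $this->getLaravel()->make('session');
        // "lifetime" in config/session.php is in minutes, gc() expects seconds
        $lifetime = Arr::get($session->getSessionConfig(), 'lifetime') * 60;
        $session->getHandler()->gc($lifetime);
    }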
}

Don't forget to register the command in the $commands array of app/Console/Kernel.php.

If you absolutely have to use PHP-based cleanup, adjust the probability to your application's traffic so that cleanup is triggered about every 5-10 minutes, by increasing the second number of the lottery. For example, if you serve 100 requests per minute, a setting of 1 out of 1000 requests triggers cleanup roughly every 10 minutes:

<?php

return [
    // ...
    'lottery' => [1, 1000],
];
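The sizing rule behind this fits into a tiny helper (the name lotteryDivisor() is hypothetical); the traffic figure is whatever you measure for your own application, and the values below are only examples.

```php
<?php
// Hypothetical helper: second lottery number so that, on average, one request
// runs session GC every $targetMinutes.
function lotteryDivisor(int $requestsPerMinute, int $targetMinutes): int
{
    // Expected GC runs per minute = requestsPerMinute / divisor, so one run
    // every $targetMinutes needs divisor = requestsPerMinute * targetMinutes.
    return max(1, $requestsPerMinute * $targetMinutes);
}

$lottery = [1, lotteryDivisor(100, 10)]; // [1, 1000]
```

At 1000 requests per minute, the same rule gives a divisor between 5000 and 10000 for a 5-10 minute interval.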

The same logic applies to Symfony, where native sessions are configured to clean up on average every 100th request by default. You can disable random cleanup during PHP requests with the native session handler by setting the following configuration:

framework:
    session:
        gc_probability: 0
        gc_divisor: 100

A Symfony garbage collection command, called from a cronjob with ./app/console session:gc, could look like this:

<?php

namespace AppBundle\Command;

use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;

class SessionGcCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this
            ->setName('session:gc')
            ->setDescription('Delete expired sessions via the configured save handler');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // Starting the session initializes the native storage and its save handler
        $session = $this->getContainer()->get('session');
        $session->start();

        $storage = $this->getContainer()->get('session.storage');
        // If you configure "framework.session.gc_maxlifetime", Symfony sets this ini value
        $storage->getSaveHandler()->gc((int) ini_get('session.gc_maxlifetime'));
    }
}

What about non-file-based session storage?

If you store sessions in a database, you still need the cronjob-based cleanup described in the previous section. The session tables in Laravel and Symfony, and probably in every other framework, contain some form of last-activity or lifetime timestamp that can be used to delete expired sessions.
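A minimal sketch of such a cron cleanup with PDO, assuming a table named sessions with an integer Unix timestamp column last_activity (the layout of Laravel's default database sessions table; Symfony's PDO handler uses different column names, so adapt the query). The function name is hypothetical; an index on the timestamp column keeps the DELETE cheap on large tables.

```php
<?php
// Hypothetical helper: delete database sessions whose last activity is older
// than $lifetimeSeconds. Assumes table "sessions" with column "last_activity".
function deleteExpiredSessions(PDO $pdo, int $lifetimeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $stmt = $pdo->prepare('DELETE FROM sessions WHERE last_activity <= :cutoff');
    $stmt->execute(['cutoff' => $now - $lifetimeSeconds]);
    // Number of deleted sessions.
    return $stmt->rowCount();
}

// Example with an in-memory SQLite database (requires the pdo_sqlite extension).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sessions (id TEXT PRIMARY KEY, payload TEXT, last_activity INTEGER)');
$insert = $pdo->prepare('INSERT INTO sessions (id, payload, last_activity) VALUES (?, ?, ?)');
$insert->execute(['old', '', 1000]);
$insert->execute(['fresh', '', 9000]);

// With a 2-hour lifetime at time 10000, only the "old" row is expired.
$deleted = deleteExpiredSessions($pdo, 7200, 10000);
```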

For cache-based session handlers backed by Memcache or Redis, you can rely on the cache time-to-live (TTL) instead, which makes explicit garbage collection obsolete. The Symfony and Laravel implementations of these handlers have a no-op garbage collection method; calling it does nothing.
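To illustrate why TTL-based handlers need no garbage collection, here is a sketch of a SessionHandlerInterface implementation in which a plain PHP array stands in for Redis or Memcached. This is an assumption for illustration only: a real handler would pass the TTL to the cache (for example with Redis SETEX) and let the cache evict expired entries.

```php
<?php
// Sketch: TTL-based session handler. The array stands in for Redis/Memcached;
// each entry carries its own expiry time, so gc() has nothing to do.
class TtlArraySessionHandler implements SessionHandlerInterface
{
    /** @var array<string, array{0: string, 1: int}> id => [data, expiresAt] */
    private $store = [];
    private $ttl;

    public function __construct(int $ttlSeconds)
    {
        $this->ttl = $ttlSeconds;
    }

    public function open($path, $name): bool { return true; }

    public function close(): bool { return true; }

    public function read($id): string
    {
        // An expired entry behaves as if the cache had already evicted it.
        if (!isset($this->store[$id]) || $this->store[$id][1] < time()) {
            unset($this->store[$id]);
            return '';
        }
        return $this->store[$id][0];
    }

    public function write($id, $data): bool
    {
        // Comparable to SETEX in Redis: every write refreshes the TTL.
        $this->store[$id] = [$data, time() + $this->ttl];
        return true;
    }

    public function destroy($id): bool
    {
        unset($this->store[$id]);
        return true;
    }

    public function gc($maxLifetime): int
    {
        // No-op: expiry is handled by the storage TTL, not by PHP requests.
        return 0;
    }
}
```

Such a handler would be registered with session_set_save_handler() before session_start(); however high session.gc_probability is set, the GC call it receives costs nothing.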

This is why, even with session.gc_probability set to a high value, the cache-based drivers never cause garbage collection overhead in PHP requests, which can mislead users who benchmark different session configurations.

Conclusion

Take some time to learn about your framework's approach to session garbage collection, and if you don't use a framework, read the PHP documentation on how native session GC works.