What are compiler optimized internal PHP functions and should you import them via a use statement?

Every once in a while when browsing through open-source code, you will probably come across internal functions that are either imported explicitly with use function array_map; as in Doctrine, or prefixed with the global namespace separator, for example \is_string($foo), as in Symfony.

If you are as curious as I am, you might wonder: Why are they doing this? Do function calls not automatically fall back to the global namespace?

Yes, they do. Technically you don’t need use function or the global namespace separator prefix.
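To make the three spellings concrete, here is a minimal sketch. The namespace App\Demo and the helper function names are hypothetical and exist only for illustration; all three variants end up calling the same global functions.

```php
<?php

namespace App\Demo;

use function array_map; // explicit import of the global function

// Hypothetical helper names, used only for illustration.
function unqualified(string $s): int
{
    // Unqualified call: PHP first looks for App\Demo\strlen and, if that
    // does not exist, falls back to the global strlen at runtime.
    return strlen($s);
}

function fullyQualified(string $s): int
{
    // Fully qualified call: always the global strlen.
    return \strlen($s);
}

function imported(array $items): array
{
    // Imported via "use function": resolves to the global array_map.
    return array_map('strtoupper', $items);
}

echo unqualified('abc'), "\n";              // 3
echo fullyQualified('abc'), "\n";           // 3
echo implode(',', imported(['a', 'b'])), "\n"; // A,B
```

The results are identical; the only difference is whether the function name is already known to be global when the file is compiled.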

But there is another technical reason why open-source libraries do this: A small number of PHP internal functions have compiler optimized versions that avoid much of the overhead of an internal function call in the PHP engine. The compiler can only apply them when it knows at compile time that the call refers to the global function. You can find these functions in the zend_compile.c file of PHP.

You might raise the point that this sounds like a micro-optimization and you are correct.

For open-source code you can argue that it should be a matter of good style to provide the lowest overhead possible.

For your own application code, this optimization is less important but there are cases where this optimization does matter as well: When you call these compiler optimized functions hundreds of thousands of times and your application has low response times in the 20-200ms range.
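If you want a rough feeling for the difference on your own machine, a micro-benchmark along the following lines could help. This is only a sketch: the namespace App\Bench, the helper names and the iteration count are assumptions, and the timings vary with PHP version, OPcache settings and hardware, so no expected numbers are given.

```php
<?php

namespace App\Bench;

// Hypothetical helper: times a loop that uses the unqualified call.
function timeUnqualified(string $s, int $iterations): float
{
    $total = 0;
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $total += strlen($s); // resolved at runtime: App\Bench\strlen, then global
    }
    return (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds
}

// Hypothetical helper: same loop, but with the fully qualified call.
function timeQualified(string $s, int $iterations): float
{
    $total = 0;
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $total += \strlen($s); // known to be the global strlen at compile time
    }
    return (hrtime(true) - $start) / 1e6;
}

$iterations = 500000; // assumed value, adjust to your workload

printf("unqualified: %.2f ms\n", timeUnqualified('hello', $iterations));
printf("qualified:   %.2f ms\n", timeQualified('hello', $iterations));
```

Run it several times and compare the two lines. A single run of a micro-benchmark like this is too noisy to draw conclusions from.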

Granted, these cases are rare, but it’s good to have heard about this optimization nonetheless.

Should we stop here? No! Two topics are interesting to look into with compiler optimized functions:

  • Benefiting from the optimization by relying on code formatting automation
  • Can we measure compiler optimized functions in a Profiler and what would that look like?

Automatically import compiler optimized functions using available tools

If you already use a modern auto-formatting tool such as php-cs-fixer or phpcs with phpcbf, importing these optimized functions requires little work and can be fully automated. In that case you benefit from this optimization at nearly zero cost; you don’t even have to know or think about it.

For php-cs-fixer there is the native_function_invocation rule with the default configuration setting @compiler_optimized:

php php-cs-fixer-v3.phar fix src/ --allow-risky=yes --rules=native_function_invocation
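If you prefer a configuration file over command line flags, the same rule could be set up roughly like this. It is a sketch that assumes php-cs-fixer v3, a config file named .php-cs-fixer.dist.php in the project root, and sources in src/. Note that this rule adds the leading backslash rather than a use function import.

```php
<?php
// .php-cs-fixer.dist.php (assumed file name and location)

// Assumption: the code to fix lives in src/ next to this file.
$finder = PhpCsFixer\Finder::create()->in(__DIR__ . '/src');

return (new PhpCsFixer\Config())
    // native_function_invocation is a risky rule, so risky rules must be allowed.
    ->setRiskyAllowed(true)
    ->setRules([
        'native_function_invocation' => [
            // Only touch the functions the compiler can optimize.
            'include' => ['@compiler_optimized'],
            // Only prefix calls in namespaced code, where the fallback applies.
            'scope' => 'namespaced',
            // Remove leading backslashes from functions outside the include list.
            'strict' => true,
        ],
    ])
    ->setFinder($finder);
```

With such a file in place, running the fix command without the --rules option should pick up the configuration.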

For phpcs you can use the SlevomatCodingStandard.Namespaces.ReferenceUsedNamesOnly sniff from slevomat/coding-standard, but be aware that it does quite a bit more than just importing compiler optimized internal functions.

<!-- phpcs.xml -->
<ruleset>
    <rule ref="SlevomatCodingStandard.Namespaces.ReferenceUsedNamesOnly"/>
</ruleset>

And then run:

php vendor/bin/phpcbf

How Profilers see compiler optimized functions

There is something to learn here about the PHP engine and how profilers work with compiler optimized functions. By taking the engine’s shortcut, these few functions also bypass the hooks that profilers use to track execution.

Take this simple bit of code that re-implements a small variation of “str_repeat” in userland code and uses strlen doing so:

<?php

namespace MyStandardLib;

function my_own_str_repeat($chars, $length)
{
    $new = '';
    for ($i = 0; strlen($new) < $length; $i++) {
        $new .= $chars;
    }
    return $new;
}

$str = my_own_str_repeat("x", 100000);
echo strlen($str) . "\n";

If we run this through Tideways Callgraph Profiler, Xdebug or any other profiler, you can spot strlen being called 100,000 times. Because the function is not imported from the global namespace, the engine cannot use the compiler optimized version, since a user-defined strlen might exist in the MyStandardLib namespace.
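The following sketch shows why the engine has to be careful. The userland strlen and the two wrapper functions are hypothetical and defined only to demonstrate the shadowing; nobody should ship a strlen like this.

```php
<?php

namespace MyStandardLib;

// Hypothetical userland strlen, defined only to demonstrate shadowing.
function strlen(string $s): int
{
    return -1;
}

function unqualifiedLength(string $s): int
{
    // Resolves to MyStandardLib\strlen because it exists in this namespace.
    return strlen($s);
}

function globalLength(string $s): int
{
    // The leading backslash always refers to the global strlen.
    return \strlen($s);
}

echo unqualifiedLength('abc'), "\n"; // -1
echo globalLength('abc'), "\n";      // 3
```

Because the unqualified name can point to either function depending on what is defined when the call runs, the compiler cannot replace it with the optimized version ahead of time.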

See how Tideways already gives you a hint about a potential optimization here, only because it would make an impact on the bottom line of this script, measured at 24% of total execution time.

As a side note: If you are wondering about the sharp drop from 7.24 ms to 1.81 ms, this must be attributed to the profiling overhead that I have written about elsewhere, caused by an otherwise quick internal function being called an excessive number of times.

As soon as you add a leading backslash (\strlen) or import the function with use function strlen;, the profiler no longer observes the execution. It looks as if “my_own_str_repeat” is a function that has no child calls. Somewhere in its 1.24 ms execution, the 100,000 optimized strlen calls are hidden.

The profiler is no longer able to detect that strlen is called at all, on account of the PHP engine’s shortcut.
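For reference, the optimized variant of the example could look like the sketch below. It uses the import style; prefixing each call as \strlen should have the same effect. The type declarations and the while loop are small liberties taken for this sketch and not part of the original listing.

```php
<?php

namespace MyStandardLib;

use function strlen; // import from the global namespace

function my_own_str_repeat(string $chars, int $length): string
{
    $new = '';
    // strlen is now known to be the global function at compile time,
    // so the engine can use its optimized version here.
    while (strlen($new) < $length) {
        $new .= $chars;
    }
    return $new;
}

$str = my_own_str_repeat('x', 100000);
echo strlen($str) . "\n"; // 100000
```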

So should you import functions that the compiler can optimize for performance reasons? The answer is: It depends. When the overhead is significant, then yes; otherwise you can ignore this as being too micro an optimization.

Benjamin, 28.02.2022