Combining regular expressions with named capture groups to improve performance
Regular expressions are usually not top of mind when you think of performance bottlenecks in your application, but inefficient patterns or sub-optimal use of the APIs can make them slow.
A wonderful learning opportunity for this is a recent patch to Doctrine DBAL’s SQL Parser that improved its performance by roughly a factor of 2. The patch was submitted by Soner of Shopware fame; thank you to him.
Doctrine DBAL uses the SQL Parser to expand a single prepared statement parameter to a list of values, for example for a WHERE id IN (?) query.
SQL Parser Before the Patch
Before version 3.10 of Doctrine DBAL, the SQL parser iterated over 4 different regular expressions, and when one of them matched, the corresponding code was called. This was coded as an array mapping each regular expression to a closure.
These patterns were checked in a loop, and whenever one of them matched, the iteration started again from the beginning.
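To make the shape of this approach concrete, here is a minimal sketch of a pattern-to-closure loop. It is not the Doctrine DBAL code: the function name tokenizeWithPatternLoop, the token labels and the four simplified patterns are assumptions made up for illustration, and real SQL parsing needs considerably more careful expressions.

```php
<?php

// Hypothetical sketch: the patterns and names below are illustrative
// assumptions, NOT the expressions used by Doctrine DBAL.
function tokenizeWithPatternLoop(string $sql): array
{
    $tokens = [];

    // Each regular expression maps to a closure handling the matched fragment.
    $patterns = [
        ':[a-zA-Z_][a-zA-Z0-9_]*' => static function (string $match) use (&$tokens): void {
            $tokens[] = ['named', $match];
        },
        '\?' => static function (string $match) use (&$tokens): void {
            $tokens[] = ['positional', $match];
        },
        "'[^']*'|[^:?']+" => static function (string $match) use (&$tokens): void {
            $tokens[] = ['other', $match];
        },
        '.' => static function (string $match) use (&$tokens): void {
            $tokens[] = ['other', $match];
        },
    ];

    $offset = 0;

    // Try the patterns one by one at the current offset (\G anchors there).
    // After every match, start over with the first pattern again.
    while (($handler = current($patterns)) !== false) {
        if (preg_match('~\G' . key($patterns) . '~s', $sql, $matches, 0, $offset) === 1) {
            $handler($matches[0]);
            reset($patterns);
            $offset += strlen($matches[0]);
        } else {
            next($patterns);
        }
    }

    return $tokens;
}
```

Every token costs at least one preg_match call plus several calls to current, key, next or reset and one closure invocation, which is where the large call counts in a profile come from.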
In a callgraph we can see that for 12 calls to Parser::parse, we have hundreds of calls to preg_match, key, current, next and reset, as well as to the closures. Tideways has custom instrumentation for preg_match calls and shows a few starting characters of each regular expression, so that you can identify their performance individually.

That leaves a lot of potential for improvement, which this pull request captured.
The Patch
The patch rewrites the four regular expressions into a single one, using named capture groups to differentiate which one was matched.
The syntax ?P<name> creates a named capture group whose match is then available in the $matches result array under that name as key. It's an empty string if the group did not match, and a non-empty string if it matched.
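As a minimal sketch of the combined approach, the following uses one expression with named groups and a plain while loop. Again, this is not the pattern from the patch: the constant TOKEN_PATTERN, the function tokenizeWithNamedGroups and the group names are hypothetical and only illustrate the technique.

```php
<?php

// Hypothetical sketch: a simplified combined pattern for illustration,
// NOT the expression used by Doctrine DBAL.
const TOKEN_PATTERN = '~\G(?:'
    . '(?P<named>:[a-zA-Z_][a-zA-Z0-9_]*)'
    . '|(?P<positional>\?)'
    . "|(?P<other>'[^']*'|[^:?']+|.)"
    . ')~s';

function tokenizeWithNamedGroups(string $sql): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($sql);

    while ($offset < $length) {
        // A single preg_match call per token, anchored at $offset via \G.
        if (preg_match(TOKEN_PATTERN, $sql, $matches, 0, $offset) !== 1) {
            break; // only reachable on a PCRE error, since "." matches any byte
        }

        // Groups after the last one that matched can be missing from $matches
        // with the default flags, so fall back to an empty string.
        if (($matches['named'] ?? '') !== '') {
            $tokens[] = ['named', $matches['named']];
        } elseif (($matches['positional'] ?? '') !== '') {
            $tokens[] = ['positional', $matches['positional']];
        } else {
            $tokens[] = ['other', $matches[0]];
        }

        $offset += strlen($matches[0]);
    }

    return $tokens;
}
```

The branch on the named keys takes over the job of the closures, and the array traversal functions disappear entirely.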
This change removes the layer of indirection with the closures that call methods on the Visitor and cuts down on function calls for array traversal (next, key, current and reset).
Measuring the Impact: 1.7x to 2.43x faster
When benchmarking the test script from the PR with hyperfine, the results speak for themselves: an improvement by a factor of 2.43x.

Comparing two callgraph profiles from before and after the change, we also see a drop of 10ms, from 24ms to 14ms, an improvement by a factor of 1.7x.

You can also see how the closure calls and individual preg_match calls are replaced by the single call to preg_match in the comparison of child functions.
The individual improvement depends on how often Parser::parse is called and how long the parsed queries are. The more characters a query has, the more significant the improvement becomes.
Combining regular expressions into one big expression is a common performance improvement, and libraries such as nikic/fastroute, symfony/routing or JayBizzle/crawler-detect make good use of it as well.
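The same idea in its simplest form is joining many literal alternatives into one expression. The sketch below is a generic illustration, not how any of the libraries mentioned implement it; the function buildCombinedPattern and the list of user agent fragments are made up for this example.

```php
<?php

// Hypothetical helper: joins literal needles into one alternation so that
// a single preg_match call replaces one call per needle.
function buildCombinedPattern(array $needles): string
{
    // Escape regex metacharacters and the "~" delimiter in every needle.
    $quoted = array_map(
        static fn (string $needle): string => preg_quote($needle, '~'),
        $needles
    );

    return '~(?:' . implode('|', $quoted) . ')~i';
}

// Illustrative fragments only, not taken from a real crawler list.
$pattern = buildCombinedPattern(['Googlebot', 'bingbot', 'curl/']);

$isCrawler = preg_match($pattern, 'Mozilla/5.0 (compatible; Googlebot/2.1)') === 1;
```

Building the pattern once and reusing it matters here, because assembling the string on every request would eat into the savings.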