Combining regular expressions with named capture groups to improve performance

Regular expressions are usually not top of mind when you think of performance bottlenecks in your application, but inefficient patterns or sub-optimal use of the APIs can make them slow.

A good case study for this is a recent patch to Doctrine DBAL’s SQL Parser that made it roughly 2x faster. The patch was submitted by Soner of Shopware fame; thank you to him.

Doctrine DBAL uses the SQL Parser to expand a single prepared statement parameter to a list of values, for example for a WHERE id IN (?) query.
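To make "expanding" concrete, here is a minimal sketch of the idea. The function name expandPositionalLists is hypothetical and this is not how DBAL implements it: the sketch replaces every ? blindly, including ones inside string literals or comments, which is exactly the problem a real SQL parser has to solve. Emitting NULL for an empty list is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration, NOT the DBAL implementation.
 * Expands each positional placeholder whose parameter is an array
 * into one placeholder per list element.
 *
 * @param list<mixed> $params
 * @return array{string, list<mixed>} expanded SQL and flattened parameters
 */
function expandPositionalLists(string $sql, array $params): array
{
    $index = 0;
    $flat  = [];

    $expanded = preg_replace_callback(
        '~\?~',
        static function (array $match) use ($params, &$index, &$flat): string {
            $value = $params[$index++];

            // Scalar parameter: keep the single placeholder.
            if (! is_array($value)) {
                $flat[] = $value;

                return '?';
            }

            // Assumption of this sketch: an empty list becomes NULL.
            if ($value === []) {
                return 'NULL';
            }

            foreach ($value as $item) {
                $flat[] = $item;
            }

            // One placeholder per list element.
            return implode(', ', array_fill(0, count($value), '?'));
        },
        $sql,
    );

    return [$expanded, $flat];
}

[$sql, $params] = expandPositionalLists(
    'SELECT * FROM t WHERE id IN (?) AND x = ?',
    [[1, 2, 3], 'a'],
);
```

Here $sql ends up with three placeholders inside the IN clause, and $params is the flat list of four values.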

SQL Parser Before the Patch

Before version 3.10 of Doctrine DBAL, the SQL parser iterated over four different regular expressions, and when one of them matched, the corresponding code was called. This was coded as an array mapping each regular expression to a closure.

These patterns were checked in a loop, and whenever one of them matched, the loop started over from the first pattern.
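The approach described above can be sketched roughly as follows. This is a simplified, hypothetical tokenizer and not the actual DBAL code: the function name, the four patterns and the token type names are assumptions chosen for illustration. What it does share with the description is the shape: an array of pattern to closure, traversed with current, key, next and reset.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified tokenizer: NOT the actual DBAL code.
 * Tries each pattern at the current offset and starts over from
 * the first pattern after every successful match.
 *
 * @return list<array{string, string}> pairs of [token type, token text]
 */
function tokenizeWithPatternLoop(string $sql): array
{
    $tokens = [];

    // Each (illustrative) pattern maps to a closure handling its match.
    $patterns = [
        ':[a-zA-Z_][a-zA-Z0-9_]*' => static function (string $text) use (&$tokens): void {
            $tokens[] = ['named', $text];
        },
        '\?' => static function (string $text) use (&$tokens): void {
            $tokens[] = ['positional', $text];
        },
        '[^:?]+' => static function (string $text) use (&$tokens): void {
            $tokens[] = ['other', $text];
        },
        '.' => static function (string $text) use (&$tokens): void {
            $tokens[] = ['special', $text];
        },
    ];

    $offset = 0;

    // The loop ends once no pattern matches, i.e. at the end of the string.
    while (($handler = current($patterns)) !== false) {
        // \G anchors the match at $offset.
        if (preg_match('~\G' . key($patterns) . '~s', $sql, $matches, 0, $offset) === 1) {
            $handler($matches[0]);
            reset($patterns);                // start again with the first pattern
            $offset += strlen($matches[0]);
        } else {
            next($patterns);                 // try the next pattern
        }
    }

    return $tokens;
}

$tokens = tokenizeWithPatternLoop('a = ? AND b = :b');
```

Every token costs up to four preg_match calls plus the array traversal calls and one closure call, which is where the large call counts in a profile come from.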

In a callgraph we can see that for 12 calls to Parser::parse, we have hundreds of calls to preg_match, key, current, next and reset and closures. Tideways has custom instrumentation for preg_match calls and shows a few starting characters of each regular expression, so that you can identify their performance individually.

That left a lot of room for improvement, and this pull request took advantage of it.

The Patch

The patch rewrites the four regular expressions into a single one, using named capture groups to differentiate which one was matched.

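The syntax ?P<name> creates a named capture group whose match is then available in the $matches result array under that name as key. It's an empty string if the group did not match, and a non-empty string if it did. Note that by default PHP drops trailing unmatched groups from $matches entirely, so the key can also be missing.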
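A minimal sketch of the combined-pattern technique, again as a hypothetical tokenizer rather than the actual DBAL code. The function name, group names and sub-patterns are assumptions for illustration. The isset() check covers trailing unmatched groups that are missing from $matches.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified tokenizer: NOT the actual DBAL code.
 * A single combined pattern is matched once per token; the named
 * group that captured text tells us which alternative matched.
 *
 * @return list<array{string, string}> pairs of [token type, token text]
 */
function tokenizeWithCombinedPattern(string $sql): array
{
    $pattern = '~\G(?:'
        . '(?P<named>:[a-zA-Z_][a-zA-Z0-9_]*)'
        . '|(?P<positional>\?)'
        . '|(?P<other>[^:?]+)'
        . '|(?P<special>.)'
        . ')~s';

    $tokens = [];
    $offset = 0;
    $length = strlen($sql);

    while ($offset < $length) {
        // One preg_match call per token instead of up to four.
        if (preg_match($pattern, $sql, $matches, 0, $offset) !== 1) {
            throw new RuntimeException('Unable to tokenize at offset ' . $offset);
        }

        foreach (['named', 'positional', 'other', 'special'] as $type) {
            // Unmatched groups are '' or, if trailing, absent from $matches.
            if (isset($matches[$type]) && $matches[$type] !== '') {
                $tokens[] = [$type, $matches[$type]];
                break;
            }
        }

        $offset += strlen($matches[0]);
    }

    return $tokens;
}

$tokens = tokenizeWithCombinedPattern('a = ? AND b = :b');
```

The regex engine now decides between the alternatives internally, so the PHP-level loop over patterns, the array pointer functions and the closures are no longer needed.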

This change removes the layer of indirection with the closures that call methods on the Visitor and cuts down on function calls for array traversal next, key, current and reset.

Measuring the Impact: 1.7x to 2.43x faster

Benchmarking the test script from the PR with hyperfine, the results speak for themselves: an improvement by a factor of 2.43x.

Comparing two callgraph profiles from before and after the change, we also see a drop of 10ms, from 24ms to 14ms, an improvement by a factor of 1.7x.

In the comparison of child functions, you can also see how the closure calls and the individual preg_match calls are replaced by a single call to preg_match.

The improvement you will see depends on how often Parser::parse is called and how long the parsed queries are. The more characters a query has, the more significant the improvement becomes.

Combining regular expressions into one big expression is a common performance improvement, and libraries such as nikic/fastroute, symfony/routing or JayBizzle/crawler-detect make good use of it as well.

Benjamin, 30.04.2025

Do you prefer video over text? You can watch my video on Understanding Regex Optimizations in Doctrine DBAL 3.10 over at YouTube.
