Benjamin, 04.09.2015

High Performance Shopware 5.1 with Elasticsearch

The upcoming Shopware 5.1 release ships with native Elasticsearch support, complementing the MySQL backend. The new Elasticsearch support does not replace MySQL as the primary storage, but works as a cache in front of the slow database operations behind search and category listings.

On the occasion of the 5.1 Release Candidate 1, released at the Shopware Community Day this Friday, we are taking the new Elasticsearch backend for a test drive.

This change is an ideal case study of the performance impact of architectural changes to an existing system. Architectural changes usually have a much higher impact on system performance than micro-optimizations or smaller caching and query improvements, but they come at a much greater development cost.

In recent years we have seen a clear trend towards separating frontend and backend in e-commerce applications to achieve high-performance consumer-facing sites. But Shopware and other popular shop systems such as Magento and Oxid are monolithic systems with high coupling to a single database (mostly MySQL). All these older shop systems now have a hard time keeping up with this new paradigm.

Given this starting position, it is impressive how the Shopware Core Team added a complex caching layer to this monolithic core without losing functionality. They effectively achieved separation of frontend and backend storage inside the same application. To put the development effort into perspective: I discussed the first concepts of a road towards Elasticsearch with the Shopware team in early 2014 as part of my consulting work with Qafoo. Architectural changes are never a "quick win".

To understand how Shopware works, take a look at the high-level before and after architecture diagrams, moving from a single datastore to a cache layer that is synchronized regularly by a cronjob.
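To make the "after" architecture more concrete, here is a minimal Python sketch of the general pattern: a primary store that receives all writes, and a read-optimized index that a periodic job refreshes. This is not Shopware's implementation. The class names, the in-memory dictionaries standing in for MySQL and Elasticsearch, and the full-copy `sync` function are all assumptions made for illustration; a real synchronization job would more likely work incrementally.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryStore:
    """Hypothetical stand-in for MySQL: the source of truth for products."""
    products: dict = field(default_factory=dict)

    def save(self, product_id, data):
        self.products[product_id] = dict(data)


@dataclass
class SearchIndex:
    """Hypothetical stand-in for Elasticsearch: a read-optimized copy."""
    documents: dict = field(default_factory=dict)

    def list_category(self, category):
        # Listings are answered from the copy, never from the primary store.
        return sorted(
            pid for pid, doc in self.documents.items()
            if doc["category"] == category
        )


def sync(primary, index):
    """What a periodic cronjob could do: copy primary data into the index."""
    index.documents = {pid: dict(data) for pid, data in primary.products.items()}
    return len(index.documents)


primary = PrimaryStore()
index = SearchIndex()

primary.save(1, {"name": "Chair", "category": "furniture"})
primary.save(2, {"name": "Table", "category": "furniture"})
sync(primary, index)

# In this toy model, a write after the last sync is invisible to listings
# until the next sync run.
primary.save(3, {"name": "Lamp", "category": "furniture"})
stale_listing = index.list_category("furniture")
sync(primary, index)
fresh_listing = index.list_category("furniture")
```

The design choice this illustrates is that reads and writes hit different stores, so listing performance no longer depends on the primary database, at the price of listings that can lag behind the latest writes.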

We installed Shopware from GitHub using the 5.1 branch on a fresh 2 CPU/4 GB server at DigitalOcean. For the setup of Elasticsearch with Shopware we followed the dedicated documentation page.

We started by comparing the performance with the usual Shopware demo data set of roughly 200 products and saw no visible effect. Elasticsearch is actually nearly three times slower (19ms) than the 27 SQL queries it replaces (7ms), but both numbers are negligible compared to the overall response time of around 400ms.

This highlights an important fact about architecture changes: they always depend on your context. With a small number of products Shopware is fast out of the box, and there is no need to complicate the setup by introducing a second data store as a cache.

Let's crank up the volume: we used a random product generator to create a shop with 100,000 products in 100 different categories. Now both the high-level and the detailed comparison show Elasticsearch as the clear winner with 29ms, compared to 3.9 seconds for the same category page using MySQL.

The Elasticsearch cache replaces all the queries that don't scale well to a high number of products with a single HTTP call.
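As a rough illustration of what "a single HTTP call" can look like, the following Python sketch builds one Elasticsearch search request for a page of a category listing, without sending it. The index name `shop_products`, the fields `categoryIds` and `price`, and the page size are hypothetical and not taken from Shopware. The body uses the `bool`/`filter` query form of current Elasticsearch versions, which may differ from what Shopware 5.1 sends.

```python
import json
import urllib.request


def build_category_listing_request(base_url, index, category_id, page=1, per_page=12):
    """Build (but do not send) one search request for a category listing page."""
    body = {
        # Pagination is handled by Elasticsearch instead of SQL LIMIT/OFFSET.
        "from": (page - 1) * per_page,
        "size": per_page,
        # Hypothetical field name: products carry the ids of their categories.
        "query": {"bool": {"filter": [{"term": {"categoryIds": category_id}}]}},
        # Hypothetical numeric field used for ordering the listing.
        "sort": [{"price": "asc"}],
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_category_listing_request(
    "http://localhost:9200", "shop_products", category_id=42, page=3
)
payload = json.loads(request.data)
```

Passing `request` to `urllib.request.urlopen` would execute the search against a running Elasticsearch node. Filtering, sorting and pagination all travel in that one request body.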

We experimented with different numbers of products and the story is always the same: regardless of the number of products, Elasticsearch yields query results within 20-40ms. The MySQL-based search backend, however, gets dramatically slower as the number of articles grows: around 2-4 seconds for 10,000 products, 10-20 seconds for 100,000 products, and consistent failures at the maximum execution time for 200,000 products and above.
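To put the measurements in proportion, this short Python sketch computes the ratios from the timings quoted in this article (7ms vs. 19ms within a roughly 400ms response for the demo data set, and 3.9 seconds vs. 29ms for the category page with 100,000 products). The variable names are only illustrative, and the numbers are single measurements, not a benchmark.

```python
# Timings quoted in this article, in milliseconds.
small_shop = {"mysql": 7, "elasticsearch": 19, "total_response": 400}
large_shop = {"mysql": 3900, "elasticsearch": 29}

# Demo data set: Elasticsearch is slower, but the extra time is a small
# share of the overall response.
small_slowdown = small_shop["elasticsearch"] / small_shop["mysql"]
small_extra_share = (
    small_shop["elasticsearch"] - small_shop["mysql"]
) / small_shop["total_response"]

# 100,000 products: speedup of the category page with Elasticsearch.
large_speedup = large_shop["mysql"] / large_shop["elasticsearch"]

print(f"Demo data: Elasticsearch is {small_slowdown:.1f}x slower, "
      f"costing {small_extra_share:.0%} of the response time")
print(f"100,000 products: Elasticsearch is {large_speedup:.0f}x faster")
```

With these inputs, the script reports a 2.7x slowdown worth 3% of the response time for the demo shop, and a 134x speedup for the large shop.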

If your Shopware shop has a high number of products, you should investigate whether the performance gain from Elasticsearch is worth the maintenance overhead. With Elasticsearch as a cache you can achieve almost constant performance in the search and category listings, independent of the number of products.

Constant performance regardless of database size is a great property for a system. Achieving it requires a change to the architecture that results in a more complex, distributed system with extra monitoring and maintenance costs. As always with software problems, there are trade-offs to consider between the two approaches.