Benchmarking WordPress (Investigating the Optimal Server)

With WordPress powering an estimated 16.7% of all websites [1], having tools to evaluate server performance for running WordPress is becoming necessary. At the time of writing this article, there are no known published investigations into determining the optimal server configuration for WordPress. Yet, hosting providers will make bold claims, such as “My blog is 4x faster than your blog”. While these claims make for entertaining t-shirts, without a defined methodology there is no way to verify them.

Methodology

This article does not dive into the fine details; the load and WordPress setup for this experiment follow the WPMark Twenty Twelve guidelines. WPMark is a combination of a WordPress plugin and a JMeter test plan; more details on WPMark are forthcoming. For this particular investigation, a Plackett and Burman design experiment was used to find the hardware bottlenecks for WordPress.

A Plackett and Burman design experiment involves picking reasonable high and low values for the parameters of interest and then systematically selecting between the high and low values in a series of experiments. For this setup, four easily varied hardware parameters were identified: CPU frequency, CPU core count, memory size, and non-volatile storage. The table below lists the high and low values selected for these parameters:

        CPU frequency   CPU cores   Memory Size   Non-volatile Storage
High    3.3GHz          4           8GiB          Kingston HyperX 3K SATA 6Gb/s SSD
Low     1.6GHz          1           4GiB          Seagate Momentus 5400.3 5400RPM SATA 1.5Gb/s HDD
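
With two levels for each of four parameters, testing every combination would require 2^4 = 16 hardware configurations. The sketch below simply enumerates that full design space from the table above; the dictionary layout and the function name are illustrative and not part of WPMark.

```python
from itertools import product

# Low/high levels copied from the table above, as (low, high) pairs.
# The key names are illustrative labels, not WPMark identifiers.
LEVELS = {
    "cpu_frequency": ("1.6GHz", "3.3GHz"),
    "cpu_cores": (1, 4),
    "memory_size": ("4GiB", "8GiB"),
    "storage": ("HDD", "SSD"),
}


def full_factorial(levels):
    """Return every low/high combination as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


configs = full_factorial(LEVELS)
print(len(configs))  # one entry per possible hardware configuration
```

Physically rebuilding the server for every one of these configurations is tedious, which is the motivation for a reduced design.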

To carry out the experiment, two systems using Intel “Sandy Bridge” processors were connected through a gigabit Ethernet switch to reduce network-related bottlenecks. The slightly weaker system was used as the server to ensure that it could be adequately loaded without encountering capacity issues on the load generator.

Enthusiast grade hardware, rather than enterprise grade hardware, was used because it offers more flexibility for enabling and disabling CPU features, and because consumer grade chipsets offer SATA 6Gb/s connectivity. The latter is important for the SSD used as the high value of the non-volatile storage parameter. Additionally, the unlocked multiplier in the 2500K aided in underclocking, allowing multiple hardware configurations with minimal actual hardware replacement. See the lists below for the test server and load generator hardware configurations.

Test Server

  • Intel Core i5 2500k (Turbo Boost disabled)
  • 8GiB DDR3-1600C8 (dual channel, nominal)
  • 120GB SATA 6Gb/s HyperX 3K SSD (nominal)
  • Z68 chipset with Realtek GbE NIC
  • Linux 3.7.0-gentoo
  • Apache 2.4.3
  • PHP 5.4.7 (via mod_php)
  • MySQL 5.5.28

Load Generator

  • Intel Core i7 2600k
  • 16GiB DDR3-1600C8 (dual channel)
  • 120 GB Intel 320 SSD
  • Z68 chipset with Realtek GbE NIC
  • Windows 7 Pro (64bit)

The four easily varied parameters result in a design space of 16 possible combinations. A Plackett and Burman design, as described by Yi et al. [2], was chosen to reduce this set to 8 combinations. In the table below, a ‘1’ indicates that the high value was selected for the parameter, while a ‘-1’ indicates that the low value was selected. After completing the tests, the rank was determined for each of the eight test types using the following methodology:

The rank for a column is found by multiplying the response time of each row by the value in the corresponding cell of the column of interest, then summing the products over all eight runs. Only the magnitude of a rank indicates significance, so the results tables report absolute values. For the table below, the CPU frequency rank was calculated using the equation:

Rank = (1 * 199) + (-1 * 210) + … + (1 * 396) + (-1 * 387) = 376

Run    CPU frequency   CPU cores   Non-volatile Storage   Memory Size   Response Time (ms)
1       1               1           1                     -1            199
2      -1               1           1                      1            210
3      -1              -1           1                      1            198
4       1              -1          -1                      1            380
5      -1               1           1                     -1            198
6       1              -1           1                     -1            394
7       1               1          -1                      1            396
8      -1              -1          -1                     -1            387
Rank    376            -356         36                     6
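
The rank calculation can be reproduced in a few lines. This is a minimal sketch that uses the design matrix and response times from the table above; the variable and function names are illustrative, and this is not the tooling used for the original experiment.

```python
# Design matrix and response times copied from the table above.
# Column order: CPU frequency, CPU cores, non-volatile storage, memory size.
PARAMETERS = ("CPU frequency", "CPU cores", "Non-volatile Storage", "Memory Size")
DESIGN = [
    (1, 1, 1, -1),
    (-1, 1, 1, 1),
    (-1, -1, 1, 1),
    (1, -1, -1, 1),
    (-1, 1, 1, -1),
    (1, -1, 1, -1),
    (1, 1, -1, 1),
    (-1, -1, -1, -1),
]
RESPONSE_MS = [199, 210, 198, 380, 198, 394, 396, 387]


def column_ranks(design, responses):
    """Sum (cell value * response time) down each column of the design."""
    return [
        sum(row[col] * rt for row, rt in zip(design, responses))
        for col in range(len(design[0]))
    ]


ranks = dict(zip(PARAMETERS, column_ranks(DESIGN, RESPONSE_MS)))
for name, rank in ranks.items():
    print(f"{name}: {rank}")
```

The values printed match the Rank row of the table.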

Results

Light Load

                       Main Page (uncached)   Main Page (cached)   Search   Comment   Average
CPU frequency          376                    1                    475      986       459.5
CPU cores              356                    1                    455      516       332
Non-volatile Storage   36                     59                   57       396       137
Memory Size            6                      3                    9        6         6

The table above contains the results from the single user test Plackett and Burman design experiment. From the results it is apparent that overall CPU frequency is the main bottleneck for WordPress in a single user environment. The number of CPU cores is almost as significant, and the non-volatile storage ends up being marginally important. For all four subtests, the memory size of the system had little to no impact on performance. This is not entirely surprising since a measure of memory usage using the UNIX top command while JMeter was running the WPMark Test Plan revealed that the system was using around 1.5GiB of memory during the single user test. Lower memory densities are becoming uncommon, making it difficult to stress the memory size with the available hardware setup.
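
The Average column and the resulting ordering of bottlenecks follow directly from the four subtest ranks. A small sketch using the numbers from the light load table (the names are illustrative):

```python
# Ranks copied from the light load table, in the order:
# main page (uncached), main page (cached), search, comment.
LIGHT_LOAD = {
    "CPU frequency": [376, 1, 475, 986],
    "CPU cores": [356, 1, 455, 516],
    "Non-volatile Storage": [36, 59, 57, 396],
    "Memory Size": [6, 3, 9, 6],
}


def average_ranks(table):
    """Average each parameter's rank across the subtests."""
    return {name: sum(values) / len(values) for name, values in table.items()}


averages = average_ranks(LIGHT_LOAD)
# Most significant parameter first.
ordering = sorted(averages, key=averages.get, reverse=True)
for name in ordering:
    print(f"{name}: {averages[name]}")
```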

In the individual tests, the main page (uncached) and the search tests both show CPU frequency and core count as the two most important factors, with non-volatile storage a distant third. Comparing the two, search is a slightly more difficult task for the system (1.3X longer run times). The rank values for CPU frequency and core count match this. The rank value for non-volatile storage, however, is nearly 1.6X higher in the search test, revealing that searching involves relatively more storage IO than displaying the main page.

The cached version of the main page only sees non-volatile storage as an important factor. This was expected, since the main consumer of CPU resources is the scripting engine (PHP), and the cached version of the main page does not cause PHP to be invoked. From the table, comment submission really stresses the CPU, with both CPU frequency and the number of cores having very high ranks. That the CPU frequency rank for comment submission is nearly twice that of CPU cores makes sense given what the test is doing. In the other tests, a page is requested and loaded. On a comment submission, however, a comment is sent to the server, the server has to store it (the database server writes to disk), and the server then sends back the page that was commented on. While the comment is being submitted, there is only one HTTP request open with the server, so the other cores are not being worked (in the single user case). As soon as the page is sent and the other resources are requested (in parallel), the extra cores become beneficial, just as on a regular request load.
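
The search versus uncached main page comparison is simple arithmetic on the light load ranks. As a quick check (names are illustrative):

```python
# Light load ranks for the two tests being compared.
UNCACHED = {"CPU frequency": 376, "CPU cores": 356, "Non-volatile Storage": 36}
SEARCH = {"CPU frequency": 475, "CPU cores": 455, "Non-volatile Storage": 57}

# Ratio of the search rank to the uncached main page rank, per parameter.
ratios = {name: SEARCH[name] / UNCACHED[name] for name in UNCACHED}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}X")
```

The two CPU ratios land close to 1.3X, while the storage ratio is close to 1.6X.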

Heavy Load

Ranks for the 50 user tests:

                       Main Page (uncached)   Main Page (cached)   Search   Comment   Average
CPU frequency          5034                   29                   6532     19677     7818
CPU cores              10746                  453                  13580    21161     11485
Non-volatile Storage   14816                  195                  19238    21031     13820
Memory Size            12552                  561                  15138    15133     10846

The table above contains the results from the 50 user test. Under heavy load, the bottlenecks shift a little. CPU frequency is no longer the overall leader in rank. Instead, non-volatile storage becomes the most significant bottleneck, with the number of CPU cores and memory size following. In this case, the server is being loaded very hard: the CPU load average according to the UNIX top command was near 50, 12.5X the maximum number of cores. With that much load, a simple queue model of the system shows that the request queue starts to fill, causing memory size to become significant and response times to jump significantly. If the system had more cores, the memory size sensitivity would be alleviated only slightly, as the additional threads would still need memory to do their job.

As for why non-volatile storage becomes so significant, this has to do with what happens when 50 simultaneous users request different resources. Even though there is a basic set of files to load, the access pattern of 50 users that are not exactly synchronized causes many different files to be requested at the same time. This is something traditional hard drives have problems with, especially if the data is not contiguous. Random access seek times in a traditional hard drive are on the order of 5 milliseconds; for SSDs this figure is around 100 microseconds. Utilizing an SSD with a high random access transfer rate and a low random access seek time will be beneficial in high load situations such as the 50 user test.

The individual tests for the 50 user test show quite different results from the single user test. In general, they agree with the overall behavior seen with the 50 user test. The uncached main page and search tests are both bound primarily by non-volatile storage, with memory size and the number of CPU cores following. The comment test is bound almost equally by the number of CPU cores and non-volatile storage, and it is the only test that really sees significance in increased CPU frequency. The reasoning here is the same as in the single user case: the comment test is more difficult and has a significant single thread component. The sheer number of requests, and thus threads working, masks this slightly.
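
To make the queueing argument concrete, here is a toy snapshot of the system at one instant. It assumes that each of the 50 users keeps exactly one request in flight and that each core services one request at a time; a real Apache, PHP, and MySQL stack is considerably more complicated, so treat this as an illustration only.

```python
def queue_snapshot(in_flight, cores):
    """Toy model: split in-flight requests into serviced and waiting."""
    in_service = min(in_flight, cores)
    return {
        "in_service": in_service,
        "waiting": in_flight - in_service,
        "load_per_core": in_flight / cores,
    }


# 50 simultaneous users on the four-core server configuration.
snapshot = queue_snapshot(in_flight=50, cores=4)
print(snapshot)
```

Under these assumptions most requests are waiting rather than running, and every waiting request still occupies memory.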
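
The seek time figures translate into a rough ceiling on random accesses per second. The sketch below assumes accesses are fully serialized and ignores transfer time, request queuing, and caching, so the numbers are only an order-of-magnitude estimate.

```python
# Approximate random access times quoted in the text.
HDD_SEEK_S = 5e-3      # about 5 milliseconds
SSD_ACCESS_S = 100e-6  # about 100 microseconds


def max_random_accesses_per_second(access_time_s):
    """Upper bound when every access pays the full access time."""
    return 1.0 / access_time_s


hdd_rate = max_random_accesses_per_second(HDD_SEEK_S)
ssd_rate = max_random_accesses_per_second(SSD_ACCESS_S)
print(f"HDD: ~{hdd_rate:.0f}/s, SSD: ~{ssd_rate:.0f}/s, ratio: ~{ssd_rate / hdd_rate:.0f}X")
```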

The cached test presents an interesting situation. Rather than being bound by storage, the memory size has the highest rank, with the number of CPU cores following closely behind. This is likely due to not having enough CPU cores to handle all of the threads spawned by the requests. Thus, they back up into the queue, requiring more memory as they sit there waiting to be processed.
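
The per-test ordering under heavy load can be read directly from the 50 user table. A minimal sketch that sorts the parameters for each test by rank (names are illustrative):

```python
# Ranks copied from the 50 user table, grouped by test.
HEAVY_LOAD = {
    "Main Page (uncached)": {
        "CPU frequency": 5034, "CPU cores": 10746,
        "Non-volatile Storage": 14816, "Memory Size": 12552,
    },
    "Main Page (cached)": {
        "CPU frequency": 29, "CPU cores": 453,
        "Non-volatile Storage": 195, "Memory Size": 561,
    },
    "Search": {
        "CPU frequency": 6532, "CPU cores": 13580,
        "Non-volatile Storage": 19238, "Memory Size": 15138,
    },
    "Comment": {
        "CPU frequency": 19677, "CPU cores": 21161,
        "Non-volatile Storage": 21031, "Memory Size": 15133,
    },
}


def order_by_rank(ranks):
    """Return parameter names sorted from highest to lowest rank."""
    return sorted(ranks, key=ranks.get, reverse=True)


orderings = {test: order_by_rank(ranks) for test, ranks in HEAVY_LOAD.items()}
for test, order in orderings.items():
    print(f"{test}: {' > '.join(order)}")
```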

Conclusion

Conventional wisdom has stated that WordPress is database bound, which implies a mix of primarily IO and memory bottlenecks. However, this investigation shows that this is not the case. Using the WPMark benchmark, we found that while IO performance is significant in some cases, especially for cached content, processor power is the most significant bottleneck for WordPress.

-John Havlik

[end of transmission, stay tuned]

Notes:

  1. Mullenweg, Matthew. “State of the Word 2012.” WordCamp San Francisco 2012. San Francisco.
  2. Yi, Joshua J.; Lilja, David J.; Hawkins, Douglas M. “Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor.” IEEE Transactions on Computers, Vol. 54, No. 11, pp. 1360-1373.
