Comparing Benchmark Tools
As I noted last week, I have moved my framework benchmarking project to GitHub. As part of the move, I updated the project to allow benchmarking using any of three tools: Acme http_load, Apache ab, or Joedog siege. (For reference, the old project will remain at GoogleCode.)
I thought it might be interesting to see what each of them reports for the baseline “index.html” and “index.php” cases on the new Amazon EC2 setup (using a 64-bit OS on an m1.large instance). The results follow (all are at 10 concurrent users, averaged over 5 one-minute runs):
ab                       | rel      | avg (req/sec) |
------------------------ | -------- | ------------- |
baseline-html            | 1.2660   | 3581.54       |
baseline-php             | 1.0000   | 2829.11       |

http_load                | rel      | avg (req/sec) |
------------------------ | -------- | ------------- |
baseline-html            | 1.2718   | 4036.24       |
baseline-php             | 1.0000   | 3173.56       |

siege                    | rel      | avg (req/sec) |
------------------------ | -------- | ------------- |
baseline-html            | 1.2139   | 5060.25       |
baseline-php             | 1.0000   | 4168.76       |
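The rel column appears to be each case's average divided by the baseline-php average for the same tool. The short Python sketch below assumes that definition and reproduces the rel column from the avg values in the tables; the names AVERAGES and relative_scores are illustrative only and are not part of the benchmarking project.

```python
# Average requests/second per tool, copied from the tables.
AVERAGES = {
    "ab": {"baseline-html": 3581.54, "baseline-php": 2829.11},
    "http_load": {"baseline-html": 4036.24, "baseline-php": 3173.56},
    "siege": {"baseline-html": 5060.25, "baseline-php": 4168.76},
}


def relative_scores(averages, reference="baseline-php"):
    """Divide each case's average req/sec by the reference case's average.

    Assumption: this is how the "rel" column was derived; the numbers match,
    but the project's own reporting code may compute it differently.
    """
    base = averages[reference]
    return {case: round(avg / base, 4) for case, avg in averages.items()}


for tool, averages in AVERAGES.items():
    print(tool, relative_scores(averages))
```

Because each rel value is computed within a single tool's results, it stays comparable across tools even when the absolute req/sec numbers disagree.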
They all show very different “absolute” numbers of requests/second for the static baseline-html case: ab thinks the server delivers about 3600 req/sec, http_load reports about 4000, and siege says about 5000.
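Note that the ab and http_load relative scores are in line with each other: both report that static HTML serves about 26-27% more requests per second than PHP (equivalently, PHP handles about 21% fewer). Siege thinks PHP is more responsive than that, with static HTML only about 21% faster (PHP handling about 18% fewer).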
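A rel score can be read two ways, and the two percentages differ: a ratio of 1.27 means HTML serves 27% more requests than PHP, but PHP serves only about 21% fewer than HTML, because 1 − 1/1.27 ≈ 0.21. The Python sketch below computes both readings from the rel values in the tables; the helper names are hypothetical and chosen only for this illustration.

```python
# Relative scores (static HTML vs. PHP) from the "rel" column of each table.
REL_HTML = {"ab": 1.2660, "http_load": 1.2718, "siege": 1.2139}


def html_speedup_pct(rel):
    """Percentage more requests/second static HTML serves than PHP."""
    return (rel - 1.0) * 100.0


def php_throughput_drop_pct(rel):
    """Percentage fewer requests/second PHP serves than static HTML."""
    return (1.0 - 1.0 / rel) * 100.0


for tool, rel in REL_HTML.items():
    print(f"{tool}: HTML +{html_speedup_pct(rel):.1f}%, "
          f"PHP -{php_throughput_drop_pct(rel):.1f}%")
```

Either reading is valid; the point is to state which one is meant when quoting a single “slowdown” figure.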
Which of these is the most accurate? I don’t know. I ran each benchmarking tool on the same server that was being benchmarked, so the differences may result from how much processing power the tools themselves consumed.
One interesting point is that ab no longer appears to over-report the baseline cases, which I had noted in an earlier benchmark posting. There are two major changes between then and now: (1) the updated project uses Ubuntu 10.10 instead of 8.10, which suggests either that the previously packaged ab binary was flawed or that the newer OS corrects some other issue; (2) the updated project uses a 64-bit m1.large instance instead of a 32-bit m1.small instance. Either of those differences might be sufficient to account for the earlier disparity in ab reporting.