Caching is Magic

This blog is hosted on a small Linode machine that does many things; I was interested in speeding things up without improving the hardware.

For a blog, caching makes a lot of sense, since posts usually don't change much (or at all) with time; only aggregate pages change (with every new post).

In the same way, if you manage a website that serves content that's not updated frequently (case in point: product pages on an e-commerce site), implementing caching should be the first thing you try, before worrying about machines, load balancing, database bottlenecks, etc.

I first tested the existing configuration with ApacheBench, using 5000 requests at a concurrency of 200; the run failed shortly after the 1000th request:

# ab -n 5000 -c 200 http://blogs.medusis.com/
This is ApacheBench, Version 2.3 
Benchmarking blog.medusis.com (be patient)
Completed 500 requests
Completed 1000 requests
apr_socket_recv: Connection timed out (110)
Total of 1022 requests completed

Then I implemented disk-based caching thusly (Apache2 on Debian):

turned on mod_disk_cache (mod_cache is automatically turned on as a dependency of mod_disk_cache): a2enmod disk_cache
set up caching in the corresponding VirtualHost: CacheEnable disk /
restarted Apache: /etc/init.d/apache2 restart
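
The three steps above can be collected into a small script. This is only a sketch for Apache 2.2 on Debian: the function name enable_disk_cache is made up for illustration, the configtest step is an extra safety check that was not part of what I did, and it assumes the CacheEnable line has already been added to the VirtualHost by hand. On Debian, a2enmod normally takes the module name without the mod_ prefix.

```shell
#!/bin/sh
# Sketch of the steps above for Apache 2.2 on Debian; must run as root.
# Assumes "CacheEnable disk /" is already inside the blog's <VirtualHost>.
# enable_disk_cache is a hypothetical helper name, not an Apache tool.
enable_disk_cache() {
    # Enable mod_disk_cache; mod_cache is enabled as its dependency.
    a2enmod disk_cache || return 1
    # Extra step (an assumption, not in the original procedure):
    # check the configuration syntax before restarting.
    apache2ctl configtest || return 1
    # Restart Apache so the module and the directive take effect.
    /etc/init.d/apache2 restart
}
```

Calling enable_disk_cache from a root shell runs the steps in order and stops at the first one that fails, so a typo in the VirtualHost does not take the site down.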

And that's it! Cache is on! The results are good -- here's the output from a new test with ApacheBench (edited for length):

# ab -n 5000 -c 200 http://blogs.medusis.com/
This is ApacheBench, Version 2.3 

Server Software:        Apache/2.2.16
Server Hostname:        blog.medusis.com
Server Port:            80

Document Path:          /
Document Length:        9689 bytes

Concurrency Level:      200
Time taken for tests:   15.633 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      49997231 bytes
HTML transferred:       48755613 bytes
Requests per second:    319.83 [#/sec] (mean)
Time per request:       625.327 [ms] (mean)
Time per request:       3.127 [ms] (mean, across all concurrent requests)
Transfer rate:          3123.19 [Kbytes/sec] received

Percentage of the requests served within a certain time (ms)
 50%    291
 66%    299
 75%    301
 80%    303
 90%    341
 95%   1144
 98%   7375
 99%   9234
100%  14428 (longest request)

These numbers are not extraordinary, but they are decent:

90% of requests are served in under 400 ms (less than half a second)
no request fails
the mean number of requests per second is over 300 (which is a respectable amount of traffic -- in 2008, Twitter used to get a mean of 300 requests per second).
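
For anyone wondering how ApacheBench arrives at its summary figures, they can be recomputed from three numbers in the report: the request count, the concurrency level and the total time. The sketch below does that with awk; the variable names are mine, and the results can differ from the report in the last digit because the total time is only printed to three decimals.

```shell
#!/bin/sh
# Use the C locale so awk reads and prints "." as the decimal separator.
LC_ALL=C
export LC_ALL

# Figures copied from the ApacheBench report above.
requests=5000
concurrency=200
seconds=15.633

# Requests per second = completed requests / total time.
rps=$(awk -v n="$requests" -v t="$seconds" 'BEGIN { printf "%.1f", n / t }')

# Mean time per request across all concurrent requests, in ms
# = total time / completed requests.
ms_across=$(awk -v n="$requests" -v t="$seconds" 'BEGIN { printf "%.3f", t * 1000 / n }')

# Mean time per request as seen by a single client, in ms
# = concurrency * the figure above.
ms_mean=$(awk -v n="$requests" -v c="$concurrency" -v t="$seconds" 'BEGIN { printf "%.1f", c * t * 1000 / n }')

echo "requests per second: $rps"
echo "ms per request (mean): $ms_mean"
echo "ms per request (across all concurrent requests): $ms_across"
```

The two "time per request" lines measure different things: the larger one is what each of the 200 simultaneous clients waits on average, the smaller one is how often the server completes a request.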

There is much more to caching; for instance, a disk-based cache grows over time, so you need to prune it periodically with htcacheclean. For more on this, Google and the man pages are your friends.
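
As a starting point, a one-off cleanup run might look like the sketch below. The cache path is an assumption (it is the usual CacheRoot for Apache 2.2 on Debian, but check your own configuration), and the 100M limit is an arbitrary value chosen for illustration, not a recommendation.

```shell
#!/bin/sh
# Assumed cache directory: the usual CacheRoot for Apache 2.2 on Debian.
# Check the CacheRoot directive in your own configuration.
CACHE_ROOT=/var/cache/apache2/mod_disk_cache
# Hypothetical size limit, picked only for illustration.
CACHE_LIMIT=100M

if command -v htcacheclean >/dev/null 2>&1; then
    # One-off run: -n is "nice" (pauses regularly to spare disk I/O),
    # -t removes empty directories, -p is the cache path,
    # -l is the total size the cache is trimmed down to.
    htcacheclean -n -t -p "$CACHE_ROOT" -l "$CACHE_LIMIT"
else
    echo "htcacheclean not found on this machine" >&2
fi
```

Adding -d30 makes htcacheclean stay resident as a daemon and repeat the cleanup every 30 minutes; running the one-off form from cron is the other common approach.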

But the point is: caching is the closest thing to a free lunch!