Caching is Magic

This blog is hosted on a small Linode machine that does many things; I was interested in speeding things up without upgrading the hardware.

For a blog, caching makes a lot of sense, since posts usually don't change much (or at all) over time; only aggregate pages change (with every new post).

In the same way, if you manage a website that serves content that's not updated frequently (case in point: product pages on an eCommerce site), implementing caching should be the first thing you try, before worrying about bigger machines, load balancing, database bottlenecks, etc.

I first tested the existing configuration with ApacheBench, running 5000 requests at a concurrency of 200; the test failed after the 1000th request:

# ab -n 5000 -c 200 http://blog.medusis.com/
This is ApacheBench, Version 2.3 
Benchmarking blog.medusis.com (be patient)
Completed 500 requests
Completed 1000 requests
apr_socket_recv: Connection timed out (110)
Total of 1022 requests completed

Then I implemented disk-based caching thusly (Apache2 on Debian):

  • turned on mod_disk_cache (mod_cache is automatically turned on as a dependency of mod_disk_cache): a2enmod disk_cache
  • set up caching in the corresponding VirtualHost (CacheEnable disk /) -- see the sketch after this list
  • restarted Apache: /etc/init.d/apache2 restart
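
For reference, here's a minimal sketch of what the VirtualHost could look like. Only CacheEnable was strictly needed in my case; the document root, cache path, and tuning values below are illustrative Apache 2.2 examples, not my exact config:

<VirtualHost *:80>
    ServerName blog.medusis.com
    DocumentRoot /var/www/blog

    # serve everything under / from the disk cache when possible
    CacheEnable disk /

    # where mod_disk_cache stores cached responses (Debian default path)
    CacheRoot /var/cache/apache2/mod_disk_cache
    CacheDirLevels 2
    CacheDirLength 1

    # how long to keep responses that carry no expiry information (seconds)
    CacheDefaultExpire 3600
</VirtualHost>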

And that's it! Cache is on! The results are good -- here's the output from a new test with ApacheBench (edited for length):

# ab -n 5000 -c 200 http://blog.medusis.com/
This is ApacheBench, Version 2.3 

Server Software:        Apache/2.2.16
Server Hostname:        blog.medusis.com
Server Port:            80

Document Path:          /
Document Length:        9689 bytes

Concurrency Level:      200
Time taken for tests:   15.633 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      49997231 bytes
HTML transferred:       48755613 bytes
Requests per second:    319.83 [#/sec] (mean)
Time per request:       625.327 [ms] (mean)
Time per request:       3.127 [ms] (mean, across all concurrent requests)
Transfer rate:          3123.19 [Kbytes/sec] received

Percentage of the requests served within a certain time (ms)
 50%    291
 66%    299
 75%    301
 80%    303
 90%    341
 95%   1144
 98%   7375
 99%   9234
100%  14428 (longest request)

These numbers are not extraordinary but they are honorable:

  • 90% of requests are served under 400 ms (less than half a second)
  • no request fails
  • the mean number of requests per second is over 300 (which is a respectable amount of traffic -- in 2008, Twitter used to get a mean of 300 requests per second).

There is more to caching than this; for instance, with disk-based caching you need a way to periodically trim the cache with htcacheclean, since mod_disk_cache doesn't limit the cache size on its own. For more on this, Google and man pages are your friends.
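
As an illustration, here's a one-shot invocation suitable for a cron job; the cache path and the 100 MB limit are examples, adjust them to your setup:

# trim the cache down to 100 MB, removing empty directories as it goes
htcacheclean -n -t -p /var/cache/apache2/mod_disk_cache -l 100M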

But the point is: caching is the closest thing to a free lunch!

Mon, 15 Apr 2013 • permalink

Why Cut a Hard Drive

I just started a video series where I cut things with an angle grinder. Here's the first episode, where the victim is a 5-1/2 inch hard drive:

Let's Cut It

Some people want to know why one would cut a hard drive in half -- or anything else for that matter.

Erasing a hard drive

Cutting a hard drive is the surest way to make the data it contains unreadable.

When you change hard drives, you may be tempted to throw the old one in the bin; but if you do, anyone can pick your old drive out of the trash and extract all its contents (passwords, emails, etc.).

Simply "erasing" the contents, or even "formatting" the drive isn't enough either; those operations usually only rewrite the file system but not the data, which can be recovered with widely available "unerase" software.

There are utilities that permanently destroy the data on your hard drive; they work, but they take a long time and are clumsy -- and besides, who would use software when an angle grinder does the same thing?!?

Be in charge

But there are many other things to cut besides hard drives! Next on the list are phones, cameras, Nespresso coffee machines, etc.

Why would I want to cut all those things? Besides the fact that it's so fun, there's a philosophical point to it.

The iPhone comes with special screws that require a dedicated screwdriver.

Most Apple products don't even have visible screws. They come in beautiful boxes as if they were jewels, and the message is that they're so perfect they should only be admired and caressed and nothing more. (Apple is not the only culprit of course, just the most prominent one).

The point of appliances should be to help us do things better and faster, not to be part of a cult. They're here to serve us, not the other way around.

It's certainly more useful to take things apart than to destroy them; but there's a childish joy in cutting them open; it's a post-modern act of freedom.

Let's void this warranty.

Tue, 09 Apr 2013 • permalink

CSS2XML

On the XSL mailing list, someone asked how to serialize CSS to XML. There appears to be no ready-made solution, so I gave it a shot.

There are a couple of projects that try to parse CSS from scratch -- for instance, a node.js library that parses CSS and outputs JSON, or an XSL project that parses CSS in XSLT and reinjects the results into a custom-built XSL stylesheet for further processing...

The problem with these initiatives is that they need to be actively maintained to stay current; most aren't.

CSS parsing is difficult; rather than trying to come up with your own parser, it's much, much better to use an existing specialized library. cssutils is such a library, available for Python 2.x (>=2.5) and 3.x; it's pretty complete (it even parses comments!) and actively maintained.

10 lines of Python

Using this library, it's pretty easy to produce an XML version of a given CSS; here's the Python code:

import cssutils, gnosis.xml.pickle

# map each selector to a dict of its property/value pairs
css = {}
sheet = cssutils.parseFile("yourcssfile.css")

for rule in sheet:
  if rule.type == rule.STYLE_RULE:
    css[rule.selectorText] = {}
    for prop in rule.style:   # 'prop' avoids shadowing the builtin 'property'
      css[rule.selectorText][prop.name] = prop.value

# serialize the whole dict to XML (Python 2)
print gnosis.xml.pickle.dumps(css)

(We build a dictionary with an entry for each rule and property, then serialize this object to XML. Or, to serialize to JSON instead, import json and change the last line to 'print json.dumps(css)'.)

Here, the CSS is read from a local file; but cssutils can also parse a string, or even fetch a "live" stylesheet over HTTP (cssutils.parseUrl deals with urllib2 directly).
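
For instance, a quick sketch of both (the URL is a made-up example):

import cssutils

# parse CSS from an in-memory string...
sheet = cssutils.parseString("body { margin: 0 }")

# ...or fetch and parse a live stylesheet over HTTP
sheet = cssutils.parseUrl("http://example.com/style.css")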

Many things could be improved in this simple example, but it should get you started.

And if anyone is interested in using [a more elaborate version of] this as a service, please get in touch!

Mon, 01 Apr 2013 • permalink