And now for something completely geeky

Now that my exams are over, I’ve gotten back to working towards launching my little startup. During the second half of my semester I couldn’t find time or energy for it, but I did occasionally give it some thinking time, some of which has borne fruit.

Beware, Troppo readers; here be nerds.

My technical architecture has a few moving parts. The heart of the operation is to gather data via HTTP requests, perform some computations, send a reply, and keep a record of the transaction. Originally I planned to do this in the conventional fashion: a web server, PHP and a database. My first proof-of-concept system used this architecture. It worked well.

But as I thought about my goals more, it became clear that two key technical ‘non-functional’ objectives are to minimise response time and to maximise requests handled per second.

The first objective is about maximising user satisfaction. You may be familiar with how some websites load slowly because they pull in images, scripts, advertisements and so on from a wide range of slow-responding servers. I am determined that my service will not be one of those slowpokes.

The second objective is about maximising profitability. The more users I can support per server, the fewer servers I will need. The fewer servers, the lower my monthly expenses. In general the first and second technical objectives will go hand-in-hand, but in case of conflict it is important to have a clear ordering.

Now the original architecture looked like this:

[Image: the original architecture]

That’s a pretty standard approach. It works well, is proven and is well-supported by existing tools. Some simple tests established that on a modest virtual machine I could expect to handle about 700 requests per second at an average of 300ms each.

The major drawback with this architecture is that performance isn’t set by the web server; it’s set by the database. Most web applications are read-oriented, with users who expect their web interface to be up-to-date within seconds. But my system is actually write-oriented. In my situation there’s no firm requirement that the database be up-to-date in seconds. For all I care it could take hours for the data to pass from web server to database, just so long as it does.

So back to the drawing board. My second architecture modified the original by placing a queueing service between the web server and the database.

[Image: the second architecture considered]

When the web server has data, it pushes it into the queue. The database can fetch the data at its own pace. Neither component is slowing down the other one.

The use of queues to allow the front-end servers to respond faster has become very common. Developers can choose from many commercial or off-the-shelf software systems for fast queueing. In addition, Amazon even offers “queueing as a service” with its Simple Queue Service (SQS). With SQS you pay an extremely small fee per message; Amazon guarantees that the queue entries will persist for several days and allows unlimited queue entries.

But still I wasn’t satisfied with the front end. I wanted it to go faster, if at all possible. It so happens that I noticed another service offered to Amazon customers: Elastic Block Store (EBS). Essentially, Amazon allows you to have a virtual hard drive. You can attach it to virtual machines, read from and write to it, detach it, clone it, reattach it, or even shuffle it amongst different virtual machines.

That last part interested me most. With an architecture based on queueing, the bottleneck is still at the database server. It must continuously fetch items one at a time. This is quite fast, but nowhere near as fast as using database facilities for loading in bulk. If I could ship thousands or even millions of datapoints to the database server at a time, my total performance would go up: fewer servers required, more profit.

This leads me to my third architecture:

[Image: the current architecture]

In this design I am writing data directly to the virtual disk in the form of a simple log. Occasionally the disk is detached from the webserver and reattached to the database server. The database server performs a bulk load, then returns the virtual disk to the pool available for use by web servers.
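
To make that concrete, the “simple log” really is nothing more elaborate than appending one record per request to a flat file on the mounted volume. Here is a rough sketch in Lua of the append step; it isn’t my actual code, and the mount point and record layout are invented for illustration.

    -- Append one record to the transaction log on the attached virtual disk.
    -- The mount point and record layout are placeholders, not my real ones.
    local LOG_PATH = "/mnt/datalog/requests.log"

    local function append_record(fields)
        -- One line per transaction: a timestamp plus tab-separated fields.
        local line = os.time() .. "\t" .. table.concat(fields, "\t") .. "\n"
        local f, err = io.open(LOG_PATH, "a")
        if not f then
            return nil, err
        end
        f:write(line)
        f:close()
        return true
    end

    -- Example: record one transaction.
    append_record({ "client-42", "raw-request-data", "reply-data" })

Appending is about the cheapest write a file system can perform, which is also why the choice of file system below matters.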

This allows me two further optimisations.

The first is visible in the diagram. Instead of having my HTTP server talk to a PHP backend (or indeed any other backend), I can embed a Lua script directly in Lighttpd using the mod_magnet module. Lua is a pleasant, lightweight language, and very fast. In some microbenchmarks I have seen it blow the doors off PHP for web serving performance.
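
To give a flavour of this, here is a minimal mod_magnet handler. It is a sketch rather than my actual script: the “computation” is a placeholder and the reply text is invented. In lighttpd.conf such a script is attached with a directive along the lines of magnet.attract-raw-url-to = ( "/etc/lighttpd/handler.lua" ).

    -- handler.lua: a minimal request handler run inside Lighttpd by mod_magnet.
    -- The real script also appends each transaction to the log on the virtual
    -- disk (as sketched above); only the request/reply path is shown here.

    -- mod_magnet exposes the request through the global 'lighty' table.
    local query = lighty.env["uri.query"] or ""

    -- Placeholder for the actual computation on the incoming data.
    local reply = "received " .. #query .. " bytes\n"

    -- Build the reply directly, with no PHP or other backend round-trip.
    lighty.header["Content-Type"] = "text/plain"
    lighty.content = { reply }

    -- Returning a status code tells Lighttpd the request is fully handled.
    return 200
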

I didn’t use embedded Lua in the original architecture due to a flaw in its database connection system, or in the second architecture because there are no Lua libraries for talking to Amazon’s SQS. In theory I could have rectified both of these drawbacks, but I am a great believer in minimising my programming overhead whenever possible.

The second optimisation is to use a different file system. Most file systems are optimised to speed up seek or read times. Seek time measures how long it takes the head of a hard drive to find the correct track to read data from; read time measures how long it takes to pull a file off disk and into memory. File systems use a number of tricks and techniques to minimise these at the expense of write times. The basic thinking is that writing to disk happens less often than reading.

But my workload is the opposite: reading happens much less frequently. Write performance is my dominant consideration.

The latest Linux kernel includes a new file system from Japanese telco NTT called NILFS2. This is a “log-structured” file system. The basic upshot is that it is built around maximising write speed, which is exactly what I want. In a microbenchmark, NILFS2 improves overall performance by roughly 20% over Ext3, the default Linux file system.

Between the embedded Lua script and NILFS2, my average response time is down to 122ms with a standard deviation of 655ms. 95% of requests are served in less than 250ms. Overall the server can spit out 2064 requests/second on a simple microbenchmark. Remember that my original design could only handle 700 requests per second. That’s an approximate tripling of performance. If nothing else, it shows that good design rarely turns up on the first attempt. It’s very possible that I will think of something much better later on (and even more so that you, Humble Reader, will do so).

And that ends your incredibly nerdy dose for the day.
