June 2009

A good summary.

Someone famous is dead.

By the author of the Pictures for Sad Children webcomic.

Cross Posted from Club Troppo
Humour

And now for something completely geeky

Now that my exams are over, I’ve gotten back to working towards launching my little startup. During the second half of my semester I couldn’t find time or energy for it, but I did occasionally give it some thinking time, some of which has borne fruit.

Beware, Troppo readers; here be nerds.

My technical architecture has a few moving parts. The heart of the operation is to gather data via HTTP requests, perform some computations, send a reply, and keep a record of the transaction. Originally I planned to do this in the conventional fashion, with a webserver, PHP and a database. My first proof-of-concept system used this architecture. It worked well.

But as I thought about my goals more, it became clear that two key technical ‘non-functional’ objectives are to minimise response time and to maximise requests handled per second.

The first objective is about maximising user satisfaction. You may be familiar with how some websites load slowly because they pull in images, scripts, advertisements etc from a wide range of slow-responding servers. I am determined that my service will not be one of those slowpokes.

The second objective is about maximising profitability. The more users I can support per server, the fewer servers I will need. The fewer servers, the lower my monthly expenses. In general the first and second technical objectives will go hand-in-hand, but in case of conflict it is important to have a clear ordering.

Now the original architecture looked like this:

The original architecture

That’s a pretty standard approach. It works well, is proven and well-supported by existing tools. Some simple tests established that on a modest virtual machine I could expect to handle about 700 requests per second at an average of 300ms each.

The major drawback with this architecture is that performance isn’t set by the web server, it’s set by the database. Most web applications are read-oriented, with users who expect their web interface to be up-to-date within seconds. But my system is actually write-oriented. In my situation there’s no firm requirement that the database be up-to-date in seconds. For all I care it could take hours for the data to pass from web server to database, just so long as it does.

So back to the drawing board. My second architecture modified the original by placing a queueing service between the web server and the database.

The second architecture considered

When the web server has data, it pushes it into the queue. The database can fetch the data at its own pace. Neither component is slowing down the other one.
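The decoupling described above can be sketched in a few lines. This is only an illustration, assuming an in-process queue and a list standing in for the queueing service and the database; the names run_pipeline, pending and stored are made up for the example and are not part of the actual system.

```python
import queue
import threading


def run_pipeline(requests):
    """Reply to each request immediately; a worker drains the queue later."""
    pending = queue.Queue()  # stands in for the queueing service
    stored = []              # stands in for the database table

    def database_worker():
        # The "database" pulls items at its own pace, off the request path.
        while True:
            item = pending.get()
            if item is None:  # sentinel: nothing more to fetch
                break
            stored.append(item)

    worker = threading.Thread(target=database_worker)
    worker.start()

    replies = []
    for request in requests:
        pending.put(request)             # cheap hand-off to the queue
        replies.append("ok:" + request)  # reply without waiting for storage

    pending.put(None)
    worker.join()
    return replies, stored
```

The point of the design is that the loop producing replies never blocks on the slow store; only the worker does.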

The use of queues to allow the front-end servers to respond faster has become very common. Developers can choose from many commercial or off-the-shelf software systems for fast queueing. In addition Amazon even have “queueing as a service” with their Simple Queue Service. With SQS you pay an extremely small fee per message; Amazon guarantees that the queue entries will persist for several days and allows unlimited queue entries.

But still I wasn’t satisfied with the front end. I wanted it to go faster, if at all possible. It so happens that I noticed another service offered to Amazon customers: Elastic Block Store. Essentially Amazon allow you to have a virtual hard drive. You can attach it to virtual machines, read from and write to it, detach it, clone it, reattach it, or even shuffle it amongst different virtual machines.

That last part interested me most. With an architecture based on queueing, the bottleneck is still at the database server. It must continuously fetch items one at a time. This is quite fast, but nowhere near as fast as using database facilities for loading in bulk. Supposing I could ship thousands or even millions of datapoints to the database server at a time, my total performance goes up: fewer servers required, more profit.

This leads me to my third architecture:

The current architecture

In this design I am writing data directly to the virtual disk in the form of a simple log. Occasionally the disk is detached from the webserver and reattached to the database server. The database server performs a bulk load, then returns the virtual disk to the pool available for use by web servers.
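As a rough sketch of the log-then-bulk-load idea, assuming a tab-separated log file and SQLite standing in for the real database: the file name, table name and record layout below are invented for illustration, and the real system writes from Lua rather than Python.

```python
import os
import sqlite3
import tempfile


def append_record(log_path, timestamp, payload):
    """Web-server side: append one line per request to a simple log."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp}\t{payload}\n")


def bulk_load(log_path, connection):
    """Database side: load the whole log in a single transaction."""
    with open(log_path, encoding="utf-8") as log:
        rows = [line.rstrip("\n").split("\t", 1) for line in log]
    with connection:  # commits once, after every row has been inserted
        connection.executemany(
            "INSERT INTO transactions (ts, payload) VALUES (?, ?)", rows
        )
    return len(rows)


def demo():
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE transactions (ts TEXT, payload TEXT)")
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "requests.log")
        for i in range(1000):
            append_record(log_path, f"t{i}", f"datapoint-{i}")
        loaded = bulk_load(log_path, connection)
    count = connection.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
    return loaded, count
```

Calling demo() writes a thousand log lines and loads them in one batch, which is the shape of the hand-off between web server and database server.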

This allows me two further optimisations.

The first is visible in the diagram. Instead of having my HTTP server talk to a PHP backend (or indeed any other backend), Lighttpd allows me to directly embed a Lua script using the mod_magnet plugin. Lua is a pleasant, lightweight language, and very fast. In some microbenchmarks I have seen it blow the doors off PHP for web serving performance.

I didn’t use embedded Lua in the original architecture due to a flaw in its database connection system, or in the second architecture because there are no Lua libraries for talking to Amazon’s SQS. In theory I could have rectified both of these drawbacks, but I am a great believer in simplifying my programming overhead whenever possible.

The second optimisation is to use a different file system. Most file systems are optimised to speed up seek or read times. Seek time is the measure of time it takes for the head of a hard drive to find the correct track to read data from; read time is a measure of how long it takes to pull a file off disk and into memory. File systems use a number of tricks and techniques to minimise these at the expense of write times. The basic thinking is that writing to disk happens less often than reading.

But my workload is the opposite: reading happens much less frequently. Write performance is my dominant consideration.

The latest Linux kernel includes a new file system from Japanese telco NTT called NILFS2. This is a “log-structured” file system. The basic upshot is that it is built around maximising write speed, which is exactly what I want. In a microbenchmark, NILFS2 improves overall performance by roughly 20% over Ext3, the default Linux file system.

Between the embedded Lua script and NILFS2, my average response time is down to 122ms with a standard deviation of 655ms. 95% of requests are served in less than 250ms. Overall the server can spit out 2064 requests/second on a simple microbenchmark. Remember that my original design could only handle 700 requests per second. That’s an approximate tripling of performance. If nothing else it shows that good design rarely turns up on the first attempt. It’s very possible that I will think of something much better later on (and even more so that you, Humble Reader, will do so).
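The comparison can be checked with a couple of lines of arithmetic, using only the figures quoted above; the function name compare is just for this example.

```python
def compare(old_rps, new_rps, old_ms, new_ms):
    """Return the throughput multiple and the fractional cut in mean latency."""
    speedup = new_rps / old_rps
    latency_cut = (old_ms - new_ms) / old_ms
    return speedup, latency_cut


speedup, latency_cut = compare(700, 2064, 300, 122)
print(f"throughput: {speedup:.2f}x")            # prints "throughput: 2.95x"
print(f"latency reduction: {latency_cut:.0%}")  # prints "latency reduction: 59%"
```

Note that the standard deviation (655ms) is several times the mean (122ms), which suggests a small number of very slow requests is pulling the average up.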

And that ends your incredibly nerdy dose for the day.

Cross Posted from Club Troppo
Geeky Musings
IT and Internet
Startup

Quiet milestone

Yesterday was the 20th anniversary of Australia being connected to the internet.

On the night of the 23rd June 1989 Robert Elz of the University of Melbourne and Torben Neilsen of the University of Hawaii completed the connection work that brought the Internet to Australia. It was a 56kbps satellite circuit, and the Australian end used a Proteon P4100 router.

Since that day we’ve evidently connected some 56.8% of the population, or 12,073,852 Australians, to the Internet (according to user statistics published by the ITU-T).

I think that’s a pretty impressive record, and worth noting!

The message was from Geoff Huston of the Australian Network Operators Group.

Cross Posted from Club Troppo
History
IT and Internet

Do you own NAB shares? Flog ‘em now.

NAB’s CEO has decided to work amongst his hoi-polloi in a corner cubicle.

Cubicles are one of the worst false economies created by bean-counting. Managerial types seem to thrive on interruption. They love a crisis in which they can prove that they are as good as Captain Hornblower.

However it’s been known for a long time that cubicles are dreadful for productivity, particularly in businesses where attention to abstract detail is important. Such as, I don’t know, banking.

Still, it could be worse. He could decide to introduce hot-desking.

Business
Cross Posted from Club Troppo

Our Nick gets the Nod 2.0

With all of today’s sturm und drang in Canberra, it perhaps slipped by Troppo readers that our own Nicholas Gruen has been tapped to head up a government taskforce on Government 2.0.

As I understand it, the taskforce is essentially being run out of AGIMO. Long time readers of Troppo will know that I’ve become a bit of an AGIMO fan over the years due to the good work they do. In this capacity Dave Bath deserves a hat-tip for championing their good work – and the work of the National Archives – on the fundamentals of good IT and information management.

The taskforce have set up a new blog. They’ve decided to walk the talk by setting up shop outside the traditional .gov.au box, with a blog hosted on Wordpress.com.

It has been interesting to see reactions so far. Even some of the angry responses have nuggets of useful insight. Hopefully some of the suggestions will be taken on board – a lot of them seem to be in line with Nicholas’s remarks about ‘engineering for serendipity’.

One of the nice things about Government 2.0 chatter is that it introduces a new pressure to get the fundamentals right. Citizen users should demand reliability, transparency, searchability, automation etc etc from the new interfaces and contact points. In turn this puts the onus on each and every department to get its IT and info-management house in order. Personally I think that the folks at AGIMO and the NAA have been prophets in the wilderness on this for some time, so hopefully the new taskforce will give these issues the attention and high-level champions they deserve.

In terms of what governments do today, the gathering, transformation, storage, querying and study of information, as well as acting on its meaning, is the core of almost every process. Smart systems can reduce that overhead and make it possible to improve the tooth-to-tail ratio of public spending. Ultimately, as a taxpayer and as a libertarian, I want government to be as efficient and effective as possible. Modern IT, judiciously applied on a government-wide basis, is one means to that end.

I hope that this taskforce is more than window-dressing, but at least I know it’s in good hands. Congratulations, Nicholas. I look forward to hearing more.

Cross Posted from Club Troppo
Government 2.0
IT and Internet
Metablogging
Politics - national

I broke Troppo

Feel free to express your hatred, dread etc etc below.

Update: Just to add insult to injury, our server’s internet connection failed for about an hour.

Cross Posted from Club Troppo
Site News

An excuse to mention cryptography

I needn’t tell Troppo readers that a few days of heady excitement are afoot in Canberra. Personally I doubt that the PM will go, but Swan might be in serious trouble.

A lot of argy-bargy has gone on about whether a “smoking gun” email, allegedly in the Coalition’s possession, is genuine. A lot of people have pointed out that it’s pretty easy to forge an email — after all, it’s just a bunch of text.

Enter the noble and mysterious art of cryptography. In particular, enter the mature but rarely-used technology of digital signatures.

Some Technical Talk

Digital signature schemes combine the two great pillars of modern cryptography: one-way hash functions and dual-key encryption.

A hash function turns any document into a fixed-length string. The clever part about hash functions is that they aren’t reversible. You can’t take the generated hash and run it backwards to get the original document. Hashes are widely used to store passwords or to check the integrity of downloads.
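To make the fixed-length property concrete, here is a small sketch using SHA-256 from Python’s standard hashlib module. The choice of SHA-256 is an assumption made for the example (the discussion above doesn’t name a particular hash function), and fingerprint is a made-up helper name.

```python
import hashlib


def fingerprint(document: str) -> str:
    """Return the SHA-256 hash of a document as 64 hexadecimal characters."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


short_hash = fingerprint("Dear Treasurer")
long_hash = fingerprint("Dear Treasurer, " * 10_000)

# Both hashes have the same length, however long the input was.
print(len(short_hash), len(long_hash))

# Changing a single character gives an unrelated-looking hash.
print(fingerprint("Dear Treasurer") == fingerprint("Dear Treasurer."))
```

Nothing in the output lets you work backwards to the text that produced it, which is why the same trick is used for stored passwords and download checksums.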

Dual-key encryption means that every person gets a private key, and they distribute a public key. If you possess the private key you can read items encrypted with the public key, which ensures that only the intended reader will see it. Alternatively, if you encrypt an item with your private key, people can ensure that it was you who did it by applying your public key.

In a digital signature scheme, these two things are combined. First, the system generates a hash for the document being signed. Then, it encrypts that hash with the sender’s private key.

At the other end, the receiver decrypts the signature with the public key, then computes the document hash themselves. If the received hash is the same as their own calculation, they know that the document was sent by the person claimed.
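The hash-then-encrypt steps above can be sketched with textbook RSA. This is a toy under loud assumptions: the two primes are small, well-known Mersenne primes, there is no padding, and the hash is simply reduced modulo the key, so it is not secure and is not how any real mail client does it. The names digest, sign and verify are invented for the example, and the modular inverse via pow needs Python 3.8 or later.

```python
import hashlib

# Toy key material (assumption for illustration only; far too small to be safe).
P = 2**61 - 1                       # a known Mersenne prime
Q = 2**89 - 1                       # another known Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest(document: str) -> int:
    """Hash the document and squeeze the result into the key's range."""
    raw = hashlib.sha256(document.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign(document: str) -> int:
    """Sender: encrypt the document's hash with the private key."""
    return pow(digest(document), D, N)


def verify(document: str, signature: int) -> bool:
    """Receiver: decrypt with the public key and compare against their own hash."""
    return pow(signature, E, N) == digest(document)


signature = sign("Please look after my mate.")
print(verify("Please look after my mate.", signature))
print(verify("Please look after my mate, urgently.", signature))
```

The first check uses the text that was signed and the second uses an altered text, so the pair shows that a signature only matches the exact document it was made for. Anything real should use a vetted cryptography library rather than hand-rolled arithmetic like this.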

Some Practical Consequences

Digital signatures (and a related technology, HMACs, which I won’t discuss here) can quickly settle the question of who said what and when. If the sender’s private key is secure, the digital signature cannot be forged, cannot be repudiated and cannot be refuted. It is, in actual fact, more reliable than a paper signature.

If the Federal government used digital signatures, it would be trivial to establish whether the Opposition possessed the genuine article. HMACs would further make it possible to determine, given a collection of correspondence, whether any items are missing from that collection.

Support for digital signatures is built into every modern email client and server. I reckon that the government could do worse than to roll out digital signature infrastructure throughout government. For centuries we’ve relied on the integrity of the paper trail; it would be nice if the e-trail were as trustworthy.

Cross Posted from Club Troppo
IT and Internet
Politics - national

Followup to my mortgage procedures survey

As I promised a few weeks ago, I have a brief report outlining the results from the survey I conducted comparing two different mortgage calculation procedures. For the truly curious, here it is.

Blegs
Cross Posted from Club Troppo
IT and Internet

What’s Killing The Newspaper? It Isn’t Bloggers.

In the last few months, the discussion of the future of newspapers has become a recurring topic in the media and online. Several common themes and arguments have emerged. The most common gripes are either that newspapers are being killed by bloggers, or that newspapers are being killed by failing to get their own news, relying on wire services instead.

The truth has little to do with quality, reporting or bloggers. It’s all about money.

You see, a newspaper has three sources of income:

  1. Circulation: this is the cover price you pay, or the subscription you bought.
  2. Advertising: these are the big, splashy boxes in the body of the newspaper.
  3. Classifieds: these are the tiny, densely packed text ads at the back of the newspaper.

If you ask most people how a newspaper makes its money, most would tell you circulation, many would tell you advertising, and some would mention classifieds.

But the order is actually backwards. In most papers, classifieds are the biggest earner, followed by advertising, followed by circulation. In fact, for many papers, the cover price doesn’t fully cover printing and distribution. All the journalistic institutions of the 20th century were subsidised not by readers, but by the “rivers of gold” — the regular flow of classified ads, lodged week in, week out with nary an interruption.

But the rivers of gold are drying up. In the USA, the free classifieds website Craigslist is busily sucking the money out of the local markets newspapers have traditionally relied on. And although the big newspapers and conglomerates have online versions of classifieds, it’s much harder to enjoy the kind of exclusivity they used to get in most towns. To start a daily newspaper takes years and costs millions of dollars, and is very risky into the bargain. To start a website costs perhaps $20 and a bit of time installing software. The barriers to competition are very low.

Advertising is losing its punch too. Newspapers have tried to import the “display advertising” model into the online space, with limited success. Again, the problem is that anyone can set up a website and sell advertising space. This space — called “inventory” by the industry — is expanding extremely rapidly. It has expanded more rapidly than the number of people online. At the same time, demand for all forms of advertising is slumping. The iron laws of supply and demand are driving down the money that can be earned from online advertising, and it simply cannot replace the profitability of print advertising.

In some ways, circulation is unimportant. A newspaper that isn’t printed is a smaller loss to make up. But of course advertisers and classifieds customers rely on circulation to get their value for money. This is one area where free alternatives, like bloggers, do affect the long-term shape of the industry: by gutting circulation, they make newspapers less attractive to advertisers than free or cheaper online alternatives.

It’s all about money, folks. The newspaper business has had more than a century of stable income. That period is suddenly coming to an end. The invisible hand is slapping the newspaper business, and slapping it hard.

Where to from here? Some newspapers are reportedly planning to simultaneously introduce ‘paywalls’ to their content. Online distribution of content is expensive, but nowhere near as expensive as physically printing and distributing newspapers — you might say that the haulage on photons and electrons is cheaper than the haulage on atoms.

So, the reasoning goes, if you charge a low price for access to the content, then circulation could take up the slack because it would be profitable in itself.

But they’ve already sussed out the problem with this model: you need to form a cartel for it to work. Putting aside the legal niceties of antitrust and competition laws, there’s the plain economics of the matter. If any one reputable newspaper or group breaks ranks, they will clean up the “eyeballs” and so be able to earn more. Cartels do sometimes succeed, but usually don’t because of the incentive to cheat. And with everyone in a panicky mood, how long would it take for somebody to break ranks?

So there you have it: a quick summary of why newspapers are withering.

Disclosure: Before moving to Perth, I was employed as a classifieds salesperson. I am also working on a startup which I hope will upend this dreary economic situation. I have a truly marvelous scheme which this margin is too narrow to describe (and may be looking for investors and directors soon).

Business
Cross Posted from Club Troppo
IT and Internet
Journalism
Media
Metablogging
Print media

Any AFL heavies in the audience?

Brother John Pye was a well-known, widely respected Catholic missionary who died last week, aged 102.

Brother Pye spent 16 years on the Tiwi islands in the NT. He brought Catholicism to the islanders with what was perhaps modest success. The other religion he introduced — Australian rules football — was much, much more successful.

Ever since then the Tiwi islands have been sending generation after generation of footy legends to play all over the country.

If anyone deserves a special mention in the AFL Hall of Fame, it’s Brother Pye. Anyone able to put in a word with the headquarters?

Cross Posted from Club Troppo
History
Sport-general
