Everything that humans touch eventually becomes complex, whether we like it or not.
(Blog posts too. This one started out as a comparison of three competing software system alternatives and subsequently bloated into a discussion of chaos in computer systems.)
The problem is that being humans, we design systems. Design is less about the desired purpose of a system and much more about the limits of the designers. We’re trying to optimise in two dimensions: the problem space itself, and then according to the limits of human cognition. Within the problem space we can make incremental progress along an ostensibly continuous axis. In terms of human understanding … well that gets tapped out much more quickly.
That’s why I say to people that computer science is about the limits of computation; software engineering is about the limits of the engineers.
What an engineer wants is a linear system. Something where all the relationships are mapped out. Where any scenario can be foreseen and tested with pen and paper. But the world is filled with problems that aren’t linear systems. They have loops, unknown inputs, sudden catastrophic tipping points, surprising sensitivity to initial conditions and so forth.
It gets worse: some of these systems are completely artificial. Take computers. The basis of computer science, and by extension all actual computing systems of any kind, is the Turing Machine. It’s a tremendously simple device: it’s a tape marked with symbols. The tape can be advanced or rewound, and there is a head with a pen and an eraser that can mark or clear points on the tape.
[Image: an example of such a machine, hewing to Turing’s hypothetical design.]
You establish a few rules about what to do when a certain symbol is under the head — and voila, all the things that can be computed are now computable. Indeed, any Turing Machine can simulate any other Turing Machine. If you’re interested, the book to look at is Charles Petzold’s fascinating The Annotated Turing.
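To make "a few rules about what to do when a certain symbol is under the head" concrete, here is a minimal sketch of a one-tape machine in Python. Nothing in it comes from Turing's paper or the Petzold book: the rule format, the state names and the "_" blank symbol are all my own assumptions, chosen to keep the example short. The sample rule table adds one mark to a run of 1s.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol, purely illustrative


def run_turing_machine(rules, tape, state="start", halt_state="halt", max_steps=1000):
    """Run a one-tape machine and return the final tape contents as a string.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left), 0 (stay put) or +1 (right).
    """
    # The tape is unbounded in both directions; unwritten cells read as BLANK.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells[head]
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    low, high = min(cells), max(cells)
    return "".join(cells[i] for i in range(low, high + 1)).strip(BLANK)


# Hypothetical rule table: skip right over the 1s, then write one more 1.
INCREMENT_RULES = {
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): ("1", 0, "halt"),
}

if __name__ == "__main__":
    print(run_turing_machine(INCREMENT_RULES, "111"))  # prints 1111
```

The step limit is there because, in general, you cannot tell in advance whether a given rule table will ever halt.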
Deep down, modern CPUs still resemble Turing Machines, if you squint a bit. Here’s some very low-level code for adding 1 to 2:
    li $t1, 1
    add $t0, $t1, 2
The first line says “load the number 1 into location t1”. The second line says “add 2 to the value in location t1. Store the result in location t0”.
You can see how this could be visualised in tape-and-head terms. Symbols or strings of symbols can encode the instructions, the addresses and the values being computed. Low-level code of this kind (called assembler) mostly concerns itself with shuffling values from memory into the chip, performing a simple operation, then shuffling the result to another spot. Memory for a chip looks a lot like Turing’s tape.
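The "shuffle a value in, operate, shuffle the result out" pattern can be mimicked in a few lines of Python. This is a toy interpreter for just the two instructions used above, not an emulator for any real chip. The function name is made up, and letting add take a literal number as its last operand is an assumption made only so that the snippet runs as written.

```python
def run_program(lines):
    """Interpret a tiny subset of assembler-like instructions.

    Supports only:
      li  DEST, NUMBER      load a literal number into DEST
      add DEST, SRC, OTHER  store SRC + OTHER in DEST, where OTHER is
                            either a register name or a literal number
    Returns a dict mapping register names to their final values.
    """
    registers = {}
    for line in lines:
        op, args = line.split(maxsplit=1)
        operands = [part.strip() for part in args.split(",")]
        if op == "li":
            dest, value = operands
            registers[dest] = int(value)
        elif op == "add":
            dest, src, other = operands
            # Register names start with "$"; anything else is treated as a number.
            rhs = registers[other] if other.startswith("$") else int(other)
            registers[dest] = registers[src] + rhs
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return registers


if __name__ == "__main__":
    print(run_program(["li $t1, 1", "add $t0, $t1, 2"]))
```

Each register here is just a named cell, which is about as close to "a spot on the tape" as a dictionary can get.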
Deterministic but Complex … even Chaotic
So computers are deterministic. At each step, you can always predict what the next step will be. And often you can see a few steps ahead (but there are certain things you can’t know in advance — read the Petzold book for why).
Yet computers, at the level that we actually use them, are wildly unpredictable. Things regularly go kerflooie. What’s worse: we don’t know why. We have a computer system that is faulting, but we don’t know what defect or complex of interacting defects has caused it. The process of diagnosis is usually called “debugging” and it is a storied art in its own right.
The root approach to debugging is basically a degenerate scientific method. Form a gut suspicion, change something, observe the results, repeat. And indeed some of the literature on debugging calls for this approach to be consciously practiced. And it’s a useful abstraction to borrow, because it adds a few little accoutrements to the standard frantic intuitive debugging that occurs.
The better-dressed process for debugging has two parts. First you must be able to recreate the faulty behaviour; second you must diagnose the root defect(s). I’ll worry about diagnosis of defects another time.
So first you take the bug report and try to see if you can recreate the defect. And this is usually where debugging falls flat on its face. Computer systems are, it seems on the face of it, chaotic: small differences in starting conditions can blow up into stunning differences in actual performance. The developer’s workstation environment is sufficiently different from the production environment that some common factor is missing. (This is why web developers always start by asking “which browser are you using?” in the faint hope that they can blame it on Internet Explorer and wash their hands of the matter).
So one source of misbehaviour in complex computer systems is divergent configurations. Different operating systems, slightly different versions of the same operating system, slightly misaligned system clocks, differently configured services, different network interfaces and on and on ad infinitum. And human intervention can make things worse, not better. System administrators log into a misbehaving server, notice that a particular setting is wrong and fix it manually. Before they go on to fix the secondary server the phone rings and they’re distracted. This well-meaning system administrator has just made things worse. The level of entropy has increased. The system landscape has just become more fiddly and has more hidden gullies of failure. The odds of failure have increased.
Luckily, computers are not humans. They will put up with any amount of regimentation and strictness without complaint. And unlike human laws, where compliance can never be complete, computers will always faithfully obey commands. Even the defective ones.
Let’s play make-believe for a minute. Playing make-believe is uniquely important to software development because — in general, modulo mathematical impossibilities — if we can imagine a computer doing something, we can eventually make the computer do that something.
Imagine that we have a reference model of how the system should be configured. Periodically some software inspects the real system and compares it to the ideal model. If there’s a difference found, that software takes necessary actions to bring the real system into alignment with the model we established earlier.
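A rough sketch of that make-believe in Python, assuming the "system" can be flattened into a dictionary of settings. The setting names and both function names are invented for illustration; none of the real tools works on a plain dictionary like this.

```python
def find_drift(desired, actual):
    """Return {setting: (actual_value, desired_value)} for every mismatch.

    Only settings named in the reference model are checked. A setting that
    is missing from the real system reads as None.
    """
    return {
        key: (actual.get(key), wanted)
        for key, wanted in desired.items()
        if actual.get(key) != wanted
    }


def reconcile(desired, actual):
    """Detect drift, repair it in place, and report what was changed."""
    drift = find_drift(desired, actual)
    for key, (_, wanted) in drift.items():
        actual[key] = wanted
    return drift


if __name__ == "__main__":
    # Hypothetical reference model and a server that has wandered off it.
    model = {"ssh_root_login": "no", "ntp_server": "pool.ntp.org"}
    server = {"ssh_root_login": "yes", "ntp_server": "pool.ntp.org", "motd": "hi"}

    print(reconcile(model, server))  # first pass reports and fixes the drift
    print(reconcile(model, server))  # second pass finds nothing left to do
```

Run periodically, the second call is the normal case: nothing has drifted, so nothing is touched. Settings the model does not mention (the "motd" entry here) are left alone.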
The pair of you who read this blog might recognise what we have here. Yes: it’s another control system.
Luckily for me, it won’t be necessary to develop such a control system myself (and thereby pick up something useless like a Masters or PhD en route). This genre already exists, and there are three main pieces of software in it: CFEngine, Puppet and Chef.
CFEngine, Puppet and Chef compared
The first tool I looked at was Chef. Of the three, Chef is the youngest. Being a groovy, funky, channel 27 sort of fellow, I decided this meant that It Was The Best.
I persisted with Chef for about 6 or 7 weeks, including a long diversion into trying to write a new package provider for it. I threw in the towel for three reasons. Most important of these was the realisation that my vision of such a system, and Chef’s vision, were actually very different. Chef positions itself as an alternative to Puppet and CFEngine, but it’s not, really. It’s a remote execution engine with a great deal of architectural overhead.
Part of why it took me so long to realise this is that Chef falls prey to the Ruby community propensity to utterly unnecessary puns and wordplay. I’m now embarking on my fourth decade and the appeal of wrapping concepts in clever but non-descriptive names has well and truly worn the hell off. Chef takes the whole metaphor to the nth degree — recipes, knives and so on — until mysteriously it doesn’t (Ohai). This did not help me in coming to grips with how the thing actually worked.
And that’s the second problem: you need to know too much about how Chef works in order to get work done. Chef is less a tool and more a framework; and it’s not until you’ve spent weeks head-butting the documentation that it becomes apparent that this framework is about abstractly listing commands for remote execution.
If only I’d paid more attention to the surrounding literature. Chef’s documentation repeatedly mentions the concept of idempotency: an operation is idempotent if running it a second time leaves things exactly as the first run left them. The point is that Chef only seems to get as far as idempotency and no further. If your scripts are idempotent, goes the Chef reasoning, running them multiple times will always cause your system to converge to the correct state.
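For the avoidance of doubt about what idempotency buys you, here is a small Python illustration. It is not Chef code; the two functions and the sample config line are hypothetical. One step blindly appends a line to a config file's contents, the other checks first.

```python
def append_line(lines, line):
    """Not idempotent: every run adds another copy of the line."""
    return lines + [line]


def ensure_line(lines, line):
    """Idempotent: adds the line only if it is not already present."""
    return lines if line in lines else lines + [line]


if __name__ == "__main__":
    config = ["PermitRootLogin no"]
    wanted = "PasswordAuthentication no"

    twice_appended = append_line(append_line(config, wanted), wanted)
    twice_ensured = ensure_line(ensure_line(config, wanted), wanted)

    print(len(twice_appended))  # 3: the line was added on both runs
    print(len(twice_ensured))   # 2: the second run changed nothing
```

The "check first" logic in ensure_line is exactly the part that has to come from somewhere: either the tool supplies it or the script author does.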
The problem is that this pushes all responsibility for ensuring idempotency back onto me. And I don’t bloody want it. I would rather have a detect-repair mechanism embedded in the tool than have to recreate that logic myself. So far as I can tell, Chef supports this only partially.
Next I looked at CFEngine. This tool has been around in various forms since the early 1990s. Indeed there’s a certain amount of theory that underlies the tool. But I still found that CFEngine wasn’t for me, for two reasons.
First, the specification language is unnecessarily poor. There’s too much unnecessary syntax and fiddliness that could have been done away with. In some places the chosen nomenclature is confusing. Why are “classes” not called “conditions”, for example? Especially since the term “class” means something entirely different in other languages — and the term “conditions” neatly describes what CFEngine means by “class”.
My main issue with CFEngine is that it does not allow for dependencies between elements of the model to be enforced. Instead CFEngine relies on multiple applications of policies to “gradually converge” to the desired state. In the book Learning CFEngine, an enormous amount of jiggery-pokery is devoted to recreating the concept of dependencies. “Baby”, goes the saying, “I don’t got time”.
Puppet is the middle child. Puppet inherits some of its thinking from CFEngine but, blessedly, directly allows modelling of dependencies between different parts of the model. This turns the configuration model from being either a collection of idempotent scripts (Chef) or a thin gas of atomic configuration items (CFEngine) into something useful: a directed acyclic graph of configuration items.
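To show why the graph matters, here is a sketch using Python's standard-library graphlib module (Python 3.9 or later). The resource names are made up and this is not how Puppet represents or evaluates its model internally; it only illustrates that once dependencies are declared, a safe order of application falls out for free and a circular dependency is caught up front.

```python
from graphlib import CycleError, TopologicalSorter


def apply_order(dependencies):
    """Return configuration items in an order that respects dependencies.

    dependencies maps each item to the set of items it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical model: the service needs its package and its config file,
# and the config file needs the package to be installed first.
WEB_SERVER = {
    "service:nginx": {"package:nginx", "file:nginx.conf"},
    "file:nginx.conf": {"package:nginx"},
    "package:nginx": set(),
}

if __name__ == "__main__":
    for item in apply_order(WEB_SERVER):
        print(item)

    try:
        apply_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular dependency rejected")
```

With this particular model there is only one valid order: package, then file, then service.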
What Puppet appears to get right, in contrast to the other two, is that I want two things at once: a declarative, detect-and-repair model for individual configuration items, and recognition that the relationships between those items matter.
In this respect Puppet better models how the real world of systems works. It also fits better into models such as ITIL, which are grounded in dependency models of configuration items and services.
And so it is that I will be using Puppet from here on out to configure my systems and to continuously rein in creeping entropy on them. One more pocket of the universe saved from the greedy clutches of chaos.