iWork Numbers 2008 can’t deal with time durations

A very annoying discovery.

One of the common complaints about the Personal Software Process is that there’s a lot of data-entry and number crunching. The data entry I can’t do too much about (though compare Hackystat), but number crunching is something that computers are supposedly good at.

So this afternoon I spent a few hours running up a simple spreadsheet for the first exercise of ADFSE. The first exercises require students to use the “PSP0”, or Personal Software Process level 0. In PSP0, there are three documents/forms: the time log, the defect log and the project summary.

There are other examples of these forms about on the internet for various tools (mostly Word and Excel), but I wasn’t really satisfied with any of them. In particular, most of them a) slavishly recreate the appearance of the forms in ADFSE and b) don’t do any calculation for you.

So I decided to create my own spreadsheet in Numbers. When it’s a bit more battle-tested I might upload it for others to use.

Naturally it has the benefits of spreadsheets (easy calculation of related variables) and the drawbacks (non-robust data storage, needing one file per project). But most annoying to me is that Numbers 2008 doesn’t have a notion of a time interval. The PSP time log format is to enter a start time and an end time, then to subtract one from the other to determine the number of minutes. Numbers can’t perform this calculation because it has no duration type in which to express the result of that subtraction. Apparently Numbers 2009 has something like this, but I don’t feel like spending money on a perhaps.
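For comparison, here is the calculation the time log needs, as a minimal Python sketch. The function name `elapsed_minutes` and the `HH:MM` input format are my own assumptions for illustration; the PSP forms don't prescribe either.

```python
from datetime import datetime


def elapsed_minutes(start: str, stop: str) -> int:
    """Return whole minutes between two HH:MM clock times on the same day."""
    fmt = "%H:%M"
    # Subtracting two datetimes yields a timedelta, i.e. a duration type.
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


print(elapsed_minutes("14:05", "15:50"))  # 105
```

The point is that the subtraction produces a duration value rather than another clock time, which is exactly what the spreadsheet lacks.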

I might be stuck redoing the spreadsheet in OpenOffice. Damn yak-shaving.

Posted in PSP, Rants

An interview with Watts Humphrey

Grady Booch spent 18 hours talking to Watts Humphrey about his life and life’s works. Definitely worth a read; set aside a few hours.

Posted in History, Life, PSP

Discipline

I’ve decided to work through the exercises in Watts Humphrey’s A Discipline for Software Engineering, to see whether they’ll help me to be a better developer. With any luck I’ll remember to update the blog with my findings.

A Discipline for Software Engineering (let’s call it ADFSE) outlines the “Personal Software Process” or PSP. Essentially Humphrey took what was “best practice” for large software projects in 1995 and boiled it down to a process for single programmers. The book outlines a mix of practices and principles, and provides exercises to be carried out.

I first saw mention of ADFSE in two different books by Steve McConnell, that incomparable genius of concise summary and approachable anecdotes. He mentions it both in Code Complete, 2nd Edition and in Software Estimation: Demystifying the Black Art. I figured that this was as good an endorsement as any.

In some ways the key innovation of ADFSE is to turn software process inside out. Normally software process is seen as something that is imposed top-down, from outside the developer. It is placed between developers and other developers, and between developers and their work. A lot of developers chafe at process for various reasons. For one, it can be a token of a local management-by-fiat culture. It’s also often very boring to write endless reports; coding is much more fun.

ADFSE instead works from the developer out. The developer imposes the process on their own self. This turns it from a matter of fiat and punishment into one of self-discipline — hence the book title.

I had been putting off lashing myself to the pole on this one because there are few things more embarrassing than making a loud public noise and then not backing it up. But my decision has been given new impetus by the death of Watts Humphrey in the past few days. A remarkable pioneer of the computing world, he was a titan who served full careers in both industry and academia. He’s a great loss.

Posted in Books, PSP

My new favourite website

Beautiful galleries of futurist art at Moltee.

Posted in Art

Not Dead

Just resting.

I have internet access for an hour or two, so I thought I’d let the teeming masses know that I’m still alive.

Posted in Diary

An XCache ‘gotcha’

XCache is a PHP opcode cacher. This is well and good, as it speeds up pageloads by removing PHP parsing overhead. On the downside, you must remember to restart your PHP processes after changing PHP files, as XCache doesn’t seem to perform stat()s on the files to check if they’ve been updated.
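To show what a stat() check would buy, here is a toy Python sketch of a cache that re-reads a file whenever its modification time changes. It is an illustration of the idea only, not a description of XCache’s internals; the class name `StatCheckedCache` and the `compile_fn` parameter are invented for the example.

```python
import os


class StatCheckedCache:
    """Toy cache that recompiles a file when its modification time changes."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for "parse PHP into opcodes"
        self._entries = {}  # path -> (mtime in ns, compiled result)

    def get(self, path):
        # One stat() per lookup: cheap compared with re-parsing the file.
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is None or cached[0] != mtime:
            with open(path, encoding="utf-8") as handle:
                cached = (mtime, self._compile(handle.read()))
            self._entries[path] = cached
        return cached[1]
```

Without the modification-time comparison, the cache would keep serving the first compiled result until the process restarts, which matches the behaviour described above.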

Posted in Technical Notes

Success

Today’s WWA Grand Prix #3 was a good day for yours truly. I won the gold for most-improved Sinclair and took second place for the premier cup. Snatch 90kg, Clean & Jerk 130kg, total 220kg. Just as Jack Walls predicted.

Update: Turns out I am the bronze medallist. Geish Hori is the silver medallist. A well-deserved win. Well done Geish!

Posted in Diary, Weightlifting

Brian’s Latest Comments doesn’t scale.

Or, to be fair, MySQL 3.x is rubbish. But you knew that already.

Most of the blogs on the Ozblogistan network use BLC. Recently, Larvatus Prodeo came on board and expected it to work for them too. But it simply hung without giving much in the way of an error message. I pottered around with it at the time without much success.

Today it occurred to me that the problem is the way BLC works. BLC aims at being compatible with MySQL 3.x. Among other things, this means giving up subqueries and having piss-poor SELECT DISTINCT.

So BLC issues this beauty of a query:

SELECT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type <> 'pingback'
AND comment_type <> 'trackback'
ORDER BY comment_date DESC;

This query pulls out the post ID and post_title once for every approved comment on a published post in the database. On a site like Larvatus Prodeo, the results run to ~165k lines. BLC pulls these results into PHP as objects, one for each line; then it merges down the results to determine which posts have comments.

As you can imagine, this is memory intensive. Very memory intensive. And it’s a bit brutal on the CPU too, especially when garbage collection occurs. The upshot is that on Larvatus Prodeo, BLC exceeds the memory limits for PHP, and that for other sites, it adds between 600 and 1200 milliseconds of processing time. Not cool.

Naturally, I’m running MySQL 5.x on my server, so SELECT DISTINCT works ‘as advertised’. So I’ve implemented a super high tech optimisation on BLC:


SELECT DISTINCT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type <> 'pingback'
AND comment_type <> 'trackback'
ORDER BY comment_date DESC;

While this query is slower than the original, it basically returns only the list of posts with comments on them. On Larvatus Prodeo, this is a much more manageable ~2.5k lines of results. The reduction in PHP runtime completely overshadows the increase in query time; and besides, the smaller result sets don’t clog up the MySQL query cache.
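The effect is easy to reproduce on a toy database. The following Python sketch uses SQLite rather than MySQL, with cut-down stand-ins for the WordPress tables and made-up data (1000 comments on 2 posts), so treat it as an illustration of the row-count difference and not as a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_status TEXT);
    CREATE TABLE wp_comments (
        comment_ID INTEGER PRIMARY KEY,
        comment_post_ID INTEGER,
        comment_approved TEXT,
        comment_type TEXT,
        comment_date TEXT
    );
""")
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, 'publish')",
    [(1, "First post"), (2, "Second post")],
)
# Made-up data: 1000 approved comments spread evenly over the two posts.
conn.executemany(
    "INSERT INTO wp_comments (comment_post_ID, comment_approved, comment_type, comment_date) "
    "VALUES (?, '1', '', ?)",
    [(1 + i % 2, f"2009-01-01 00:{i // 60:02d}:{i % 60:02d}") for i in range(1000)],
)

QUERY = """
    SELECT {distinct} comment_post_ID, post_title
    FROM wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID)
    WHERE comment_approved = '1'
      AND wp_posts.post_status = 'publish'
      AND comment_type <> 'pingback'
      AND comment_type <> 'trackback'
"""
all_rows = conn.execute(QUERY.format(distinct="")).fetchall()
distinct_rows = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()

# One row per comment versus one row per commented-on post.
print(len(all_rows), len(distinct_rows))  # 1000 2
```

Everything the application layer would otherwise have to merge by hand is collapsed inside the database before a single object is created.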

So there you have it: a one-word way to dramatically improve the performance of Brian’s Latest Comments on MySQL 5.x-backed WordPress.

Update: Two additional optimisations are worthwhile. First, add a LIMIT clause to your copy of BLC along the lines of:

$posts = $wpdb->get_results("SELECT DISTINCT
comment_post_ID, post_title
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
ORDER BY comment_date DESC
LIMIT $num_posts;");

Second, the SELECT DISTINCT hammers the tables much harder, so add indexes for the relevant fields:


CREATE INDEX post_status_index ON wp_posts(post_status);
CREATE INDEX comment_date_index ON wp_comments(comment_date);
CREATE INDEX comment_type_index ON wp_comments(comment_type);
CREATE INDEX comment_date_approved_index ON wp_comments(comment_date_gmt);
CREATE INDEX post_title_index ON wp_posts(post_title(50));

Update 2: The query above doesn’t order the results according to the time of the latest comments; it winds up ordering by posting date. Not what my users want. Try this instead:

$posts = $wpdb->get_results("SELECT comment_post_ID, post_title, max(comment_date) AS max_date
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
GROUP BY comment_post_ID, post_title
ORDER BY max_date DESC
LIMIT $num_posts;");
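Here is the same idea as a small Python sketch against SQLite, with cut-down tables and made-up data. It groups by post ID as well as title so that two posts sharing a title stay separate. It only demonstrates the ordering and says nothing about how the query performs on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_status TEXT);
    CREATE TABLE wp_comments (
        comment_post_ID INTEGER, comment_approved TEXT, comment_date TEXT
    );
    INSERT INTO wp_posts VALUES (1, 'Older post', 'publish'), (2, 'Newer post', 'publish');
    INSERT INTO wp_comments VALUES
        (1, '1', '2009-03-01 10:00:00'),
        (2, '1', '2009-03-02 09:00:00'),
        (1, '1', '2009-03-03 08:00:00');
""")

rows = conn.execute("""
    SELECT comment_post_ID, post_title, MAX(comment_date) AS max_date
    FROM wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID)
    WHERE comment_approved = '1'
      AND wp_posts.post_status = 'publish'
    GROUP BY comment_post_ID, post_title
    ORDER BY max_date DESC
    LIMIT 5
""").fetchall()

# The older post sorts first because it received the most recent comment.
for post_id, title, latest in rows:
    print(post_id, title, latest)
```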

Update 3: Nope. Performance is still atrocious. I’ll revisit this in a few weeks when I’ve settled in Darwin.

Posted in Technical Notes

More tweaking for Ozblogistan

I did some minor optimisations on Ozblogistan last weekend.

I have HTTP compression enabled on the webserver (gzip, to be precise). I found out through the excellent YSlow plugin that not everything was being compressed. HTML was compressed, but not style sheets or scripts.

The nginx settings are now as follows:

gzip on;
gzip_buffers 128 8k;
gzip_types text/plain application/xml text/xml application/rss+xml text/css application/x-javascript text/javascript application/xhtml+xml;
gzip_disable "MSIE [1-6].(?!.*SV1)";

This provides a large buffer to prevent unexpected hangs. I’ve added mimetypes for javascript, RSS feeds and CSS files.
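To get a feel for why compressing style sheets and scripts is worth the trouble, here is a small Python sketch that gzips a made-up, repetitive style sheet. It is not part of the server setup, and real files will compress less dramatically than this artificial one.

```python
import gzip

# A made-up style sheet: 200 near-identical rules, as repetitive as real CSS often is.
css = "\n".join(
    f".widget-{i} {{ margin: 0; padding: 4px 8px; color: #333; }}" for i in range(200)
).encode("utf-8")

compressed = gzip.compress(css)

# Round-trip check: decompressing gives back the original bytes.
assert gzip.decompress(compressed) == css
print(len(css), "bytes before,", len(compressed), "bytes after")
```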

On the MySQL front, I received some advice from MySQL expert Paul Moen of Pythian.

Paul’s main advice upon looking over the MySQL settings was to increase the key_buffer variable from 8MB. It’s currently at 96MB. This has sped up some queries. He also advised enabling and watching MySQL’s slow query log. This helped me spot queries where indexes might speed up things, so I added a few to relevant columns, which also reduced the number of slow queries. Thanks Paul.

Another change came from noticing that the advice for setting max_connections in MySQL is essentially Apache/mod_php-specific. It doesn’t make sense in a FastCGI context. In FastCGI you launch a fixed number of PHP instances. The number of connections from WordPress into MySQL is limited to the number of PHP instances, so there’s no requirement to set a higher max_connections value. This is not an optimisation per se, but it tidies things up a bit.

I double checked my PHP opcode cacher XCache and found that it was not enabled. Annoyingly, aptitude will install but not activate this for you. To actually enable it, it’s necessary to edit the php.ini file and add the following line somewhere in the file:
extension=xcache.so

Previously I have added favicons. Watching the error log, I saw requests for apple-touch-icon.png. Apple, in their infinite wisdom, have decided to add Yet Another Assumed File to the list of things webmasters need to know about. I’ve gone through and added icons for each of the Ozblogistan sites. It was, I must tell you, tedious and time-consuming. Though Apple currently scale the images to 57×57, they will almost definitely upgrade this in future, so I’ve placed 128×128 icons for now.

Finally, I adjusted the ‘swappiness’ of the server. I configure PHP and MySQL to occupy less than the full RAM available; thus, I don’t want anything swapped out to disk. I’ve set swappiness to 0, which tells Linux to keep applications in memory at all times until it has no choice other than to swap them out.
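For checking the value afterwards, Linux exposes it at /proc/sys/vm/swappiness. A minimal Python sketch for reading it follows; the function name `read_swappiness` is my own, and it returns None on systems where the file doesn’t exist.

```python
from pathlib import Path


def read_swappiness(path: str = "/proc/sys/vm/swappiness"):
    """Return the kernel's current swappiness as an int, or None if unavailable."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None  # not Linux, or /proc is not mounted
    return int(proc_file.read_text().strip())


print(read_swappiness())
```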

Posted in Technical Notes

WordPress import/export is still pants

It still chokes on ‘large’ files because importing them causes massive blooms of memory consumption. The XML parser implementation used insists on loading the whole file at once. This is a recipe for PHP choking a modest VPS.

A WXR splitter (there are several) is pretty much required. Why doesn’t WordPress perform this task itself?
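For what it’s worth, processing an export one item at a time is not hard. Here is a minimal Python sketch of the streaming approach, using the standard library’s incremental XML parser on a tiny made-up RSS-style sample. It is not how the WordPress importer or any particular splitter works; a real splitter would write each batch of items to its own file instead of counting them.

```python
import io
import xml.etree.ElementTree as ET


def count_items(source) -> int:
    """Count <item> elements while streaming, without holding them all in memory."""
    count = 0
    # iterparse yields each element as its closing tag is read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            count += 1
            # Drop the item's children and text once it has been handled.
            # The emptied element stays attached to its parent, so memory
            # use is much reduced rather than strictly constant.
            elem.clear()
    return count


sample = io.BytesIO(
    b"<rss><channel>"
    b"<item><title>a</title></item>"
    b"<item><title>b</title></item>"
    b"</channel></rss>"
)
print(count_items(sample))  # 2
```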

It’d also be nice if it did even very basic duplicate checking.

Posted in Geeky Musings, Metablogging