Technical Notes

An XCache ‘gotcha’

XCache is a PHP opcode cacher. This is all well and good, as it speeds up page loads by removing PHP parsing overhead. On the downside, you must remember to restart your PHP processes after changing PHP files, as XCache doesn’t seem to perform stat()s on the files to check whether they’ve been updated.
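Since stale opcodes keep being served until PHP restarts, it helps to know when a restart is due. A minimal sketch, assuming a marker file that you touch each time PHP is restarted; the `php_changed_since` helper, the marker path and the restart command are all hypothetical:

```shell
#!/bin/bash
# List PHP files modified more recently than a marker file.
# Hypothetical helper: the marker is assumed to be touched at every PHP restart.
php_changed_since() {
    local marker=$1 dir=$2
    # -newer compares each file's modification time against the marker's.
    find "$dir" -type f -name '*.php' -newer "$marker"
}

# Example usage (paths and restart command are hypothetical):
#   if [ -n "$(php_changed_since /var/run/php-restarted /var/www/wordpressmu)" ]; then
#       /etc/init.d/php-fastcgi restart && touch /var/run/php-restarted
#   fi
```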


Brian’s Latest Comments doesn’t scale.

Or, to be fair, MySQL 3.x is rubbish. But you knew that already.

Most of the blogs on the Ozblogistan network use BLC. Recently, Larvatus Prodeo came on board and expected it to work for them too. But it simply hung without giving much in the way of an error message. I pottered around with it at the time without much success.

Today it occurred to me that the problem is the way BLC works. BLC aims at being compatible with MySQL 3.x. Among other things, this means giving up subqueries and having piss-poor SELECT DISTINCT.

So BLC issues this beauty of a query:
SELECT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type<>'pingback'
AND comment_type<>'trackback'
ORDER BY comment_date DESC;

This query pulls out the post ID and post_title for every approved comment on a published post in the database. On a site like Larvatus Prodeo, the results run to ~165k rows. BLC pulls these results into PHP as objects, one per row; then it merges the results down to determine which posts have comments.
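To see why this is expensive, here is a rough shell analogue of that merge step: every row is read, and only then are duplicate post IDs thrown away. It is an illustration, not BLC's PHP code, and it assumes tab-separated `comment_post_ID`/`post_title` rows, newest first, as the query above returns them; the function name is hypothetical.

```shell
#!/bin/bash
# Rough analogue of BLC's client-side merge: read every (post ID, title) row,
# newest comment first, and keep only the first row seen for each post ID.
# Input: tab-separated "comment_post_ID<TAB>post_title" lines.
dedupe_commented_posts() {
    # seen[] holds one entry per distinct post ID, but every input row is still read.
    awk -F'\t' '!seen[$1]++'
}
```

SELECT DISTINCT moves exactly this step into MySQL, so that only the distinct rows ever reach PHP.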

As you can imagine, this is memory intensive. Very memory intensive. And it’s a bit brutal on the CPU too, especially when garbage collection occurs. The upshot is that on Larvatus Prodeo, BLC exceeds the memory limits for PHP, and on other sites it adds between 600 and 1200 milliseconds of processing time. Not cool.

Naturally, I’m running MySQL 5.x on my server, so SELECT DISTINCT works ‘as advertised’. So I’ve implemented a super high tech optimisation on BLC:

SELECT DISTINCT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type<>'pingback'
AND comment_type<>'trackback'
ORDER BY comment_date DESC;

While this query is slower than the original, it basically returns only the list of posts with comments on them. On Larvatus Prodeo, this is a much more manageable ~2.5k lines of results. The reduction in PHP runtime completely overshadows the increase in query time; and besides, the smaller result sets don’t clog up the MySQL query cache.

So there you have it: a one-word way to dramatically improve the performance of Brian’s Latest Comments on MySQL 5.x-backed WordPress.

Update: Two additional optimisations are worth making. First, add a LIMIT clause to your copy of BLC, along the lines of:

$posts = $wpdb->get_results("SELECT DISTINCT
comment_post_ID, post_title
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
ORDER BY comment_date DESC
LIMIT $num_posts");

Second, the SELECT DISTINCT hammers the tables much harder, so add indexes for the relevant fields:

CREATE INDEX post_status_index ON wp_posts(post_status);
CREATE INDEX comment_date_index ON wp_comments(comment_date);
CREATE INDEX comment_type_index ON wp_comments(comment_type);
CREATE INDEX comment_date_approved_index ON wp_comments(comment_date_gmt);
CREATE INDEX post_title_index ON wp_posts(post_title(50));

Update 2: The query above doesn’t order the results according to the time of the latest comments; it winds up ordering by posting date. Not what my users want. Try this instead:

$posts = $wpdb->get_results("SELECT comment_post_ID, post_title, max(comment_date) AS max_date
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
GROUP BY comment_post_ID, post_title
ORDER BY max_date DESC
LIMIT $num_posts");
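The GROUP BY/max() query above keeps, for each post, only its most recent comment date, then sorts the posts by that date. A shell sketch of the same logic, handy for checking results by hand against an export; the function name and the tab-separated input layout are assumptions, and dates must be in MySQL's `YYYY-MM-DD HH:MM:SS` form so that plain string comparison orders them correctly:

```shell
#!/bin/bash
# Same logic as GROUP BY ... max(comment_date) ... ORDER BY max_date DESC LIMIT n.
# Input lines:  "comment_post_ID<TAB>post_title<TAB>comment_date"
# Output lines: "comment_post_ID<TAB>post_title", most recently commented first.
latest_commented_posts() {
    local limit=$1
    awk -F'\t' -v OFS='\t' '
        # Remember the latest comment date (and the title) seen for each post ID.
        !($1 in max) || $3 > max[$1] { max[$1] = $3; title[$1] = $2 }
        END { for (id in max) print max[id], id, title[id] }
    ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1r \
      | head -n "$limit" \
      | cut -f2,3
}
```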

Update 3: Nope. Performance is still atrocious. I’ll revisit this in a few weeks when I’ve settled in Darwin.


More tweaking for Ozblogistan

I did some minor optimisations on Ozblogistan last weekend.

I have HTTP compression enabled on the webserver (gzip, to be precise). I found out through the excellent YSlow plugin that not everything was being compressed. HTML was compressed, but not style sheets or scripts.

The nginx settings are now as follows:

gzip on;
gzip_buffers 128 8k;
gzip_types text/plain application/xml text/xml application/xml+rss text/css application/x-javascript text/javascript application/xhtml+xml;
gzip_disable "MSIE [1-6]\.(?!.*SV1)";

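This provides a large buffer (128 buffers of 8k, or 1MB in total) to prevent unexpected hangs. I’ve added MIME types for JavaScript, RSS feeds and CSS files. (nginx always compresses text/html once gzip is on, which is why the HTML was already being compressed.)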
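To confirm the change took, request a stylesheet or script with `Accept-Encoding: gzip` and look for a `Content-Encoding: gzip` response header. A minimal sketch; the function name is hypothetical and the URL in the usage comment is a placeholder:

```shell
#!/bin/bash
# Succeed if the HTTP response headers on stdin declare gzip content encoding.
# Header names are case-insensitive, hence grep -i; a trailing \r does not matter.
response_is_gzipped() {
    grep -qi '^content-encoding:[[:space:]]*gzip'
}

# Example usage (placeholder URL):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://example.com/style.css \
#       | response_is_gzipped && echo compressed || echo "NOT compressed"
```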

On the MySQL front, I received some advice from MySQL expert Paul Moen of Pythian.

Paul’s main advice upon looking over the MySQL settings was to increase the key_buffer variable from 8MB. It’s now at 96MB, which has sped up some queries. He also advised enabling and watching MySQL’s slow query log. This helped me spot queries where indexes might speed things up, so I added a few to relevant columns, which also reduced the number of slow queries. Thanks Paul.

Another change came from noticing that the usual advice for setting max_connections in MySQL is essentially Apache/mod_php-specific, where every Apache child can open its own connection. It doesn’t make sense in a FastCGI context. In FastCGI you launch a fixed number of PHP instances, and the number of connections from WordPress into MySQL is limited to the number of PHP instances; so there’s no requirement to set a higher max_connections value. This is not an optimisation per se, but it tidies things up a bit.

I double-checked my PHP opcode cacher, XCache, and found that it was not enabled. Annoyingly, aptitude will install but not activate it for you. To actually enable it, edit the php.ini file and add the line
extension=xcache.so
somewhere in the file.
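Hand edits to php.ini are easy to forget on the next rebuild. A minimal sketch of an idempotent version; the helper name is hypothetical, and the php.ini path in the usage comment is a guess that varies by distribution and SAPI:

```shell
#!/bin/bash
# Append a line to a file unless an identical line is already present.
ensure_line() {
    local file=$1 line=$2
    # -x whole-line match, -F fixed string (no regex), -q quiet.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (path is an assumption; check your own installation):
#   ensure_line /etc/php5/cgi/php.ini 'extension=xcache.so'
```

Running it twice leaves a single copy of the line, so it is safe to keep in a provisioning script.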

I had previously added favicons. Watching the error log, I saw requests for apple-touch-icon.png. Apple, in their infinite wisdom, have decided to add Yet Another Assumed File to the list of things webmasters need to know about. I’ve gone through and added icons for each of the Ozblogistan sites. It was, I must tell you, tedious and time-consuming. Though Apple currently scale the images to 57×57, they will almost definitely upgrade this in future, so I’ve placed 128×128 icons for now.
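With several sites it is easy to miss one. A small check that lists document roots lacking the icon; the function name and the docroot layout in the usage comment are hypothetical:

```shell
#!/bin/bash
# Print each document root that has no apple-touch-icon.png at its top level.
missing_touch_icons() {
    local root
    for root in "$@"; do
        [ -f "$root/apple-touch-icon.png" ] || printf '%s\n' "$root"
    done
}

# Example usage (hypothetical layout, one docroot per site):
#   missing_touch_icons /var/www/site-*/htdocs
```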

Finally, I adjusted the ‘swappiness’ of the server. I configure PHP and MySQL to occupy less than the full RAM available; thus, I don’t want anything swapped out to disk. I’ve set swappiness to 0, which tells Linux to keep applications in memory at all times until it has no choice other than to swap them out.
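`sysctl -w vm.swappiness=0` changes the running kernel, but the setting is lost on reboot unless it is also recorded in /etc/sysctl.conf. A sketch that sets or replaces a key in a sysctl.conf-style file; the function name is hypothetical:

```shell
#!/bin/bash
# Set "key = value" in a sysctl.conf-style file, replacing an existing
# assignment of the same key or appending one if none exists.
set_sysctl_conf() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v val="$value" '
        {
            k = $0
            sub(/=.*/, "", k)            # text before the first "="
            gsub(/[[:space:]]/, "", k)   # ignore surrounding whitespace
            if (k == key) {
                if (!done) print key " = " val
                done = 1
                next
            }
            print
        }
        END { if (!done) print key " = " val }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's owner and mode
    rm -f "$tmp"
}

# Example usage (as root):
#   sysctl -w vm.swappiness=0                          # takes effect now
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 0   # survives a reboot
```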


An nginx/PHP gotcha

If you are allowing larger-than-default files to be uploaded to an nginx server with PHP FCGI, you need to alter both the php.ini and the nginx.conf.

In particular, for nginx.conf, you need to use the client_max_body_size directive to set the permissible maximum upload size.
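On the PHP side the relevant php.ini directives are upload_max_filesize and post_max_size, and the request body has to get past nginx's client_max_body_size before PHP ever sees it. A sketch that checks the three values are consistent; the function names are hypothetical, only plain byte counts and k/m/g suffixes are understood, and since multipart encoding adds some overhead it is prudent to give nginx a little headroom beyond post_max_size:

```shell
#!/bin/bash
# Convert a size such as 8M, 512k or 1048576 to bytes (k/m/g are powers of 1024).
size_to_bytes() {
    local s=$1 n=${1%[kKmMgG]}
    case $s in
        *[kK]) echo $(( n * 1024 )) ;;
        *[mM]) echo $(( n * 1024 * 1024 )) ;;
        *[gG]) echo $(( n * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( s )) ;;
    esac
}

# Succeed if client_max_body_size >= post_max_size >= upload_max_filesize.
# An nginx value of 0 disables nginx's body-size check, so it always passes.
check_upload_limits() {
    local nginx post upload
    nginx=$(size_to_bytes "$1"); post=$(size_to_bytes "$2"); upload=$(size_to_bytes "$3")
    if [ "$nginx" -ne 0 ] && [ "$nginx" -lt "$post" ]; then
        echo "client_max_body_size ($1) is smaller than post_max_size ($2)" >&2
        return 1
    fi
    if [ "$post" -lt "$upload" ]; then
        echo "post_max_size ($2) is smaller than upload_max_filesize ($3)" >&2
        return 1
    fi
}

# Example usage: check_upload_limits 20m 20M 16M
```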


Automated backups for Ozblogistan

It’s a cliche amongst nerds that everyone preaches automated backups, and very few have them.

Partly because it’s surprisingly fiddly to set up “right”. And even fiddlier to do restoration testing.

Still, one thing at a time. I’ve just now finished putting together an automatic backup regime for Ozblogistan, the server I run hosting Skepticlawyer and Andrew Norton (and, at some time in the future, some other sites too).

Herewith my notes.

Tarsnap

I am using the tarsnap backup service. I am satisfied that it’s cheap, efficient, reliable and secure. The trickiest part to wrap my modest brain around is its snapshot-based nature. You don’t follow the full-and-incremental model with tarsnap: you simply list what you want to back up and let it sort out the details of the most efficient way to store that for you.

I have a very simple shell script which cron runs each day:

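#!/bin/bash
# Quick and dirty script to back up files, settings and the database.

DATE=$(date +%Y-%m-%d)
DUMP_DIR=/var/backups/mysqldump
SQL_FILE="$DUMP_DIR/ozblogistan-wordpressmu-$DATE.sql"
BACKUP_ARCHIVE_NAME="ozblogistan-$DATE"

mkdir -p "$DUMP_DIR"

# Dump SQL. Credentials come from the extra defaults file, not the command line.
# -e extended inserts, -l lock tables, -n no CREATE DATABASE. (No -t, so the
# CREATE TABLE statements are kept and the dump can be restored on its own.)
if ! mysqldump --defaults-extra-file=/etc/tarsnap/mysqldump.cnf -eln --dump-date \
        --default-character-set="latin1" wordpressmu > "$SQL_FILE"; then
    echo "mysqldump failed; skipping tarsnap backup" >&2
    rm -f "$SQL_FILE"
    exit 1
fi

# Perform tarsnap backup. The patterns are quoted so the shell passes them
# to tarsnap unexpanded.
tarsnap -c -f "$BACKUP_ARCHIVE_NAME" --exclude '*cache*' --exclude '*.svn*' \
    /home /etc /var/www/wordpressmu "$DUMP_DIR"

# Delete mysqldump
rm -f "$SQL_FILE"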

This script performs a mysqldump of the database. It tells tarsnap to back that up in addition to /etc, /home and /var/www/wordpressmu, ignoring cache directories and SVN directories. It creates a “new” tarsnap archive each day; in practice tarsnap will only send a delta of the SQL plus any new files uploaded to the WordPress installation.

I’ve chosen not to compress the SQL, as I am unsure whether that will interfere with the delta process used by tarsnap. Compression can change the layout of a file.

Mysqldump settings

The mysqldump command line in this shell script has two key features to note. Firstly, the --default-character-set option, necessary to circumvent MySQL’s character-set foolishness. Secondly, the --defaults-extra-file option, which imports settings from a custom configuration file.

The custom configuration file contains the username and password of a particular backup user. This backup user is distinct from the user supplied to WordPress Mu to access the wordpressmu database. The backup user has very limited privileges: essentially it can read but not alter the database, which is all it needs (i.e. it has SELECT and LOCK TABLES on the database, and RELOAD globally).

Placing those identifying details in a configuration file means that they will not be visible to someone running top, ps et al. A very modest security improvement, I grant you, but still.
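The extra defaults file is an ordinary MySQL option file. A sketch that writes one with a [client] group (mysqldump reads both the [client] and [mysqldump] groups) and owner-only permissions; the function name and the user name in the usage comment are hypothetical, and a password containing special characters may need quoting in the file:

```shell
#!/bin/bash
# Write a MySQL option file holding the backup user's credentials, readable
# only by its owner (mode 600), for use with mysqldump --defaults-extra-file.
write_dump_credentials() {
    local file=$1 user=$2 password=$3
    # umask in a subshell so a new file is never created group/world-readable.
    (
        umask 077
        printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$file"
    )
    chmod 600 "$file"   # also tightens a pre-existing file
}

# Example usage (hypothetical user name; run as root):
#   write_dump_credentials /etc/tarsnap/mysqldump.cnf backup 'secret'
```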

Still to do

I still need to set up something similar for troppo. We have automated backups there but they’re not as efficient or reliable. Ideally I’d prefer to move troppo to the new server, but that may not be possible. Update: this is now done.

I also need to develop an automatic restoration testing facility. It’s common to think you have “flawless” backups, then discover after disaster that your backups were no good.
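A real restoration test means extracting an archive (tarsnap -x -f) and loading the dump into a scratch database. A much cheaper first check is that a dump finished at all: with comments enabled, which is the default, mysqldump ends its output with a "-- Dump completed" line, so a truncated dump lacks it. A sketch; the function name is hypothetical, and passing this check proves nothing about whether the dump actually restores:

```shell
#!/bin/bash
# Cheap sanity check for a mysqldump file: non-empty and ending with the
# "-- Dump completed" trailer that mysqldump writes when comments are enabled.
dump_looks_complete() {
    local file=$1
    [ -s "$file" ] || return 1
    tail -n 5 "$file" | grep -q '^-- Dump completed'
}

# Example usage, before handing the dump to tarsnap:
#   dump_looks_complete "$SQL_FILE" || echo "dump looks truncated" >&2
```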

Finally I need to add code to delete old archives. Tarsnap’s snapshotting model is very efficient and the rate of change slow, so for now I will leave it open-ended to get a sense of how far back I can keep backups.
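Because the script names archives ozblogistan-YYYY-MM-DD, old ones can be picked out by comparing names against a cutoff date, since ISO dates sort correctly as plain strings. A sketch; the function name and the 90-day window are assumptions, and `date -d` in the usage comment is GNU-specific:

```shell
#!/bin/bash
# Read tarsnap archive names on stdin and print those named
# ozblogistan-YYYY-MM-DD with a date strictly before the cutoff (YYYY-MM-DD).
archives_older_than() {
    local cutoff=$1 name day
    while IFS= read -r name; do
        case $name in
            ozblogistan-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;            # ignore archives not made by the script
        esac
        day=${name#ozblogistan-}
        if [[ $day < $cutoff ]]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example usage (review the list before deleting anything):
#   tarsnap --list-archives \
#       | archives_older_than "$(date -d '90 days ago' +%Y-%m-%d)" \
#       | while IFS= read -r a; do tarsnap -d -f "$a"; done
```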


Moving a database from WordPress to WordPress Mu

Some rough notes, before I forget what I did.

mysqldump --default-character-set="latin1" -elt database_name | sed 's/wp_/wp_{new blogs id}_/' > database_name.sql

--default-character-set is required because of MySQL’s unrestrained enthusiasm for fucking up encoding by assuming everything is encoded in utf8. Would it kill them to have mysqldump simply look up the encoding first? Apparently the answer is “yes”.

Speaking of which, you need to change the my.cnf file to default to utf8, and tell your webserver to serve utf8, AND tell PHP to default to utf8. Then the content will come across OK.

Zip the dump file, download it, and upload it to the new server. Then feed it to the mysql command-line program.

mysql -u root -p target_database < database_name.sql

Be prepared for warts. In particular, older installations of WordPress carry columns ending in _category on some tables, which will trip up the mysql insert. The solution is to drop those columns from the source database first.

You also need to update the URLs. For reasons inscrutable to mere mortals, WordPress Mu doesn’t store uploaded files in wp-content/uploads. Instead it changes that to wp-content/files. This serves no purpose, so far as I can tell, but it does force me to make updates to the database:

update wp_{newblogid}_posts set post_content = replace (post_content, '/wp-content/uploads/', '/wp-content/files/') where post_content like '%/uploads/%';

update wp_{newblogid}_posts set guid = replace (guid, '/wp-content/uploads/', '/wp-content/files/') where guid like '%/uploads/%';
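Since the same two statements are needed for every imported blog, a small generator saves retyping the table prefix. A sketch; the function name is hypothetical, and the output is meant to be piped into mysql against the WordPress Mu database:

```shell
#!/bin/bash
# Print the uploads -> files URL fix-up statements for one WordPress Mu blog ID.
url_fix_sql() {
    local id=$1
    case $id in
        ''|*[!0-9]*) echo "blog id must be a number" >&2; return 1 ;;
    esac
    local t="wp_${id}_posts"
    # %% is printf's escape for a literal % (the SQL LIKE wildcard).
    printf "update %s set post_content = replace(post_content, '/wp-content/uploads/', '/wp-content/files/') where post_content like '%%/uploads/%%';\n" "$t"
    printf "update %s set guid = replace(guid, '/wp-content/uploads/', '/wp-content/files/') where guid like '%%/uploads/%%';\n" "$t"
}

# Example usage:
#   url_fix_sql 3 | mysql -u root -p wordpressmu
```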

Edit: commands used to move skepticlawyer.
# Change user ids and import

update wp_usermeta set user_id = 6 where user_id = 4;
update wp_users set ID = 6 where ID = 4;

mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_usermeta | sed 's/wp_/wp_3_/' > sl.usermeta.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_users | sed 's/wp_/wp_3_/' > sl.users.sql

# Update user ids on posts and comments
update wp_posts set post_author = 6 where post_author = 4;
update wp_comments set user_id = 6 where user_id = 4;

# Update URLs for files and attachments
update wp_posts set post_content = replace (post_content, '/wp-content/uploads/', '/wp-content/files/') where post_content like '%/uploads/%';
update wp_posts set guid = replace (guid, '/wp-content/uploads/', '/wp-content/files/') where guid like '%/uploads/%';

# Dump tables
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_comments | sed 's/wp_/wp_3_/' > sl.comments.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_links | sed 's/wp_/wp_3_/' > sl.links.sql

mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_postmeta | sed 's/wp_/wp_3_/' > sl.postmeta.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_posts | sed 's/wp_/wp_3_/' > sl.posts.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_term_relationships | sed 's/wp_/wp_3_/' > sl.term_relationships.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_term_taxonomy | sed 's/wp_/wp_3_/' > sl.term_taxonomy.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_terms | sed 's/wp_/wp_3_/' > sl.terms.sql
mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt skepticlawyer wp_options | sed 's/wp_/wp_3_/' > sl.options.sql
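The per-table dumps above differ only in the table name, so a loop does the same job. A sketch that reuses the exact mysqldump options and sed rewrite from those commands; the function name and its argument order are hypothetical:

```shell
#!/bin/bash
# Dump each named table from a standalone WordPress database, renaming the
# wp_ prefix to wp_<blog id>_ (first match per line only, as in the commands above).
# Output files are named <out_prefix>.<table without wp_>.sql, e.g. sl.comments.sql.
dump_blog_tables() {
    local db=$1 blog_id=$2 out_prefix=$3 table
    shift 3
    for table in "$@"; do
        mysqldump --default-character-set="latin1" --skip-opt --insert-ignore -elt \
            "$db" "$table" | sed "s/wp_/wp_${blog_id}_/" > "$out_prefix.${table#wp_}.sql"
        # PIPESTATUS[0] is mysqldump's exit status; stop at the first failure.
        [ "${PIPESTATUS[0]}" -eq 0 ] || return 1
    done
}

# Example usage, matching the commands above:
#   dump_blog_tables skepticlawyer 3 sl wp_comments wp_links wp_postmeta wp_posts \
#       wp_term_relationships wp_term_taxonomy wp_terms wp_options
```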
