Brian’s Latest Comments doesn’t scale.

Or, to be fair, MySQL 3.x is rubbish. But you knew that already.

Most of the blogs on the Ozblogistan network use BLC. Recently, Larvatus Prodeo came on board and expected it to work for them too. But it simply hung without giving much in the way of error message. I pottered around with it at the time without much success.

Today it occurred to me that the problem is the way BLC works. BLC aims at being compatible with MySQL 3.x. Among other things, this means giving up subqueries and having piss-poor SELECT DISTINCT.

So BLC issues this beauty of a query:

SELECT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type'pingback'
AND comment_type'trackback'
ORDER BY comment_date DESC;

This query pulls out the post ID and post_title for every approved, published comment in the database. On a site like Larvatus Prodeo, the results run to ~165k lines. BLC pulls these results into PHP as objects, one for each line; then it merges down the results to determine which posts have comments.

As you can imagine, this is memory intensive. Very memory intensive. And it’s a bit brutal on the CPU too, especially when garbage collection occurs. The upshot is that on Larvatus Prodeo, BLC exceeds the memory limits for PHP, and that for other sites, it adds about between 600 and 1200 milliseconds of processing time. Not cool.

Naturally, I’m running MySQL 5.x on my server, so SELECT DISTINCT works ‘as advertised’. So I’ve implemented a super high tech optimisation on BLC:


SELECT DISTINCT comment_post_ID, post_title
FROM (wp_comments LEFT JOIN wp_posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND wp_posts.post_status='publish'
AND comment_type'pingback'
AND comment_type'trackback'
ORDER BY comment_date DESC;

While this query is slower than the original, it basically returns only the list of posts with comments on them. On Larvatus Prodeo, this is a much more manageable ~2.5k lines of results. The reduction in PHP runtime completely overshadows the increase in query time; and besides, the smaller result sets don’t clog up the MySQL query cache.

So there you have it: a one-word way to dramatically improve the performance of Brian’s Latest Comments on MySQL 5.x-backed WordPress.

Update: Two additional optimisations are worthy. First, add a LIMIT clause to your copy of BLC along the lines of:

$posts = $wpdb->get_results("SELECT DISTINCT
comment_post_ID, post_title
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
ORDER BY comment_date DESC
LIMIT $num_posts;");

Second, the SELECT DISTINCT hammers the tables much harder, so add indexes for the relevant fields:


CREATE INDEX post_status_index ON wp_posts(post_status);
CREATE INDEX comment_date_index ON wp_comments(comment_date);
CREATE INDEX comment_type_index ON wp_comments(comment_type);
CREATE INDEX comment_date_approved_index ON wp_comments(comment_date_gmt);
CREATE INDEX post_title_index ON wp_posts(post_title(50));

Update 2: The query above doesn’t order the results according to the time of the latest comments; it winds up ordering by posting date. Not what my users want. Try this instead:

$posts = $wpdb->get_results("SELECT comment_post_ID, post_title, max(comment_date) AS max_date
FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
WHERE comment_approved = '1'
AND $wpdb->posts.post_status='publish'
$ping
GROUP BY post_title
ORDER BY max_date DESC
LIMIT $num_posts;");

Update 3: Nope. Performance is still atrocious. I’ll revisit this in a few weeks when I’ve settled in Darwin.

This entry was posted in Technical Notes. Bookmark the permalink.

13 Responses to Brian’s Latest Comments doesn’t scale.

  1. Paul says:

    Try adding a covering index on COMMENTS to cover all
    columns returned.

    CREATE INDEX ix_comment_dt ON wp_comments(comment_post_ID, post_title,comment_approved,comment_date DESC);

    Check the explain plan, if the temp table and file sort is not removed due to sorting by MAX(comment_date) not just comment_date. You may have to revert to comment_date and wrap the whole thing as a derived table.
    ala.
    select a.* from (select comment… comments join posts … order by comment_date DESC) order by max(comment_date)

    The hard yards in reducing the dataset are done within the derived table, the sorting on column which can’t be determined until runtime is delayed until the rowset is small enough not to be a performance problem.

  2. Nick says:

    Hi Jacques,

    The comments table is much too large to attempt a JOIN like they’ve used in the original code. Reduce the data down as much as possible first via temp tables (in this case, to just bare IDs = ideal, will have no problem staying in memory), and only then do the JOIN.

    Try the following…it’s running on my machine with 10000 mocked up posts, 1000 lengthy comments per post, and takes a fraction of a second:

    DELIMITER $$
    CREATE PROCEDURE get_latest_posts_commented_on`($num_posts_to_get INT)
    BEGIN

    CREATE TEMPORARY TABLE temp_unpublished_posts (id INT) ENGINE=MEMORY;
    CREATE INDEX temp_unpublished_posts_id_index ON temp_unpublished_posts (id);
    INSERT INTO temp_unpublished_posts (id)
    SELECT id
    FROM wp_posts
    WHERE post_status ‘publish’;

    CREATE TEMPORARY TABLE temp_latest_posts_commented_on (comment_post_ID INT) ENGINE=MEMORY;
    INSERT INTO temp_latest_posts_commented_on (comment_post_ID)
    SELECT DISTINCT comment_post_ID
    FROM (wp_comments)
    WHERE comment_approved = ‘1’
    AND comment_type ‘pingback’
    AND comment_type ‘trackback’
    AND comment_post_ID NOT IN (SELECT id FROM temp_unpublished_posts)
    ORDER BY comment_date DESC
    LIMIT $num_posts_to_get;

    SELECT comment_post_ID, post_title
    FROM temp_latest_posts_commented_on
    INNER JOIN wp_posts ON wp_posts.id = comment_post_ID;

    DROP TABLE temp_unpublished_posts;
    DROP TABLE temp_latest_posts_commented_on;

    END

    In brianslatestcomments.php, replace the line:

    $posts = $wpdb->get_results(“SELECT
    comment_post_ID, post_title
    FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
    WHERE comment_approved = ‘1’
    AND $wpdb->posts.post_status=’publish’
    $ping
    ORDER BY comment_date DESC;”);

    with:

    $posts = $wpdb->get_results(“CALL get_latest_posts_commented_on ($num_posts);”);

    Cheers,
    Nick

  3. Nick says:

    WHERE post_status ‘publish’;

    Should read:

    WHERE post_status <> ‘publish’;

    (forgot to escape:)

  4. Jacques Chester says:

    Looks good, Nick, I hadn’t thought of using memory tables (I’d thought of intermediate temps, but hadn’t joined all the dots).

    I might need to tweak it some more to fit the multisite schema (where there’s a set of tables for each blog).

  5. Nick says:

    Hi Jacques, I just sent a revised version through but hasn’t appeared…let me know if it didn’t come through, and can send again.

  6. Jacques Chester says:

    Where’d you send it to?

  7. Nick says:

    DELIMITER $$

    CREATE PROCEDURE `get_latest_posts_commented_on_multisite`($site_num VARCHAR(4), $num_posts_to_get INT)
    BEGIN

    CREATE TEMPORARY TABLE temp_unpublished_posts (id INT) ENGINE=MEMORY;
    CREATE INDEX temp_unpublished_posts_id_index ON temp_unpublished_posts (id);

    SET @sql_str_1 := CONCAT (
    ‘INSERT INTO temp_unpublished_posts (id) ‘
    , ‘SELECT id ‘
    , ‘FROM wp_’, $site_num, ‘posts ‘
    , ‘WHERE post_status ’publish’;’
    );

    PREPARE sql_str_1 FROM @sql_str_1;
    EXECUTE sql_str_1;
    DEALLOCATE PREPARE sql_str_1;

    CREATE TEMPORARY TABLE temp_latest_posts_commented_on (comment_post_ID INT) ENGINE=MEMORY;

    SET @sql_str_2 := CONCAT (
    ‘ INSERT INTO temp_latest_posts_commented_on (comment_post_ID) ‘
    , ‘SELECT DISTINCT comment_post_ID ‘
    , ‘FROM (wp_’, $site_num, ‘comments) ‘
    , ‘WHERE comment_approved = ”1” ‘
    , ‘AND comment_type ”pingback” ‘
    , ‘AND comment_type ”trackback” ‘
    , ‘AND comment_post_ID NOT IN (SELECT id FROM temp_unpublished_posts) ‘
    , ‘ORDER BY comment_date DESC ‘
    , ‘LIMIT ‘, $num_posts_to_get
    );

    PREPARE sql_str_2 FROM @sql_str_2;
    EXECUTE sql_str_2;

    SET @sql_str_3 := CONCAT (
    ‘SELECT comment_post_ID, post_title ‘
    , ‘FROM temp_latest_posts_commented_on ‘
    , ‘INNER JOIN wp_’, $site_num, ‘posts ON wp_posts.id = comment_post_ID; ‘
    );

    PREPARE sql_str_3 FROM @sql_str_3;
    EXECUTE sql_str_3;

    DROP TABLE temp_unpublished_posts;
    DROP TABLE temp_latest_posts_commented_on;

    END

    In brianslatestcomments.php, replace the line:

    $posts = $wpdb->get_results(“SELECT
    comment_post_ID, post_title
    FROM ($wpdb->comments LEFT JOIN $wpdb->posts ON (comment_post_ID = ID))
    WHERE comment_approved = ’1?
    AND $wpdb->posts.post_status=’publish’
    $ping
    ORDER BY comment_date DESC;”);

    with:

    $posts = $wpdb->get_results(“CALL get_latest_posts_commented_on_multisite (“2_”, $num_posts);”);

    etc…

  8. Nick says:

    Something like that?

    A little bit clumsy, passing through the varchar instead of an int, but meant I could run the script with my non-multiple site version (by passing through ” for @site_num)…

  9. Jacques Chester says:

    WordPress allows you to look up table names from within PHP using the wpdb object. My inclination would be to use that to get the table name directly.

    Can queries in stored procs be made parametric in MySQL? The string concatenation approach is making me uneasy.

  10. Nick says:

    I don’t think so, Jacques – or, sorry, yes, but not in a way that allows passing table names as paramaters.

    Because of the nature of the stored proc compiler, the only way to make use of dynamic table/column/index/any other db object names (never an ideal practice, for this reason and others) is to string build an entire query within the code, and then EXEC it.

    Unless I’m missing something, it shouldn’t really worry you. It looks messier, but it’s essentially no different to what takes place when you build your sql string in your PHP code, and then send it off to the db to compile/execute on the fly…

    Out of interest, is there a reason you don’t have a separate schema for each site, rather than sharing one schema between them all?

  11. Nick says:

    Paramaters – a kind of town you only find in Sydney!

  12. Jacques Chester says:

    Nick;

    If you’re referring to the wp_1_blah, wp_2_blah stuff, that’s inherited from how WordPress.com do it, which was merged into the mainline of WordPress-the-software. Essentially it’s a very simple sharding scheme that serves their purposes.

  13. Nick says:

    Ah, ok. Yeah, as far as I know, no way around the concatenating in that case.

Comments are closed.