28 September 2004

WordPress Filters Slowly


For a while now I’ve noticed that the front page of my site takes a long time to load. The load time is visible at the bottom of the page source and tends to be just over a second. Serving one page a second is not really acceptable for any sort of load; the server would peg its CPU before managing to saturate even the measly DSL line we serve from.

After wrestling with the mess that is PHP in the FreeBSD ports system, I got a profiling tool installed to see what was going on. The results aren’t too pretty.

         Real         User        System             secs/    cumm
%Time (excl/cumm)  (excl/cumm)  (excl/cumm)  Calls    call   s/call  Memory Usage  Name
------------------------------------------------------------------------------------------------
100.0  0.00 1.64    0.00 1.21    0.00 0.19       1  0.0001   1.6418             0  main
 56.0  0.00 0.92    0.00 0.76    0.00 0.06      46  0.0001   0.0200             0  apply_filters
 52.7  0.00 0.87    0.00 0.71    0.00 0.05       5  0.0003   0.1732             0  the_content
 49.2  0.00 0.81    0.00 0.66    0.00 0.05       5  0.0000   0.1616             0  textile
 39.1  0.64 0.64    0.62 0.62    0.02 0.02     705  0.0009   0.0009             0  preg_replace

The Textile filter is taking a whopping 0.81 wall-clock seconds for the 5 calls to it, roughly half of the 1.64 seconds spent on the whole page. It is also responsible for a significant number of the preg_replace calls. The solution is not to filter every time you display something, but to filter once and save the result. There is even a nice database column, “post_content_filtered”, for this use. Too bad it’s not used right now.
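To put the profile in perspective, the figures can be turned into a rough estimate of what caching the Textile output would buy. This is back-of-the-envelope arithmetic only: it assumes that reading a pre-filtered copy costs nothing, which is optimistic, and the variable names are mine.

```python
# Figures copied from the profile (cumulative real seconds).
total_s = 1.64        # main: the whole page
textile_s = 0.81      # textile: all 5 calls together
textile_calls = 5

per_call_s = textile_s / textile_calls
share = textile_s / total_s

# Optimistic assumption: serving a cached, pre-filtered copy costs ~0 seconds.
cached_total_s = total_s - textile_s

print(f"textile per call:      {per_call_s:.3f} s")
print(f"textile share of page: {share:.0%}")
print(f"pages/sec now:         {1 / total_s:.2f}")
print(f"pages/sec if cached:   {1 / cached_total_s:.2f}")
```

On these numbers, serving a cached copy would about double the pages served per second, even before looking at the remaining preg_replace calls.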

Given the nasty performance characteristics of these WordPress filters, it’s small wonder that blog sites tend to be easily “slashdotted” off the air. While one should certainly avoid premature optimization, this seems like a place where optimization is a bit overdue. Nothing like one more pet project…

Update

  1. Save a copy of the filtered text to “post_content_filtered” on each saved change to the post

  2. Rebuild this cached filtered text if “post_content_filtered” is empty when we look at it

  3. Provide a mechanism to invalidate the cache, or in simpler terms, empty the filtered copies so that the next time they’re looked at they’ll be refiltered by step 2.

This approach seems to be in line with other suggestions and at least one other implementation. Deep in my heart I wish the trigger were post_content_filtered being “null” in the SQL sense rather than just an empty string, but that would require a schema change and probably won’t be accepted by the WordPress folks. The main integration concern, I think, is going to be filtering properly for some uses of “the_content” and not others. There will probably be a “the_content_static_filter” or some such for filters whose output should be processed once and added to the cache, with the existing filter still applied late, at display time, for everything else.
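The difference between the two triggers is easy to see in a toy table. The sketch below uses SQLite from Python purely for illustration (WordPress itself runs on MySQL), and the three-row posts table is a made-up, cut-down stand-in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " post_content TEXT,"
    " post_content_filtered TEXT)"
)
conn.executemany(
    "INSERT INTO posts (id, post_content, post_content_filtered) VALUES (?, ?, ?)",
    [
        (1, "*hello*", None),                 # never filtered: NULL
        (2, "", ""),                          # filtered; the result really is empty
        (3, "*hi*", "<strong>hi</strong>"),   # filtered and cached
    ],
)

# Trigger A: "needs filtering" means the cached column is NULL.
needs_filter_null = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM posts WHERE post_content_filtered IS NULL ORDER BY id"
    )
]

# Trigger B: "needs filtering" means the cached column is an empty string.
needs_filter_empty = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM posts WHERE post_content_filtered = '' ORDER BY id"
    )
]

print("NULL trigger selects:", needs_filter_null)
print("empty-string trigger selects:", needs_filter_empty)
```

With the empty-string trigger, a post whose filtered output is legitimately empty (row 2) looks uncached and gets refiltered on every view, while a NULL trigger singles out only the row that has never been filtered (row 1).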

Interaction with the RSS feed and comments should probably be addressed here at some point, but I imagine I’ll figure that out as I go along. I already know that the Textile2 filter is not being applied to the RSS feed, so there may be some adapting in order to fix that anyway. I have to admit I’m really surprised by how little commentary on this issue I found when I did some cursory searching online. Either my search terms were poor, or typical bloggers don’t really care about performance much.

