WordPress ETag bug

My hosting provider charges by the byte, which motivates me to keep track of my bandwidth usage. Right now, most of my traffic comes from search engines (like MSNbot) and RSS aggregators (like Bloglines). The former could probably be managed by improving my URL structure and adding judicious instructions to my robots.txt; the latter ultimately requires a more intelligent dissemination mechanism, perhaps the way Usenet does things or with something like FeedTree. In the interim, however, we rely on the If-Modified-Since and If-None-Match HTTP headers to ensure that polling only transfers data when something has changed.
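To make the role of the two headers concrete, here is a minimal sketch of a server-side conditional GET check. It is written in Python for illustration only and is not WordPress's code: the function name `is_not_modified`, the dictionary of request headers, and the sample ETag value are all assumptions. It also assumes one conservative policy (every validator the client sent must match) and ignores details such as `*`, lists of entity tags, and weak validators.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True if a 304 response (no body) would be appropriate.

    Hypothetical helper: the policy here is that every validator the
    client sent must agree with the server's current state.
    """
    client_etag = request_headers.get("If-None-Match")
    client_since = request_headers.get("If-Modified-Since")

    # An unconditional request always gets the full response.
    if client_etag is None and client_since is None:
        return False

    # The entity tag is compared as an opaque string, quotes included.
    if client_etag is not None and client_etag != current_etag:
        return False

    # The date check passes only if the client's copy is at least as new.
    if client_since is not None:
        if parsedate_to_datetime(client_since) < last_modified:
            return False

    return True


# Sample values, made up for this example.
feed_last_modified = datetime(2006, 3, 29, 15, 0, 0, tzinfo=timezone.utc)
feed_etag = '"abc123"'
```

Because the entity tag is compared byte for byte, any extra escaping applied to the client's value is enough to make the comparison fail and force a full transfer.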

In perusing my access logs, I realized that Bloglines was always retrieving the full contents of my RSS feed, even when it hadn’t changed. Quick manual testing revealed that if only If-Modified-Since was specified, the data was correctly suppressed. However, Bloglines (rightly) uses both headers to detect changes. The problem appears to be one of quoting.

Quoting, in this sense, means escaping characters that could be dangerous if they were interpreted rather than treated as data. For example, the right-hand side of the If-None-Match header is a string called the entity tag, and it is provided by the HTTP client (such as the Bloglines poller). There is a risk that this string could somehow be fed into a database query or shell command. If it contains characters that have special meaning to the database or shell, an attacker could use that to gain access to the system. Thus, WordPress takes care to escape dangerous characters, such as quotation marks, in the string to prevent this from happening.

Unfortunately, a change made in 2005 that handles quoting appears to interact poorly with send_headers, the code that checks whether the feed has changed relative to what the HTTP client (Bloglines) last knew about. In particular, entity tags are quoted strings, in the sense that each is a string of characters that appears inside quotation marks. PHP already quotes these strings (in the sense of escaping dangerous characters), which is why send_headers took care to call stripslashes. However, the quoting introduced in change 2699 causes the already-escaped string to be escaped a second time, so a single stripslashes no longer recovers the original value, the comparison fails, and the 304 (Not Modified) response is not sent.

The hackish fix is simply to call stripslashes twice, which is what I’ve done for now. The more permanent fix probably involves rethinking how WordPress deals with quoting in general. I wanted to submit a ticket to the WordPress Trac server, but their login database hasn’t yet been updated with my new account. I’ll update this post with a link to the ticket when it gets created.
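The effect of the double escaping, and of stripping twice, can be sketched as follows. This is a rough Python model, not PHP: `add_slashes` and `strip_slashes` are hypothetical stand-ins for PHP's addslashes and stripslashes (they ignore NUL bytes, which the real functions also handle), and the entity tag value is made up.

```python
def add_slashes(s):
    """Rough model of PHP's addslashes(): put a backslash in front of
    each backslash, single quote and double quote."""
    out = []
    for ch in s:
        if ch in ("\\", "'", '"'):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def strip_slashes(s):
    """Rough model of PHP's stripslashes(): undo one level of escaping."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\":
            i += 1  # drop the escaping backslash
            if i < len(s):
                out.append(s[i])  # keep the character it protected
        else:
            out.append(s[i])
        i += 1
    return "".join(out)


server_etag = '"abc123"'  # made-up value; real entity tags include the quotes

# The client echoes the tag back unchanged, but it is escaped twice
# before the comparison code gets to see it.
received = add_slashes(add_slashes(server_etag))

stripped_once = strip_slashes(received)
stripped_twice = strip_slashes(stripped_once)

print(stripped_once == server_etag)   # one level of escaping is left over
print(stripped_twice == server_etag)  # the original tag is recovered
```

Stripping twice only works if exactly two layers of escaping were applied. If the number of layers differs between PHP configurations, the comparison breaks again, which is one reason this is a workaround rather than a proper fix.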

Update: Someone else noticed and was able to file a bug. Their comments led me to realize it was a more recent change, part of the 2.0.2 upgrade, that probably caused the problem. Where are the regression tests?

4 Comments

  1. Posted 29 March 2006 at 15:49 | Permalink

    Nice catch! I tried to track the problem down, but didn’t have time to properly debug it.

  2. sit
    Posted 29 March 2006 at 16:14 | Permalink

    Thanks; I hope you know what to do with it :-) There’s obviously a lot of stuff to do with quoting in the WP code base, including interactions with PHP’s own magic quote stuff, and I didn’t have the energy to come up with a truly proper fix (much less a fix that would work across multiple PHP versions).

  3. Posted 18 April 2006 at 15:54 | Permalink

    Good reading. I just added your website to my bookmarks list, hope to read much more interesting stuff. Keep it going!

  4. Posted 22 November 2006 at 15:10 | Permalink

    Stumbled across this trying to find more info. Since the WordPress ETag is just a hash of the last modified timestamp, I removed the @header("ETag: …) from classes.php and I’ll let stuff go from there.

6 Trackbacks

  1. By CVS Test Blog » Daily Links on 30 March 2006 at 14:19

    [...] Emil Sit » WordPress ETag bug Geof, Alex, Scot, and I have been looking into this Conditional GET bug, but Emil found the culprit first. (tags: dougal_comments wordpress rss feeds conditionalget bugs) [...]

  2. By geek ramblings » links for 2006-03-31 on 30 March 2006 at 19:27

    [...] Emil Sit » WordPress ETag bug Geof, Alex, Scot, and I have been looking into this Conditional GET bug, but Emil found the culprit first. (tags: dougal_comments wordpress rss feeds conditionalget bugs) [...]

  3. By alexking.org: Blog > Around the web on 3 April 2006 at 12:57

    [...] Emil Sit - WordPress ETag bug [...]

  4. By ((meatspace)) » Wordpress, ETag, Last-Modified on 24 November 2006 at 00:09

    [...] Adding in the fact that if the client actually sends an ETagged response (the HTTP 1.1 If-None-Match header), Wordpress’s quoting mechanism screws it up. Some people have taken a stab at fixing it but it seems easier to just nuke the @header("ETag: …) line in wp-include/classes.php (around line 1637 in wordpress 2.0.5) and let last-modified work just fine. [...]

  5. [...] http://www.emilsit.net/blog/archives/wordpress-etag-bug/ [...]

  6. [...] The other, preferred solution is to fix the processing of If-None-Match headers with a quick and easy hack. In WordPress 2.0.5, open up wp-includes/classes.php and go to line 1640. Replace if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes($_SERVER['HTTP_IF_NONE_MATCH']); with if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes(stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])); Now your bandwidth usage for WordPress feeds should be reduced. Don’t forget that you’ll have to do this every time you upgrade WordPress. [...]