WordPress ETag bug

My hosting provider charges by the byte, which motivates me to keep track of my bandwidth usage. Right now, most of my traffic comes from search engines (like MSNbot) and RSS aggregators (like Bloglines). The former could probably be managed by improving my URL structure and adding judicious instructions to my robots.txt; the latter ultimately requires a more intelligent dissemination mechanism, perhaps the way Usenet does things or something like FeedTree. In the interim, we rely on the If-Modified-Since and If-None-Match HTTP headers to ensure that polling transfers data only when something has changed.

In perusing my access logs, I realized that Bloglines was always retrieving the full contents of my RSS feed, even when it hadn’t changed. Quick manual testing revealed that if only If-Modified-Since was specified, the data was correctly suppressed. However, Bloglines (rightly) uses both headers to detect changes. The problem appears to be one of quoting.
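To make the symptom concrete, here is a minimal sketch in Python (not WordPress's actual PHP) of the kind of decision a server makes for a conditional GET. The function name not_modified, the rule that both validators must agree when both are sent, and the sample tag and dates are all assumptions for illustration.

```python
from email.utils import parsedate_to_datetime


def not_modified(request_headers, current_etag, last_modified):
    """Return True when a 304 could be sent instead of the full feed.

    Hypothetical helper: if the client sends both validators, both must
    agree; if it sends only one, that one is enough.
    """
    client_etag = request_headers.get("If-None-Match")
    client_since = request_headers.get("If-Modified-Since")

    etag_ok = client_etag is not None and client_etag == current_etag
    date_ok = (
        client_since is not None
        and parsedate_to_datetime(client_since)
        >= parsedate_to_datetime(last_modified)
    )

    if client_etag is not None and client_since is not None:
        return etag_ok and date_ok
    return etag_ok or date_ok


# Made-up values standing in for the feed's current state.
FEED_ETAG = '"abc123"'
FEED_DATE = "Fri, 31 Mar 2006 12:00:00 GMT"

# A client that sends only the date gets a 304.
date_only = not_modified({"If-Modified-Since": FEED_DATE}, FEED_ETAG, FEED_DATE)

# A client that sends both gets a 304 only if the tag survives intact.
both_intact = not_modified(
    {"If-Modified-Since": FEED_DATE, "If-None-Match": '"abc123"'},
    FEED_ETAG,
    FEED_DATE,
)

# If the tag the server compares has picked up stray backslashes,
# the comparison fails and the full feed is sent again.
both_mangled = not_modified(
    {"If-Modified-Since": FEED_DATE, "If-None-Match": '\\"abc123\\"'},
    FEED_ETAG,
    FEED_DATE,
)
```

Under these assumptions, date_only and both_intact are true while both_mangled is false, which matches what the logs showed: the date-only request was answered correctly, but a poller that sends both headers always got the full feed.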

Quoting, in the sense of escaping, keeps potentially dangerous characters from being interpreted. For example, the right-hand side of the If-None-Match header is a string called the entity tag, and it is provided by the HTTP client (such as the Bloglines poller). There is a risk that this string could somehow be fed into a database or shell command. If it contains characters that have special meaning to the database or shell, an attacker could use that to gain access to the system. Thus, WordPress takes care to escape dangerous characters, such as quotation marks, in the string to prevent this from happening.

Unfortunately, a change made in 2005 that handles quoting appears to interact poorly with send_headers, the code that checks whether the feed has changed relative to what the HTTP client (Bloglines) last knew about. In particular, entity tags are quoted strings, in the sense that each is a string of characters that appears inside quotation marks. PHP already quotes these strings (in the sense of escaping dangerous characters), which is why send_headers took care to call stripslashes. However, the quoting introduced in change 2699 causes the already-escaped string to be escaped again, so a single stripslashes no longer recovers the original tag, the match fails, and the 304 response code (not modified) is not sent.

The hackish fix is to simply call stripslashes twice, which is what I’ve done for now. The more permanent fix probably involves something about how WordPress deals with quoting. I wanted to submit a ticket to the WordPress trac server but their login database hasn’t yet been updated with my new account. I’ll update this post with a link to the ticket when it gets created.
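The double escaping and the workaround can be reproduced outside WordPress. The sketch below is Python, with addslashes and stripslashes written as simplified stand-ins for the PHP functions of the same name; the sample tag value is made up, and nothing here is WordPress's own code.

```python
def addslashes(s):
    """Simplified stand-in for PHP's addslashes: backslash-escape
    quotes, backslashes and NUL bytes."""
    out = []
    for ch in s:
        if ch in ("\\", '"', "'"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)


def stripslashes(s):
    """Simplified stand-in for PHP's stripslashes: undo one level
    of backslash escaping."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\":
            i += 1  # skip the backslash, keep the character after it
            if i < len(s):
                out.append("\0" if s[i] == "0" else s[i])
        else:
            out.append(s[i])
        i += 1
    return "".join(out)


# The tag as the client sends it, including its quotation marks.
etag = '"abc123"'

# First level of escaping (PHP), second level (the later change).
escaped_once = addslashes(etag)
escaped_twice = addslashes(escaped_once)

# One stripslashes only undoes one level; two get back to the original.
stripped_once = stripslashes(escaped_twice)
stripped_twice = stripslashes(stripped_once)

matches_after_one = stripped_once == etag
matches_after_two = stripped_twice == etag
```

Here stripped_once still carries backslashes in front of the quotation marks, so it does not equal the real tag, while stripped_twice does. That is why the second stripslashes call works as a stopgap, and also why it is fragile: it is only correct as long as exactly two levels of escaping are applied.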

Update: Someone else noticed and was able to file a bug. Their comments led me to realize it was a more recent change, part of the 2.0.2 upgrade, that probably caused the problem. Where are the regression tests?



About the Author

Emil Sit is a graduate student in the Parallel and Distributed Operating Systems group at MIT's Computer Science and AI Lab.

10 Responses to “WordPress ETag bug”

  1. Dougal Campbell Says:

    Nice catch! I tried to track the problem down, but didn’t have time to properly debug it.

  2. sit Says:

    Thanks; I hope you know what to do with it :-) There’s obviously a lot of stuff to do with quoting in the WP code base, including interactions with PHP’s own magic quote stuff, and I didn’t have the energy to come up with a truly proper fix (much less a fix that would work across multiple PHP versions).

  3. CVS Test Blog » Daily Links Says:

    [...] Emil Sit » WordPress ETag bug Geof, Alex, Scot, and I have been looking into this Conditional GET bug, but Emil found the culprit first. (tags: dougal_comments wordpress rss feeds conditionalget bugs) [...]

  4. geek ramblings » links for 2006-03-31 Says:

    [...] Emil Sit » WordPress ETag bug Geof, Alex, Scot, and I have been looking into this Conditional GET bug, but Emil found the culprit first. (tags: dougal_comments wordpress rss feeds conditionalget bugs) [...]

  5. alexking.org: Blog > Around the web Says:

    [...] Emil Sit - WordPress ETag bug [...]

  6. TemplateBoy Says:

    Good reading. I just added your website to my bookmarks list, hope to read much more interesting stuff. Keep it going!

  7. dbt Says:

    Stumbled across this trying to find more info. Since the WordPress ETag is just a hash of the last modified timestamp, I removed the @header("ETag: …) from classes.php and I’ll let stuff go from there.

  8. ((meatspace)) » Wordpress, ETag, Last-Modified Says:

    [...] Adding in the fact that if the client actually sends an ETagged response (the HTTP 1.1 If-None-Match header), Wordpress’s quoting mechanism screws it up. Some people have taken a stab at fixing it but it seems easier to just nuke the @header("ETag: …) line in wp-includes/classes.php (around line 1637 in wordpress 2.0.5) and let last-modified work just fine. [...]

  9. Aaron Johnson » Blog Archive » RSS/Atom feeds, Last Modified and Etags Says:

    [...] http://www.emilsit.net/blog/archives/wordpress-etag-bug/ [...]

  10. WordPress Feeds and Excessive Bandwidth « Brian Yi Says:

    [...] The other, preferred solution is to fix the processing of If-None-Match headers with a quick and easy hack. In WordPress 2.0.5, open up wp-includes/classes.php and go to line 1640. Replace if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes($_SERVER['HTTP_IF_NONE_MATCH']); with if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes(stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])); Now your bandwidth usage for WordPress feeds should be reduced. Don’t forget that you’ll have to do this every time you upgrade WordPress. [...]
