WordPress ETag bug
My hosting provider charges by the byte, which motivates me to
keep track of my bandwidth usage. Right now, most of my
traffic comes from search engines (like MSNbot) and RSS
aggregators (like Bloglines). The former could probably be
managed by improving my URL structure and adding judicious
instructions to my robots.txt; the latter ultimately requires a more
intelligent dissemination mechanism, perhaps the way Usenet does
things or with something like FeedTree. In the interim, however, we
rely on the If-Modified-Since and If-None-Match HTTP request headers
to ensure that polling only transfers data when something has
changed.
In perusing my access logs, I realized that Bloglines was always retrieving the full contents of my RSS feed, even when it hadn’t changed. Quick manual testing revealed that if only If-Modified-Since was specified, the data was correctly suppressed. However, Bloglines (rightly) uses both headers to detect changes. The problem appears to be one of quoting.
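To make the conditional GET check concrete, here is a minimal Python sketch of the decision a server has to make. It is not the WordPress code: the function name `should_send_304` and its parameters are hypothetical, and it assumes that when a client sends both headers, both must match before a 304 is returned. Under that assumption, a mangled entity tag defeats the check even when the date matches, which is consistent with what I saw in testing.

```python
from email.utils import parsedate_to_datetime


def should_send_304(request_headers, last_modified, etag):
    """Decide whether a 304 (Not Modified) response is appropriate.

    request_headers: dict of header name -> value sent by the client.
    last_modified:   the feed's Last-Modified value, as an HTTP date string.
    etag:            the feed's current entity tag, including its quotation marks.
    (All three names are illustrative, not taken from WordPress.)
    """
    since = request_headers.get("If-Modified-Since")
    client_etag = request_headers.get("If-None-Match")

    # The client's copy is fresh if it is at least as new as the feed.
    date_ok = since is not None and (
        parsedate_to_datetime(since) >= parsedate_to_datetime(last_modified)
    )
    # Entity tags are compared as exact strings, quotation marks included.
    etag_ok = client_etag is not None and client_etag == etag

    # Assumption: if the client sent both validators, both must agree.
    if since is not None and client_etag is not None:
        return date_ok and etag_ok
    return date_ok or etag_ok
```

With only If-Modified-Since present, the date comparison alone decides the outcome. With both headers present, an entity tag that arrives with stray backslashes fails the string comparison, so the full feed is sent.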
Quoting, in this sense, means escaping characters that could be dangerous if they were interpreted literally. For example, the right-hand side of the If-None-Match header is a string called the entity tag, and it is supplied by the HTTP client (such as the Bloglines poller). There is a risk that this string could somehow be fed into a database query or a shell command. If it contains characters that have special meaning to the database or shell, an attacker could use them to gain access to the system. WordPress therefore takes care to escape dangerous characters, such as quotation marks, in the string.
Unfortunately, a change made in 2005 that handles
quoting appears to interact poorly with
send_headers, the
code that checks whether the feed has changed relative to what
the HTTP client (Bloglines) last knew about. In particular,
entity tags are quoted strings, in the sense that each
is a string of characters enclosed in quotation marks.
PHP already quotes these strings (in the sense of
escaping dangerous characters), which is why send_headers takes
care to call stripslashes. However, the quoting introduced in change
2699 causes the already-escaped string to be escaped again, so a
single stripslashes leaves one layer of backslashes in place, the
comparison with the server's entity tag fails, and the 304
(Not Modified) response is not sent.
The hackish fix is to simply call stripslashes twice, which is
what I’ve done for now. The more permanent fix probably involves
something about how WordPress deals with quoting.
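The double escaping and the two-pass workaround can be simulated in a few lines. This is a rough Python sketch, not PHP: the `addslashes` and `stripslashes` functions below are simplified stand-ins for the PHP functions of the same name (they ignore NUL bytes, for instance), and the entity tag value is made up for illustration.

```python
def addslashes(s):
    """Simplified stand-in for PHP's addslashes.

    Puts a backslash in front of every backslash, single quote and
    double quote. NUL handling is omitted.
    """
    out = []
    for ch in s:
        if ch in ("\\", "'", '"'):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def stripslashes(s):
    """Simplified stand-in for PHP's stripslashes.

    Removes exactly one level of backslash escaping.
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # keep the escaped character
            i += 2
        elif s[i] == "\\":
            i += 1  # a lone trailing backslash is dropped
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# Hypothetical entity tag; the quotation marks are part of the value.
server_etag = '"5d41402abc4b2a76b9719d911017c592"'

# Two rounds of escaping are applied before send_headers sees the header.
received = addslashes(addslashes(server_etag))

stripped_once = stripslashes(received)        # still has one layer of backslashes
stripped_twice = stripslashes(stripped_once)  # back to the original value

print(stripped_once == server_etag)   # False: the comparison fails, no 304
print(stripped_twice == server_etag)  # True: the hackish fix restores the match
```

The second pass is only safe as long as exactly two rounds of escaping were applied. If one of them ever goes away, the extra stripslashes would eat backslashes that legitimately belong to the value, which is one reason this is a stopgap rather than a real fix.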
I wanted to submit a ticket to the WordPress
trac server but their login
database hasn’t yet been updated with my new account. I’ll update this post
with a link to the ticket when it gets created.
Update: Someone else noticed and was able to file a bug. Their comments led me to realize it was a more recent change, part of the 2.0.2 upgrade, that probably caused the problem. Where are the regression tests?
March 29th, 2006 at 3:49 pm
Nice catch! I tried to track the problem down, but didn’t have time to properly debug it.
March 29th, 2006 at 4:14 pm
Thanks; I hope you know what to do with it :-) There’s obviously a lot of stuff to do with quoting in the WP code base, including interactions with PHP’s own magic quotes, and I didn’t have the energy to come up with a truly proper fix (much less a fix that would work across multiple PHP versions).
March 30th, 2006 at 2:19 pm
[...] Emil Sit » WordPress ETag bug Geof, Alex, Scot, and I have been looking into this Conditional GET bug, but Emil found the culprit first. (tags: dougal_comments wordpress rss feeds conditionalget bugs) [...]
April 3rd, 2006 at 12:57 pm
[...] Emil Sit - WordPress ETag bug [...]
April 18th, 2006 at 3:54 pm
Good reading. I just added your website to my bookmarks list, hope to read much more interesting stuff. Keep it going!
November 22nd, 2006 at 3:10 pm
Stumbled across this while trying to find more info. Since the WordPress ETag is just a hash of the last-modified timestamp, I removed the @header("ETag: …) from classes.php and I’ll let stuff go from there.
November 24th, 2006 at 12:09 am
[...] Adding in the fact that if the client actually sends an ETagged request (the HTTP 1.1 If-None-Match header), WordPress’s quoting mechanism screws it up. Some people have taken a stab at fixing it but it seems easier to just nuke the @header("ETag: …) line in wp-includes/classes.php (around line 1637 in WordPress 2.0.5) and let Last-Modified work just fine. [...]
December 5th, 2006 at 3:43 am
[...] http://www.emilsit.net/blog/archives/wordpress-etag-bug/ [...]
January 31st, 2007 at 3:22 am
[...] The other, preferred solution is to fix the processing of If-None-Match headers with a quick and easy hack. In WordPress 2.0.5, open up wp-includes/classes.php and go to line 1640. Replace if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes($_SERVER['HTTP_IF_NONE_MATCH']); with if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) $client_etag = stripslashes(stripslashes($_SERVER['HTTP_IF_NONE_MATCH'])); Now your bandwidth usage for WordPress feeds should be reduced. Don’t forget that you’ll have to do this every time you upgrade WordPress. [...]