<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Iraqfilter</title>
	<link>http://www.metafilter.com/29202/Iraqfilter/</link>
	<description>Comments on MetaFilter post Iraqfilter</description>
	<pubDate>Mon, 27 Oct 2003 14:41:52 -0800</pubDate>
	<lastBuildDate>Mon, 27 Oct 2003 14:41:52 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Iraqfilter</title>
		<link>http://www.metafilter.com/29202/Iraqfilter</link>	
		<description>&lt;a href="http://weblog.siliconvalley.com/column/dangillmor/archives/001450.shtml"&gt;Iraqfilter.&lt;/a&gt; &lt;em&gt;&quot;Sometime between April 2003 and October 2003, someone at the White House added virtually all of the directories with &apos;Iraq&apos; in them to its robots.txt file, meaning that search engines would no longer list those pages in results or archive them.&quot;&lt;/em&gt;  &lt;a href=&apos;http://www.whitehouse.gov/robots.txt&apos;&gt;The robots.txt file is here. &lt;/a&gt;And here&apos;s the &lt;a href=&apos;http://yro.slashdot.org/yro/03/10/27/2052228.shtml?tid=103&amp;tid=126&amp;tid=95&amp;tid=99&apos;&gt;Slashdot&lt;/a&gt; discussion.  I guess it&apos;s hard to restore integrity to the Presidency when people can compare your statements over time.</description>
		<guid isPermaLink="false">post:www.metafilter.com,2003:site.29202</guid>
		<pubDate>Mon, 27 Oct 2003 14:30:44 -0800</pubDate>
		<dc:creator>condour75</dc:creator>
		<category>iraq</category>
		<category>search</category>
		<category>searchengines</category>
		<category>bots</category>
		<category>slashdot</category>
	</item>	<item>
		<title>By: bz</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575046</link>	
		<description>Yikes. That looks bad regardless of the reasoning.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575046</guid>
		<pubDate>Mon, 27 Oct 2003 14:41:52 -0800</pubDate>
		<dc:creator>bz</dc:creator>
	</item>	<item>
		<title>By: specialk420</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575050</link>	
		<description>one would imagine

&lt;a href=&quot;http://www.washingtonpost.com/ac2/wp-dyn/A1996-2002Feb12?language=printer&quot;&gt;/cakewalk &lt;/a&gt;

got deleted sometime back....</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575050</guid>
		<pubDate>Mon, 27 Oct 2003 14:43:32 -0800</pubDate>
		<dc:creator>specialk420</dc:creator>
	</item>	<item>
		<title>By: twiggy</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575052</link>	
		<description>One wonders if anyone out there is saving a copy of this robots.txt file for future comparison.. hmmm...

Seriously though, this disturbs the shit out of me.. and what disturbs me more is the fact that the first coworker I showed this to said &quot;well, I dunno.. it seems some of that stuff is understandable..&quot;

Attempting to deny the public the right to cache this stuff both 1) for history, and 2) for before/after comparison is at the very least &quot;a bit shady&quot;, and more likely &quot;quite shady&quot;...

I cannot think of a single valid reason to block search engines and other &quot;bots&quot; from caching historical copies of &lt;b&gt;public&lt;/b&gt; documents on the whitehouse&apos;s website regardless of subject matter.

Throw in that everything pertains to &quot;iraq&quot;, and that&apos;s just plain scary...  I&apos;m no conspiracy theorist, but it doesn&apos;t take a radical nutball to notice they&apos;ve already gone back and edited the content of bush speeches, etc...</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575052</guid>
		<pubDate>Mon, 27 Oct 2003 14:46:01 -0800</pubDate>
		<dc:creator>twiggy</dc:creator>
	</item>	<item>
		<title>By: the fire you left me</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575057</link>	
		<description>Interesting.  But wait, is Google keeping people from a history of preference?

&lt;a href=&quot;http://www.google.com/robots.txt&quot;&gt;http://www.google.com/robots.txt&lt;/a&gt;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575057</guid>
		<pubDate>Mon, 27 Oct 2003 14:54:34 -0800</pubDate>
		<dc:creator>the fire you left me</dc:creator>
	</item>	<item>
		<title>By: monju_bosatsu</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575058</link>	
		<description>I think somebody screwed up the perl script for generating the robots.txt.  Look at the directories; most don&apos;t exist at all.

Disallow:	/firstlady/photoessay/bookfestival/iraq
Disallow:	/firstlady/photoessay/welcometowh/iraq
Disallow:	/firstlady/recipes/iraq
Disallow:	/history/africanamerican/iraq
Disallow:	/history/photoessays/easter/one/iraq
Disallow:	/history/valentines/iraq
etc.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575058</guid>
		<pubDate>Mon, 27 Oct 2003 14:55:01 -0800</pubDate>
		<dc:creator>monju_bosatsu</dc:creator>
	</item>	<item>
		<title>By: specialk420</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575060</link>	
		<description>speaking of iraq: &lt;a href=&quot;http://riverbendblog.blogspot.com/&quot;&gt;the good&lt;/a&gt;. the &lt;a href=&quot;http://riversbendblog.blogspot.com/&quot;&gt;bad and ugly&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575060</guid>
		<pubDate>Mon, 27 Oct 2003 14:58:12 -0800</pubDate>
		<dc:creator>specialk420</dc:creator>
	</item>	<item>
		<title>By: the fire you left me</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575062</link>	
		<description>Then again, there is a &lt;a href=&quot;http://www.differentstrings.info/archives/002813.html&quot;&gt;history of revisionism&lt;/a&gt; within this White House.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575062</guid>
		<pubDate>Mon, 27 Oct 2003 14:59:51 -0800</pubDate>
		<dc:creator>the fire you left me</dc:creator>
	</item>	<item>
		<title>By: internal</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575064</link>	
		<description>looks like they disallowed almost the whole site to my uneducated eye...</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575064</guid>
		<pubDate>Mon, 27 Oct 2003 15:03:47 -0800</pubDate>
		<dc:creator>internal</dc:creator>
	</item>	<item>
		<title>By: rusty</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575065</link>	
		<description>Let&apos;s all repeat the Slashdot discussion! First you say &quot;This is scary! I hope someone is archiving this!&quot; Then I say &quot;Look, man, /firstlady/photos/2003/01/iraq? /president/holiday/decorations/iraq? They never existed. Some server monkey screwed up a perl -e.&quot; Then you ignore that because it&apos;s more fun to make Ministry of Truth jokes.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575065</guid>
		<pubDate>Mon, 27 Oct 2003 15:12:19 -0800</pubDate>
		<dc:creator>rusty</dc:creator>
	</item>	<item>
		<title>By: whatnotever</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575066</link>	
		<description>It&apos;s just &quot;/text&quot; and &quot;/iraq&quot; added to every directory.  *Looks* like a goof to me.

They have valid reasons to want to block indexing of &quot;/text&quot; and &quot;/iraq&quot; (as root-level directories).

/text simply mirrors the content of the rest of the site, so there&apos;s no need to index it.  (Avoiding indexing redundant content is a good thing.)

/iraq simply redirects to /infocus/iraq, which is also linked from elsewhere.  If they hadn&apos;t goofed, /infocus/iraq would be indexed normally, with no need for any crawlers to go in /iraq.

There is the question of how &quot;/afac/index.htm&quot; got into the blunder.  It doesn&apos;t seem to exist (.html does), but maybe there&apos;s some big conspiracy to keep people from finding out about the money we&apos;re giving to Afghan children!

Now, you can argue all you want over whether or not the &quot;goof&quot; is coverup to block indexing of /infocus/iraq (I dare you to find any other directories ending in &quot;/iraq&quot;) or not, but it looks like a dumb blunder.  So just laugh at the dumb-dumbs or something.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575066</guid>
		<pubDate>Mon, 27 Oct 2003 15:13:18 -0800</pubDate>
		<dc:creator>whatnotever</dc:creator>
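<!--
Editor's sketch of the blunder hypothesis in the comment above, not the White House's actual script. The directory list is a hypothetical stand-in (the real site layout is not given in this thread), and the function names are made up for illustration. It compares rules that block only the root-level /text and /iraq against rules that append both suffixes to every directory, using Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directory list; the thread does not give the real site layout.
DIRECTORIES = ["/firstlady/recipes", "/history/valentines", "/infocus"]
SUFFIXES = ["/text", "/iraq"]


def intended_rules():
    # What the comment assumes was meant: block only the root-level directories.
    return ["Disallow: " + suffix for suffix in SUFFIXES]


def blundered_rules():
    # What the published file appears to do: append each suffix to every directory.
    return ["Disallow: " + d + suffix for d in DIRECTORIES for suffix in SUFFIXES]


def parse(rules):
    # Build a parser from in-memory lines; no network access is needed.
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + rules)
    return parser


intended = parse(intended_rules())
blundered = parse(blundered_rules())

# can_fetch(useragent, url) reports whether the rules permit crawling the URL.
print(intended.can_fetch("*", "/infocus/iraq/index.html"))
print(blundered.can_fetch("*", "/infocus/iraq/index.html"))
```

Under these assumptions, the intended rules leave /infocus/iraq crawlable, while the blundered rules block it because robots.txt rules match by path prefix.
-->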
	</item>	<item>
		<title>By: eddydamascene</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575067</link>	
		<description>&lt;i&gt;Disallow: /firstlady/recipes/iraq&lt;/i&gt;

&quot;We know in what areas they eat Lis-san El Qua-Thia. They&apos;re in the area around Tikrit and Baghdad and east, west, south and north somewhat.&quot;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575067</guid>
		<pubDate>Mon, 27 Oct 2003 15:13:47 -0800</pubDate>
		<dc:creator>eddydamascene</dc:creator>
	</item>	<item>
		<title>By: rusty</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575069</link>	
		<description>whatnotever: How can you say that! After the well-known and respected journalist &lt;a href=&quot;http://weblog.siliconvalley.com/column/dangillmor/archives/001450.shtml&quot;&gt;Dan Gillmor&lt;/a&gt; has urged us all to download those restricted nonexistent directories every day?

&quot;In the blogosphere, my readers can fact-check my ass, cause God knows I&apos;m not going to do it myself.&quot;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575069</guid>
		<pubDate>Mon, 27 Oct 2003 15:20:51 -0800</pubDate>
		<dc:creator>rusty</dc:creator>
	</item>	<item>
		<title>By: tuxster</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575078</link>	
		<description>&lt;em&gt;
Now, you can argue all you want over whether or not the &quot;goof&quot; is coverup to block indexing of /infocus/iraq (I dare you to find any other directories ending in &quot;/iraq&quot;) or not, but it looks like a dumb blunder. So just laugh at the dumb-dumbs or something.
&lt;/em&gt;

Sure, but for someone to commit a blunder in the robots.txt file with regard to the word &quot;iraq&quot;, they must have been trying to do something involving the word &quot;iraq&quot; and the robots.txt file to begin with. So the question is, what were they &lt;em&gt;really&lt;/em&gt; trying to do when they actually messed up?</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575078</guid>
		<pubDate>Mon, 27 Oct 2003 15:48:11 -0800</pubDate>
		<dc:creator>tuxster</dc:creator>
	</item>	<item>
		<title>By: kayjay</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575092</link>	
		<description>&lt;i&gt;speaking of iraq: the good. the bad and ugly.
posted by specialk420 at 4:58 PM CST on October 27 &lt;/i&gt;

specialk420-

What is that? The same person with two blogs, or one original and a faker? I read the &lt;a href=&quot;http://riverbendblog.blogspot.com/&quot;&gt;first one&lt;/a&gt; on a regular basis, but I&apos;d never seen the &lt;a href=&quot;http://riversbendblog.blogspot.com/&quot;&gt;second&lt;/a&gt; until today.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575092</guid>
		<pubDate>Mon, 27 Oct 2003 16:13:16 -0800</pubDate>
		<dc:creator>kayjay</dc:creator>
	</item>	<item>
		<title>By: ROU_Xenophobe</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575094</link>	
		<description>tuxster:

The rest of the post you quoted mentioned reasons why you wouldn&apos;t want /text/ and /iraq/ to be spidered.

This is what happens when you let Dubya exercise his MaD 1eet sKillZ at perl, I guess?</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575094</guid>
		<pubDate>Mon, 27 Oct 2003 16:15:11 -0800</pubDate>
		<dc:creator>ROU_Xenophobe</dc:creator>
	</item>	<item>
		<title>By: specialk420</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575100</link>	
		<description>i think the links to the israeli backed: &lt;a href=&quot;http://www.memri.org/&quot;&gt;memri.org &lt;/a&gt;  and &lt;a href=&quot;http://www.powerlineblog.com/&quot;&gt;this blog &lt;/a&gt;which was linked on the suspicious copy until just a few hours ago might  be a few clues as to who is behind the &lt;a href=&quot;http://riverbendblog.blogspot.com/&quot;&gt;baghdad burning&lt;/a&gt; rip-off.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575100</guid>
		<pubDate>Mon, 27 Oct 2003 16:25:34 -0800</pubDate>
		<dc:creator>specialk420</dc:creator>
	</item>	<item>
		<title>By: adamrice</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575105</link>	
		<description>I&apos;m always happy to blame GW for, well, just about anything. This may be evidence of the generally secretive approach at GW&apos;s WH, but when directories like
http://www.whitehouse.gov/president/holiday/whtree/text/
are banned (that&apos;s legit, btw), you know they&apos;re just banning everything. Actually, that&apos;d be an interesting project--write a spider that doesn&apos;t respect robots.txt, and find out which, if any, directories have been left open.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575105</guid>
		<pubDate>Mon, 27 Oct 2003 16:36:46 -0800</pubDate>
		<dc:creator>adamrice</dc:creator>
	</item>	<item>
		<title>By: adamrice</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575118</link>	
		<description>Damn, specialk420, &lt;a href=&quot;http://www.metafilter.com/mefi/29202#575060&quot;&gt;those links&lt;/a&gt; are fascinating!</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575118</guid>
		<pubDate>Mon, 27 Oct 2003 17:06:38 -0800</pubDate>
		<dc:creator>adamrice</dc:creator>
	</item>	<item>
		<title>By: specialk420</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575120</link>	
		<description>total derail. the &lt;a href=&quot;http://atrios.blogspot.com/&quot;&gt;rest of the discussion&lt;/a&gt; is over at &lt;a href=&quot;http://atrios.blogspot.com/&quot;&gt;atrios&lt;/a&gt;. the bush administration&apos;s efforts to control information in this case - and dilute information in the baghdad burning case (we suspect) - are ... well... i think people can make up their own minds.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575120</guid>
		<pubDate>Mon, 27 Oct 2003 17:11:16 -0800</pubDate>
		<dc:creator>specialk420</dc:creator>
	</item>	<item>
		<title>By: specialk420</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575124</link>	
		<description>&lt;small&gt;(further derail) this cat seems to be &lt;a href=&quot;http://www.suzerainty.blogspot.com&quot;&gt;on the hunt.&lt;/a&gt;&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575124</guid>
		<pubDate>Mon, 27 Oct 2003 17:16:05 -0800</pubDate>
		<dc:creator>specialk420</dc:creator>
	</item>	<item>
		<title>By: mecran01</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575138</link>	
		<description>webreaper will ignore robots.txt, I believe. If not, the htweb might.  One of them does.  And Wget might be configurable in that regard.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575138</guid>
		<pubDate>Mon, 27 Oct 2003 18:07:42 -0800</pubDate>
		<dc:creator>mecran01</dc:creator>
	</item>	<item>
		<title>By: palancik</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575146</link>	
		<description>Something interesting about this strategy, if it deserves that label. Go to &lt;a href=&quot;http://www.google.com/search?hl=en&amp;ie=ISO-8859-1&amp;q=whitehouse.gov+%2B+iraq&amp;btnG=Google+Search&quot;&gt;Google, and search whitehouse.gov + Iraq&lt;/a&gt;, and you get two White House entries plus all the sites they would probably prefer you not see: &lt;a href=&quot;http://www.stumbleupon.com/url/www.whitehouse.gov/infocus/iraq/reasons.html&quot;&gt;Stumbleupon&lt;/a&gt;, &lt;a href=&quot;http://www.whitehouse.org/&quot;&gt;Whitehouse.org&lt;/a&gt;, the &lt;a href=&quot;http://www.waronevil.com/r/in.whi.wh.php&quot;&gt;Evil Eradication Office&lt;/a&gt;, and a lot of weblogs (e.g., &lt;a href=&quot;http://www.discourse.net/archives/2003/10/whitehousegov_seeks_to_put_iraq_statements_down_the_memory_hole.html&quot;&gt;this one&lt;/a&gt;).

So why is this such a good idea for the White House? Don&apos;t they know that someone will notice, and publicize it?</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575146</guid>
		<pubDate>Mon, 27 Oct 2003 18:31:00 -0800</pubDate>
		<dc:creator>palancik</dc:creator>
	</item>	<item>
		<title>By: whatnotever</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575151</link>	
		<description>adamrice, I wasn&apos;t aware that &quot;/text&quot; showed up everywhere.  But that just makes even more sense, for the most part.  Throw &quot;/text&quot; on the end of any directory and you get the plain-text version (for accessibility, generally).  That&apos;s a fine way to do things.  But there&apos;s no need for anyone to index it, because it has the exact same content as the non-/text page. 

Take any &quot;/text&quot; url, remove &quot;/text&quot;, and you should get the same page, but with pretty pictures.  People can feel free to check all of them every day for themselves, but it seems a bit wasteful.

So blocking [everything]/text makes perfect sense.  How [everything]/iraq got in there...?  I&apos;ll still chalk it up to &lt;a href=&quot;http://www.wordspy.com/words/HanlonsRazor.asp&quot;&gt;stupidity&lt;/a&gt;.

Really, does anyone remember &lt;a href=&quot;http://www.metafilter.com/mefi/5390#43248&quot;&gt;&quot;insert something meaningful here&quot;&lt;/a&gt;?</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575151</guid>
		<pubDate>Mon, 27 Oct 2003 18:37:59 -0800</pubDate>
		<dc:creator>whatnotever</dc:creator>
	</item>	<item>
		<title>By: UKnowForKids</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575169</link>	
		<description>palancik:  Your reported Google results aren&apos;t due to the robots.txt file - you&apos;re only getting two whitehouse.gov results because Google hides all the others.  &lt;a href=&quot;http://www.google.com/search?hl=en&amp;lr=&amp;ie=ISO-8859-1&amp;c2coff=1&amp;q=+site%3Awww.whitehouse.gov+iraq&amp;btnG=Google+Search&quot;&gt;Searching whitehouse.gov for &quot;iraq&quot;&lt;/a&gt; gives a ton of hits (at least for now).</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575169</guid>
		<pubDate>Mon, 27 Oct 2003 19:01:46 -0800</pubDate>
		<dc:creator>UKnowForKids</dc:creator>
	</item>	<item>
		<title>By: inpHilltr8r</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575183</link>	
		<description>&lt;em&gt; Let&apos;s all repeat the Slashdot discussion!&lt;/em&gt;

Don&apos;t you have a site to maintain somewhere?</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575183</guid>
		<pubDate>Mon, 27 Oct 2003 19:26:33 -0800</pubDate>
		<dc:creator>inpHilltr8r</dc:creator>
	</item>	<item>
		<title>By: soyjoy</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575256</link>	
		<description>Yeah, this same discussion has been had at Atrios and other places all over the Net already. BUT it was worth posting just in order to call the link &lt;a href=&quot;http://www.metafilter.com/mefi/29202&quot;&gt;&quot;Iraqfilter.&quot;&lt;/a&gt; How many chances are we gonna have to make that pun? Kudos, condour75.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575256</guid>
		<pubDate>Mon, 27 Oct 2003 21:09:10 -0800</pubDate>
		<dc:creator>soyjoy</dc:creator>
	</item>	<item>
		<title>By: rusty</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575289</link>	
		<description>inpHilltr8r: I read MetaFilter to forget. :-)</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575289</guid>
		<pubDate>Mon, 27 Oct 2003 23:38:47 -0800</pubDate>
		<dc:creator>rusty</dc:creator>
	</item>	<item>
		<title>By: bwerdmuller</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575309</link>	
		<description>Wget&apos;s been mentioned above, but it might be a worthwhile endeavour to configure something like it to download exactly those pages. Might be interesting to read around, say, the next election.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575309</guid>
		<pubDate>Tue, 28 Oct 2003 02:35:10 -0800</pubDate>
		<dc:creator>bwerdmuller</dc:creator>
	</item>	<item>
		<title>By: Mitheral</title>
		<link>http://www.metafilter.com/29202/Iraqfilter#575356</link>	
		<description>HTTrack can be configured to ignore robots.txt files, and it&apos;s got a nice clicky interface for those on Windows who can&apos;t handle a command line.  You could even tell it to only get things mentioned in robots.txt (or any text file of links, for that matter).</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2003:site.29202-575356</guid>
		<pubDate>Tue, 28 Oct 2003 07:48:32 -0800</pubDate>
		<dc:creator>Mitheral</dc:creator>
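<!--
Editor's sketch of the "only get things mentioned in robots.txt" idea from the comment above. This is plain Python, not HTTrack or Wget configuration; the function name disallowed_urls is hypothetical. The sample rules are copied from an earlier comment in this thread, and the code only builds the list of URLs. It does not download anything.

```python
from urllib.parse import urljoin


def disallowed_urls(robots_txt, base_url):
    """Return an absolute URL for every non-empty Disallow path."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            urls.append(urljoin(base_url, value.strip()))
    return urls


# Two rules quoted earlier in this thread (tab-separated, as posted).
SAMPLE = """\
User-agent: *
Disallow:\t/firstlady/recipes/iraq
Disallow:\t/history/africanamerican/iraq
"""

for url in disallowed_urls(SAMPLE, "http://www.whitehouse.gov/"):
    print(url)
```

The resulting list could be saved to a text file and handed to whichever mirroring tool is preferred.
-->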
	</item>
	</channel>
</rss>
