

The $780,100 Homepage
March 27, 2014 5:11 AM

Eight years after the Million Dollar Homepage (previously) sold out its pixels and funded Alex Tew's college education, 22% of the page has fallen victim to link rot. Article inspired by musings from our own Fearless Leader.
posted by Horace Rumpole (22 comments total) 12 users marked this as a favorite

 
That's interesting. I would have expected the link rot to be way above 22%, based on an unscientific study of my own bookmarks.
posted by chavenet at 5:16 AM on March 27 [8 favorites]


Golden Palace Casino...now there's a name I haven't heard in a long, long time. I wonder whatever happened to the enormous collection of crapola they used to snatch up at auctions.
posted by jquinby at 5:21 AM on March 27


Interesting that more than a few of the rotted links are to web hosting sites.
posted by ardgedee at 5:30 AM on March 27


The article says that 22% of the Million Dollar Homepage’s pixels now fail to load a webpage when clicked, which I interpret to mean that they're only checking for a response from the link, not that there's an actual site behind it. I bet a large portion of the 78% of the 'active' sites are actually just parked domains at this point.
posted by Ickster at 5:43 AM on March 27 [3 favorites]
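Ickster's distinction between a link that merely responds and one that still leads to a real site can be sketched as a tiny classifier. This is a minimal sketch under stated assumptions: the status code and page text are assumed to come from some earlier fetch, and the `PARKED_PATTERNS` phrases are hypothetical examples of parking-page wording, not a validated list.

```python
import re

# Hypothetical phrases often seen on parked or for-sale domain pages.
# Illustrative only: a real survey would need a much longer, tested list.
PARKED_PATTERNS = [
    r"this domain (?:name )?(?:may be|is) for sale",
    r"domain (?:is )?parked",
    r"buy this domain",
]


def classify(status, body):
    """Return 'dead', 'parked?' or 'live?' for one fetched link.

    status: HTTP status code, or None if no response was received.
    body: response body as text ('' if none).
    """
    if status is None or status >= 400:
        return "dead"
    text = body.lower()
    if any(re.search(pattern, text) for pattern in PARKED_PATTERNS):
        return "parked?"
    # Responding without parking boilerplate is not proof of a real site.
    return "live?"
```

The question marks are deliberate: a keyword heuristic can only flag likely parking pages, which is why the human-review ideas later in the thread come up.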


I'm kind of glad I got to see this site, anyway.
posted by SharkParty at 5:47 AM on March 27 [11 favorites]


Surprisingly easy to find Waldo
posted by maryr at 6:10 AM on March 27 [2 favorites]


Just looking at the ads on the site really conjures up a sense of my early internet days. Wow.
posted by So You're Saying These Are Pants? at 6:57 AM on March 27 [1 favorite]


Seems like way more than 22% to me. I've clicked randomly at least 25 times on the map just now and hit only two functional web pages. Two or three got no response, but the rest were either generic parking pages, suspended/closed account notices, under-construction pages, or not found.
posted by Devils Rancher at 7:30 AM on March 27 [1 favorite]


"Thanks to Matt Haughey for inspiring this story."

ah
posted by subversiveasset at 7:32 AM on March 27


So looking into the story of what happened to Alex Tew, it turns out he dropped out of school after one semester. I don't know what I'd do if I stumbled onto a million dollars, but it seems like a shame he didn't follow through on his plan.
posted by ianhattwick at 7:40 AM on March 27


It's interesting that the creator of one of the most frantic, overloaded sites on the Web also went on to create calm.com.
posted by drlith at 7:53 AM on March 27


Man I love the Million Dollar Homepage and am so glad it stays online. This analysis is great, particularly the visualizations.

I believe there have been some more academic studies of link rot, but I don't have references at my fingertips. The million dollar homepage links are of course heavily biased to commercial interests. A lot of these domains have been repurposed by squatters; the giant "Rent Pixel Ads" shows as live in the analysis for instance, but the rentpixelads.com site just redirects to cheaplager.com. Is that link rot? Ordinarily yes, but for this brazenly commercial link site maybe it's not.

I have a linkblog going back about 10 years. Pinboard tells me I have about 1300 dead links out of 11,000 but that statistic isn't very useful for a number of reasons. Maybe I should survey them now. It's another biased sample, but one that's personally interesting to me.
posted by Nelson at 8:32 AM on March 27 [1 favorite]
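Nelson's rentpixelads.com case (a "live" link that actually redirects to cheaplager.com) suggests one cheap check: compare the host you asked for with the host you ended up on. A sketch, assuming the final URL is obtained elsewhere, for instance from `urllib.request.urlopen(url).geturl()`, which reports the URL after HTTP redirects. `host` and `redirected_offsite` are hypothetical helper names, and meta-refresh or JavaScript redirects would slip past this check.

```python
from urllib.parse import urlsplit


def host(url):
    """Lower-cased hostname with any leading 'www.' removed."""
    name = (urlsplit(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def redirected_offsite(original_url, final_url):
    """True if following redirects ended up on a different host."""
    return host(original_url) != host(final_url)
```

For example, `redirected_offsite("http://www.rentpixelads.com/", "http://cheaplager.com/")` is `True`, while a hop from `http://example.com/a` to `https://www.example.com/b` is not counted.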


"I bet a large portion of the 78% of the 'active' sites are actually just parked domains at this point."

I suspect you're very correct. In fact, I'd be shocked if more than 15% of the pixels were still valid sites. If I had time, I'd set up a crowd-sourced dead link/parked domain detector. Or... we could assign 100x100 blocks to 100 Mefites and get an actual number.
posted by spiderskull at 8:45 AM on March 27 [1 favorite]
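The arithmetic behind spiderskull's proposal: the page is a 1000 x 1000 pixel grid, so 100 x 100 squares give exactly 10 x 10 = 100 blocks, one per volunteer. A hypothetical sketch of carving up the grid and handing out squares round-robin; the function names and the assignment scheme are illustrative assumptions.

```python
def blocks(size=1000, step=100):
    """Split a size x size pixel grid into step x step squares.

    Returns (left, top, right, bottom) tuples; right and bottom are exclusive.
    Squares are listed row by row, left to right.
    """
    return [
        (x, y, x + step, y + step)
        for y in range(0, size, step)
        for x in range(0, size, step)
    ]


def assign(volunteers, size=1000, step=100):
    """Map each square to a volunteer, cycling through the list."""
    return {
        square: volunteers[i % len(volunteers)]
        for i, square in enumerate(blocks(size, step))
    }
```

With 100 names in the list, each volunteer gets exactly one square.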


The links should all be Waybacked.
posted by stbalbach at 8:59 AM on March 27


Yeah, I tried to get waxpancake to do the analysis, but he said it would take too long because you couldn't just do a simple command-line web fetch of every URL. He talked about maybe automating screenshots of every link, then sending those screenshots to Mechanical Turk to ask people whether they were parked domains. Or, yeah, you could compare them pixel for pixel with the Wayback Machine's 2005 captures of those URLs, but it would be a monster-sized project to get right.

In my non-scientific clicking around, it felt like only a quarter of the clicks led to the original sites, about half were parked domains, and a quarter were dead. From the sound of it, the writer just fetched the HTML and counted 404s and servers not found, which came to 22%.
posted by mathowie at 9:14 AM on March 27
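One way to get at the 2005 captures mathowie mentions is the Internet Archive's Wayback availability endpoint, which, as I understand it, returns JSON describing the capture closest to a requested timestamp under `archived_snapshots.closest`. The sketch below assumes that response shape; the default timestamp `20051231` is an arbitrary choice, and the genuinely hard part (comparing old captures with today's pages) is not attempted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(site, timestamp="20051231"):
    """Build the query asking for the capture closest to `timestamp`."""
    return API + "?" + urlencode({"url": site, "timestamp": timestamp})


def closest_snapshot(payload):
    """Return the closest capture's URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(site, timestamp="20051231"):
    """Network call: fetch the JSON answer and extract the capture URL."""
    with urlopen(availability_url(site, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Keeping the parsing separate from the network call makes it easy to cache responses for a few thousand URLs and re-run the analysis offline.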


I'm crawling the list of URLs from here now, actually. Nothing elaborate: I scraped them with Lynx and am passing them to an lwp-request one-liner to capture the headers (lwp-request -Sed -m GET $i, for those of you who are interested).

It's not the whole page, but a rough glance at the headers (as the results totter by) is interesting: 302s, 403s, the flat dead ones, meta-search keywords, etc.

I can post the list of URLs (and resulting dump of my results) someplace when it's done.
posted by jquinby at 9:31 AM on March 27
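For anyone wanting to reproduce the header crawl without Perl, here is a rough Python analogue of that one-liner. It is a sketch, not a flag-for-flag translation of lwp-request: it sends a single GET, does not follow redirects (so 301s and 302s show up as such), never reads the body, and reports network failures as a `None` status instead of stopping the crawl. `fetch_headers` is a hypothetical name.

```python
import http.client
from urllib.parse import urlsplit


def fetch_headers(url, timeout=15):
    """One GET without following redirects; returns (status, reason, headers).

    On DNS, connection, TLS or protocol errors, returns (None, message, {}).
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return None, "no hostname in URL", {}
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # Only the status line and headers are kept; the body is never read.
        return resp.status, resp.reason, dict(resp.getheaders())
    except (OSError, http.client.HTTPException) as err:
        return None, str(err), {}
    finally:
        conn.close()
```

Looping over the pastebin list with this and writing each result out gives roughly the same raw material as the lwp-request dump.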


...grabbing the whole page, btw, would have been a mod of the lwp-request. I figured the headers would be enough for a first pass.
posted by jquinby at 9:33 AM on March 27


Man, that page is like a still shot from some kind of dystopian nightmare with autonomous AI billboardbots that actively seek out humans.
posted by Flunkie at 9:35 AM on March 27 [1 favorite]


Note to others attempting this: make sure, ah, that you're not still on your company's VPN connection when trying this out.
posted by jquinby at 9:47 AM on March 27 [2 favorites]


I was surprised to see that the Million Dollar Homepage launched in 2005. For some reason, it -- the design, the concept, the initial buzz it created -- seems more like a relic of a late-90s World Wide Web. I expect it to be linked in a web ring along with dancing hamsters and Mr. "I Kiss You" Mahir.
posted by BurntHombre at 10:43 AM on March 27 [9 favorites]


So the URL list is here on pastebin. The results page is ~3MB and too large to paste. If someone wants it for number-crunching, let me know and I can mail it to you (or stick it on a sharefile site or somesuch).
posted by jquinby at 11:34 AM on March 27 [1 favorite]


I guess we've entered ghost territory. Herewith a few odds and ends I've pulled out of the aforementioned results page, saved here for posterity:

Last-modified years (not all servers send these):

2014: 497
2013: 114
2012: 70
2011: 37
2010: 31
2009: 31
2006: 20
2008: 17
2007: 16
2005: 9
2003: 3
2004: 1
1970: 1
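A tally like the Last-Modified list above can be rebuilt from captured headers by parsing each date with the standard library's `email.utils.parsedate_to_datetime`, which understands the RFC 2822-style dates HTTP uses. This is a sketch assuming one dict of headers per URL; the lookup ignores header-name case because servers vary. The lone 1970 entry is plausibly a server reporting a zero Unix timestamp, which this code would also count as 1970.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def last_modified_years(header_dicts):
    """Count Last-Modified years over one header dict per URL.

    Missing or unparseable Last-Modified values are skipped.
    """
    years = Counter()
    for headers in header_dicts:
        # Header names are case-insensitive, so match them that way.
        value = next(
            (v for k, v in headers.items() if k.lower() == "last-modified"), None
        )
        if not value:
            continue
        try:
            years[parsedate_to_datetime(value).year] += 1
        except (TypeError, ValueError):
            # Depending on the Python version, bad dates raise either error.
            continue
    return years
```

`last_modified_years(...).most_common()` then yields the year list in descending order of count, as shown above.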

Web server landscape:

1214 Apache
358 Microsoft-IIS/7.5
219 Microsoft-IIS/6.0
163 nginx
150 Apache/2.2.22
82 Apache/2.2.3
80 Apache/2.2.3
75 nginx/1.4.7
55 Microsoft-IIS/8.0
54 Microsoft-IIS/7.0
53 Apache/2
41 Apache/2.4.9
38 Oversee
34 LiteSpeed
33 Apache-Coyote/1.1
28 Apache/2.2.15
23 Apache/2.2.25
23 Apache/2.2.22
22 GSE
18 cloudflare-nginx
17 nginx/1.4.4
17 Apache/2.2.16
17 Apache/2.2.14
15 YTS/1.20.28
15 YTS/1.19.11
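The server list mixes bare product names with versioned ones, and some lines (Apache/2.2.3, Apache/2.2.22) appear twice, presumably because they differ in text not shown here, such as an OS suffix. A sketch that groups `Server` values by product name only, assuming the product is whatever precedes the first slash or space:

```python
from collections import Counter


def server_products(server_values):
    """Group Server header values by product, dropping version details.

    'Apache/2.2.22 (Ubuntu)' and 'Apache' both count as 'Apache'.
    """
    counts = Counter()
    for value in server_values:
        product = value.strip().split("/")[0].split(" ")[0] or "(blank)"
        counts[product] += 1
    return counts
```

Under this grouping, all the Apache variants above collapse into a single Apache row, and likewise for IIS and nginx.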

Top 25 HTTP server response codes:

2312 200 OK
482 301 Moved Permanently
357 302 Found
129 404 Not Found
81 403 Forbidden
66 406 Not Acceptable
63 500 Server closed connection without sending any data back
61 302 Moved Temporarily
40 200 (OK)
38 500 Can't verify SSL peers without knowing which Certificate Authorities to trust
30 302 Object moved
15 404 Not Found
12 500 Internal Server Error
7 400 Bad Request
5 503 Service Unavailable
3 503 Service Temporarily Unavailable
3 410 Gone
3 302 Redirect
3 301 Found

...and the bottom 10:

1 404 File not found
1 403 (Forbidden)
1 403 Bad Behavior
1 401 Authorization Required
1 307 Temporary Redirect
1 303 See other
1 302 Use
1 302 OK
1 302 moved
1 301 Removed Permanently

("Bad behavior?")
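Because reason phrases vary ("200 OK" vs. "200 (OK)", "404 Not Found" vs. "404 File not found"), counting by numeric code gives a cleaner picture. A sketch, assuming input lines shaped like the ones above. Note that some of the 500 lines (the SSL one, for instance) look like errors reported by the client rather than by a remote server, so a 5xx count here is not purely server-side.

```python
from collections import Counter


def by_status_class(status_lines):
    """Tally '<code> <reason>' lines by numeric code and by class (2xx, 3xx...).

    Only the leading number is used; lines without one are skipped.
    """
    codes, classes = Counter(), Counter()
    for line in status_lines:
        first = line.split(maxsplit=1)[0] if line.strip() else ""
        if not first.isdigit():
            continue
        code = int(first)
        codes[code] += 1
        classes[f"{code // 100}xx"] += 1
    return codes, classes
```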

Using this website, I generated a wordcloud of the Meta-keywords. This was a little tricky, since a goodly number of them were actually phrases, like "How To Take Care Of Your Hamster." You would think that 'penis' would show up more often, but no, I only count a handful of references.

Results which included obvious domain-squatting messages in the keywords (e.g., Foo.com is your first and best source of information on foo.com. Here you will also find topics of general interest, &c): 208
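The domain-squatting count relies on spotting boilerplate like the Foo.com example. A sketch of that check as a regular expression, assuming the keyword text has already been extracted; it matches only this one phrasing, requires the same name before and after, and ignores case. `looks_squatted` is a hypothetical helper name.

```python
import re

# Matches the boilerplate quoted above, e.g.
# "Foo.com is your first and best source of information on foo.com".
# (?P=name) requires the same name twice; IGNORECASE applies to it as well.
SQUAT = re.compile(
    r"(?P<name>[\w.-]+) is your first and best source of information on (?P=name)",
    re.IGNORECASE,
)


def looks_squatted(meta_keywords):
    """True if the keyword text contains the squatter boilerplate."""
    return bool(SQUAT.search(meta_keywords))
```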

URLs that were paid for and reserved, but apparently never used (http://Paid&Reserved-JoeBlow/): 17

Requests that failed because of bad (non-existent?) hostnames: 282

Some examples of the bad/nonexistent hostnames: science.shumans.com, support.xilo.net, thriftyusa.com, tonysstuff.com, usnavyseals.tv, webpersonalshopper.biz, work-at-home-now.us, www.100w.com.tw, www.1-love-rings.com, www.2millionpixels.com, www.50centspixels.com, www.5starmags.com, www.aaa-wholesale.com, www.accommodation-auckland.co.nz, www.affiliatehighway.co.uk, www.agitor.com, and www.all-croatian-hotels.com.
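The 282 bad-hostname failures are lookups that never produced an address. A sketch of that test using the system resolver via `socket.getaddrinfo`; results depend on the resolver in use, and resolvers that answer for nonexistent names (some ISPs do) would make dead hostnames look alive. `resolves` is a hypothetical helper name.

```python
import socket


def resolves(hostname):
    """True if the system resolver returns at least one address for hostname."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: name could not be IDNA-encoded.
        return False
```

Running it over hostnames such as www.agitor.com or tonysstuff.com would separate "no such name" failures from servers that exist but refuse or time out.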

Anyway, there. Enjoy!
posted by jquinby at 1:26 PM on April 4 [1 favorite]



