
Million Short: A different way to search the web
September 20, 2012 12:07 PM

"Million Short is an experimental web search engine (really, more of a discovery engine) that allows you to REMOVE the top million (or top 100k, 10k, 1k, 100) sites from the results set. We thought it might be somewhat interesting to see what we'd find if we just removed an entire slice of the web." Developer Sanjay Arora, founder of Exponential Labs, explains the thinking behind his development of Million Short and its inverse, Million Tall, "which ONLY indexes the top million (or top 100k, etc.) sites."

Arora says that his intention was not to replace Google or imply that removing the top million sites would necessarily result in better information, but rather to provide a "discovery engine" that can turn up obscure but potentially interesting sites. His intention with Million Tall, on the other hand, was to pose the question, "Imagine a search engine that only indexed the top 1 million sites on the web. Would you even notice?"

[NOTE: Million Short only removes the top sites, NOT the top search results. Therefore, the first Google result may be the same as the first Million Short result, if the site isn't in the top million (or 100k, etc.) sites overall.]
posted by hurdy gurdy girl (35 comments total) 55 users marked this as a favorite
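The note's distinction between removing top sites and removing top results can be sketched in a few lines of Python. This is a minimal illustration, not Million Short's actual code. It assumes a hypothetical site_rank table that maps each hostname to its overall popularity rank (the kind of figure an Alexa ranking supplies) and filters an ordinary result list against it. The domains and ranks below are made up.

```python
from urllib.parse import urlparse


def million_short(results, site_rank, cutoff=1_000_000):
    """Drop results whose site ranks in the top `cutoff` sites overall.

    results:   URLs in the order an ordinary engine returned them.
    site_rank: hypothetical mapping of hostname -> global popularity rank
               (1 = most popular); hostnames missing from it are treated
               as too obscure to be excluded.
    """
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        rank = site_rank.get(host)
        # The test is on the site's overall rank, not the result's position,
        # so an unpopular site can stay in first place.
        if rank is None or rank > cutoff:
            kept.append(url)
    return kept


# Made-up ranks and results, for illustration only.
ranks = {"big-portal.example": 12, "fansite.example": 2_500_000}
results = ["https://fansite.example/", "https://big-portal.example/page"]
print(million_short(results, ranks))  # ['https://fansite.example/']
```

Because the filter never looks at a result's position, a low-traffic site that already tops the ordinary list keeps its first place, which is exactly the case the note describes.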

 
Metafilter: a small Oregon-based company operating a collection of sites built with user interaction in mind.
posted by yoink at 12:12 PM on September 20, 2012 [2 favorites]


First reaction: This is going to be awesome.
Reaction after my standard first-search-on-a-new-engine: All it removed was Wikipedia?
Reaction after reading the sidebar on what was actually removed: This is awesome!

See you in hell, cnet, ehow, scribd and yahoo!!
posted by DU at 12:12 PM on September 20, 2012 [4 favorites]


Oh wow a new way to obsessively check for reviews of my work.
posted by The Whelk at 12:15 PM on September 20, 2012 [5 favorites]


See you in hell, cnet, ehow, scribd and yahoo!!

This. Content scrapers are the scourge of your average Google search. I'm not sure why Google isn't better at filtering them out.
posted by odinsdream at 12:15 PM on September 20, 2012 [6 favorites]


Someone tell me this isn't going to direct me straight into a trap-site when I search for something, esoteric or not.
posted by RolandOfEld at 12:17 PM on September 20, 2012


The highest level that INCLUDES MetaFilter is 1,000. Which is a reasonable place to start.
posted by oneswellfoop at 12:20 PM on September 20, 2012


Reminds me of an interesting feature on the beloved but defunct webmag Stylus, where they polled their staff about their favourite albums, discarded the 100 favorites, and published entries #101-200 as a list. Probably the most interesting list of good albums that I've ever come across.
posted by blue t-shirt at 12:23 PM on September 20, 2012 [12 favorites]


I'm not sure I'm buying it, unless I misunderstand the methodology. I searched for "retail" with the top million removed, and it brought me some VERY prominent retail sites (including retail.com). What am I not understanding here?
posted by jbickers at 12:23 PM on September 20, 2012


Previously.
posted by marienbad at 12:30 PM on September 20, 2012


Ack! Marienbad, thanks for being better than I was at making sure it wasn't a double. I promise, I did check a few different ways, but, yeah, this is a double. Sorry, mods.
posted by hurdy gurdy girl at 12:33 PM on September 20, 2012


Thank you for this.
posted by stoneweaver at 12:37 PM on September 20, 2012


I promise, I did check a few different ways,

Shouldn't have excluded the top 1000 hits from your search ;)
posted by yoink at 12:42 PM on September 20, 2012 [6 favorites]


I'd love a search engine that removes all store/catalog content.
posted by bottlebrushtree at 12:54 PM on September 20, 2012 [1 favorite]


I'm not sure I'm buying it, unless I misunderstand the methodology. I searched for "retail" with the top million removed, and it brought me some VERY prominent retail sites (including retail.com). What am I not understanding here?
It removes "sites" rather than "results". So a well-known website for a specific product might dominate results in related searches, but not be that high in traffic overall. Retail.com has an Alexa ranking of 11 million+
posted by Jehan at 1:07 PM on September 20, 2012


hehe! removes metafilter from a search for metafilter

The price of Infamy
posted by jannw at 1:11 PM on September 20, 2012


This is not really that hard to do on your own. Go to Google and enter a search term. When the results load up, note the URL, and at the very end of that string, add the following parameter:

&start=100

Congratulations, you just skipped the first 100 top-ranked results of your query. :^)
posted by surazal at 1:37 PM on September 20, 2012 [3 favorites]
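surazal's trick amounts to adding an offset parameter to the results URL. Here is a minimal Python sketch that assumes Google still honors the "q" (query) and "start" (offset of the first result shown) parameters the way it did when this was written. These are undocumented URL parameters and may change. Note that this skips result positions, which is not the same thing as excluding popular sites.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def google_search_url(query, skip=100):
    # Assumption: "q" holds the query and "start" is the zero-based
    # offset of the first result to display.
    return "https://www.google.com/search?" + urlencode({"q": query, "start": skip})


url = google_search_url("dolly parton")
print(url)  # https://www.google.com/search?q=dolly+parton&start=100
```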


Wow, I can't believe I somehow managed to screw up an a href tag to Google, of all places. o_O
posted by surazal at 1:39 PM on September 20, 2012


That Stylus list is interesting... but the review of Fun House (#101) is just about the exact opposite of what everyone I know thinks of the album.

Forgotten between the shock and hipster cache of the John Cale-produced self-titled debut and the lionized classic Raw Power, even a ten disc box set detailing every single track recorded during its making couldn't raise its status in the minds of Iggy fans as merely the lesser of these three pillars of punk rock.
posted by Huck500 at 1:43 PM on September 20, 2012


surazal, but that's not what this tool does - it doesn't skip the top 100 results, it removes the top million sites from the search.
posted by en forme de poire at 1:44 PM on September 20, 2012 [2 favorites]


Million Short is certainly a good way to get rid of horrible SEO crap. Google has been fighting a losing battle with that crowd, to the detriment of its search quality.
posted by chimaera at 1:55 PM on September 20, 2012 [1 favorite]


Yeah, Huck, there's a little bit of rationalization going on in a few of those blurbs. Not a huge fan of the writing, but I dig the list of albums itself.
posted by blue t-shirt at 1:58 PM on September 20, 2012


If you use Chrome, you will see options to "block" sites from your search results. This is also a good way to get rid of content aggregators and such.
posted by condour75 at 2:20 PM on September 20, 2012


The highest level that INCLUDES MetaFilter is 1,000. Which is a reasonable place to start.

Well, using y2karl as the search term, Google comes up with The Gospel of YouTube according to y2karl | MetaFilter, while Million Short starts with O Black and Unknown Bards – Among Other Things, Regarding The White Invention of The Blues, MetaFilter, a blog's cut-and-paste of a MetaFilter post which reads as if it had been translated out of and then back into English.

That the Gospel of YouTube tops the Google list is remarkable to me, as every link in it is dead; but then I do list the titles of a multitude of known music videos, for which there are a multitude of active niche audiences. As for Million Short's choice for #1, not knows Y2yodakarl. Except for the fact that that one is my most favorited post, the why of which is another mystery of its own; but see also y2Yoda.
posted by y2karl at 2:21 PM on September 20, 2012


Another million short result was Meta Meta Rosetta Banana Fana Fo Feta Fi Fo Mo Meta Meta ...
Well, I never, the things I used to say & etc.
posted by y2karl at 2:49 PM on September 20, 2012


I searched for my Flickr user name and found several small blogs which had used one of my photos and given me credit. That was satisfying.
posted by zzazazz at 3:21 PM on September 20, 2012 [1 favorite]


So there are webpages that aren't trying to sell you something.
posted by The Card Cheat at 3:26 PM on September 20, 2012


My current search problem is getting a zillion Youtube results. No, I do not want to sit through a three-minute video on doing something when all I want is a single sentence of explanation. But Youtube is owned by Google, so ...
posted by Joe in Australia at 4:21 PM on September 20, 2012


Man, Real Ultimate Power isn't as powerful as it used to be. It's the number one result for "ninja" if you remove the top 100K.
posted by asnider at 4:52 PM on September 20, 2012


condour75: "If you use Chrome, you will see options to "block" sites from your search results. This is also a good way to get rid of content aggregators and such."

You can do this in Firefox with the Google Hit Hider by Domain userscript.
posted by Chrysostom at 8:38 PM on September 20, 2012


Million Short is certainly a good way to get rid of horrible SEO crap. Google has been fighting a losing battle with that crowd, to the detriment of its search quality.

God, THIS. I think it could be solvable by just crowd-sourcing the hunt for SEO sites to exclude.
posted by JHarris at 9:50 PM on September 20, 2012


Another problem with search engines is that they only cover pages that are still live. We are now nearly 20 years into the web, and at this point I suspect the majority of stuff published on the web has disappeared. Gone. There is one solution: the Wayback Machine from the Internet Archive, which has been archiving some significant percentage of the web for a long time. But the problem is you can't search it; you need to know a page's URL to access it. So Google can search across space, and Wayback can search across time, but no search engine is able to search time+space. It's like the world is missing something really important, and someone is going to make a boatload when they figure out how to address this. It's not very complicated: just archive everything and allow full-text search. It's already being done with TV News (see my previous).
posted by stbalbach at 10:02 PM on September 20, 2012 [4 favorites]
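The "time+space" search described above can be sketched as a full-text index whose entries record both the page and the date of each archived snapshot. This is a toy Python illustration with made-up snapshot data. It is not how the Wayback Machine or any real search engine works internally, and the ArchiveIndex class and its methods are hypothetical.

```python
from collections import defaultdict


class ArchiveIndex:
    """Toy full-text index over archived page snapshots (hypothetical)."""

    def __init__(self):
        # word -> set of (url, year) snapshots containing that word
        self.postings = defaultdict(set)

    def add_snapshot(self, url, year, text):
        for word in text.lower().split():
            self.postings[word].add((url, year))

    def search(self, query, start=None, end=None):
        """Snapshots containing every query word, optionally within [start, end]."""
        words = query.lower().split()
        if not words:
            return []
        # Intersect postings so every word must appear in the same snapshot.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(
            (url, year)
            for url, year in hits
            if (start is None or year >= start) and (end is None or year <= end)
        )


idx = ArchiveIndex()
idx.add_snapshot("http://old-site.example/", 1998, "fan page about ninjas")
idx.add_snapshot("http://old-site.example/", 2004, "this page has moved")
idx.add_snapshot("http://new-site.example/", 2012, "ninja reviews")
print(idx.search("page", end=2000))  # [('http://old-site.example/', 1998)]
```

Keeping the year in every posting is what lets a single query be matched against text and limited to a span of time at once.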


I am so excited right now.
posted by whimsicalnymph at 7:22 AM on September 21, 2012


[NOTE: Million Short only removes the top sites, NOT the top search results. Therefore, the first Google result may be the same as the first Million Short result, if the site isn't in the top million (or 100k, etc.) sites overall.]

How the heck does that work? Is there an example where that's true?
posted by talldean at 4:12 PM on September 25, 2012


[This particular search was used as an example in the comments section of one of the above articles, I think:]

You'll see what I mean if you try entering "Dolly Parton" in MillionShort and in Google (a quick way to do this is through the "MillionShort vs. Google" page). The top site is Dolly Parton's fan site, dollyparton.com; it isn't excluded from the MillionShort results because it isn't in the top million websites.
posted by hurdy gurdy girl at 10:41 AM on September 26, 2012


What I find more interesting is that you can remove sites from your search via the settings page. Does Google allow you to do this?
posted by nooneyouknow at 10:55 AM on September 26, 2012 [1 favorite]

