Revealing the unseen
July 10, 2015 10:28 PM

In 1945, Vannevar Bush described a physical storage, search and retrieval system that worked like an early hypertext. He called it a memex. Earlier this year, DARPA released the open-source components for its own project named Memex, a powerful engine for searching the deep, dark web.

MEMEX (Domain Specific Search) at DARPA's Open Catalogue
Memex seeks to develop software that advances online search capabilities far beyond the current state of the art. The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests. Creation of a new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, user collaboration, and extension of current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content.
New Search Engine Exposes The Dark Web, 60 Minutes

Is DARPA's Memex search engine a Google-killer?
The web is getting deeper and darker and Google, Bing and Yahoo don't actually search most of it.

They don't search the sites on anonymous, encrypted networks like Tor and I2P (the so-called Dark Web) and they don't search the sites that have either asked to be ignored or that can't be found by following links from other websites (the vast, virtual wasteland known as the Deep Web).
Sleuthing Search Engine; Even Better Than Google?
Unlike a Google search, Memex can search not only for text but also for images and latitude/longitude coordinates encoded in photos. It can decipher numbers that are part of an image, including handwritten numbers in a photo, a technique traffickers often use to mask their contact information. It also recognizes photo backgrounds independently of their subjects, so it can identify pictures of different women that share the same backdrop, such as a hotel room—a telltale sign of sex trafficking, experts say.

Also unlike Google, it can look into, and spot relationships among, not only run-of-the-mill Web pages but online databases such as those offered by government agencies and within online forums (the so-called deep Web) and networks like Tor, whose server addresses are obscured (the so-called dark Web).
The Defense Advanced Research Projects Agency has had its hand in a few other web projects. Clarification of Tor's involvement with DARPA's Memex

Human Traffickers Caught On Hidden Internet and Memex Data Maps
Investigators Use New Tool To Comb Deep Web For Human Traffickers - "NPR's Robert Siegel interviews Dan Kaufman of the Defense Advanced Research Projects Agency about a sophisticated Internet search engine developed to help police track down human traffickers."
posted by the man of twists and turns (19 comments total) 43 users marked this as a favorite
But is it going to leak Van Eck radiation?
posted by Slackermagee at 10:40 PM on July 10, 2015 [12 favorites]

sooo, as an SEO (ahem) professional is there a buck in this "dark web"?
posted by mattoxic at 11:42 PM on July 10, 2015 [1 favorite]

This is where I hope out loud that cstross has a sockpuppet named Angleton who is going to show up in the thread.
posted by namewithoutwords at 12:24 AM on July 11, 2015 [18 favorites]

James, Jesus has left the building.

Something, something, mirrors.
posted by clavdivs at 12:41 AM on July 11, 2015 [1 favorite]

"The web is getting deeper and darker and Google, Bing and Yahoo don't actually search most of it."

The article makes it sound like this is a negative but I actually just want to go to wikipedia and not get TERRIFYING results from the pedo-web.
posted by Eyebrows McGee at 1:37 AM on July 11, 2015 [21 favorites]

As I understand it, the reason the "dark web" exists is because it is technically harder to get to. The second it is made easier to get to, all the sites that were using the "dark" to their advantage will up and leave to "darker" parts.

So I don't really see what memex is going to do except to shine light on those who are slow to move out.
posted by hal_c_on at 2:07 AM on July 11, 2015 [2 favorites]

dumb q - where is it? none of the links seem to go to the actual search engine.

hal_c - i think that's the "deep" web. "dark" means tor, where the advantage isn't that you're not visible, but that you're anonymous.
posted by andrewcooke at 5:22 AM on July 11, 2015 [1 favorite]

dumb q - where is it? none of the links seem to go to the actual search engine.

If I'm interpreting this correctly, this is being reported in an extremely misleading way (and I'm afraid this fpp has followed this reporting). MEMEX is not a single, coherent piece of software, but rather a DARPA research program that is funding grants across a range of institutions to tackle pieces of research that would feed into searching the "dark web". Here is the BAA for the program, where BAA is DARPA's term for grant solicitation tied to a particular program.

What exists of MEMEX currently is in the fpp's 6th link (this), which is basically a link to the githubs for the groups funded under this project. You can see that, in DARPA style, they are all tackling small pieces of the problem (this is TA1,TA2,TA3 in the BAA). Not only that, the grant program is a 3-year program and it seems to have only been funded in Aug 2014, so it isn't even through a full year. The first milestone is Aug 2015 so probably those githubs are full of grad students frantically iterating towards a usable demo for their bit of the project. Often DARPA BAAs do have a phase where someone integrates components across research groups, or they fund one organization to be working on that all along, but this wouldn't happen in any final way until later in the grant cycle, and I don't see evidence of it in this BAA (we'd need to see individual proposals). DARPA doesn't actually develop things like this themselves as part of a BAA-style grant program.

Not only that, DARPA doesn't fund research with the intent of it being any better than a demo of state-of-the-art technology under their auspices -- they do expect something working in the way that research software does, but they don't expect a product for end users (and the techniques involved in this one would require a _lot_ of resources to have a real google-style frontend). So I wouldn't ever expect to see a MEMEX search page on DARPA's web site as a result of this. Rather the research will trickle out in the usual way as these groups publish papers & release code, and grad students get hired elsewhere, etc.
posted by advil at 6:01 AM on July 11, 2015 [23 favorites]

Having watched the CBS report linked to in the story, I can see why DARPA hired the video game guy... he's good at making really impressive demos that sell a story.

His story about "cyber security" is to have a computer that can probe and control every node on the military's network, in order to be able to shut down compromised nodes automatically. This is such an awful idea I can't believe everyone is bullshitted by it.

Someone in charge really needs to break out the rainbow books, and learn how security is supposed to be done.
posted by MikeWarot at 6:16 AM on July 11, 2015 [2 favorites]

There's a lot of potential here, although I'm curious why pieces of this don't already exist. Couldn't a search engine already crawl Tor? Or is the sales pitch about how comprehensive it could be – being able to crawl every possible corner of the web? I see some of the libraries in the OpenCatalog link show some ways to navigate login pages.
posted by destro at 6:52 AM on July 11, 2015

Nowhere does it mention head-mounted walnuts. Disappointed.
posted by davemee at 7:59 AM on July 11, 2015

Is there anything deeper and/or darker to the deep and/or dark web than not being listed anywhere else in the DNS system? Given that in order to effectively communicate with the rest of the world, all these nasty and terrifying sites must still be on the same logical "wire" and respond to conventional protocols.

The only thing I can think of that makes them deep or dark is that you need to know they are there in order to get to them, because they are not listed in a public directory. Even if their contents are unreadable without the benefit of the proper cryptography tools, the raw bits must still be available to anyone entering the right IP address and port number into their client of choice, right?
posted by hwestiii at 8:07 AM on July 11, 2015

The only thing I can think of that makes them deep or dark is that you need to know they are there in order to get to them, because they are not listed in a public directory.

But that's enough. What's the easiest way to hide a book in a large library? Remove its entry from the catalog.
posted by metaquarry at 8:40 AM on July 11, 2015 [2 favorites]

But that's enough. What's the easiest way to hide a book in a large library? Remove its entry from the catalog.

You're absolutely right. The image I lean on is a misfiled document in a large set of filing cabinets. I just ask because the mental images prompted by the terms "deep" and "dark" tend to be quite feverish and overwrought, whereas the underlying truth is far more prosaic. Consuming all the reporting that is done on the deep/dark web, you'd never guess that most of the stuff they refer to is really right under your nose.
posted by hwestiii at 8:50 AM on July 11, 2015 [3 favorites]

Vannevar Bush is an interesting figure. He was also instrumental in the history of MIT. (Actually, if anyone knows of a good history of MIT I'd love to read it.)
posted by grobstein at 9:16 AM on July 11, 2015 [2 favorites]

Wasn't VB associated with the Tuxedo Park crowd?
posted by clavdivs at 10:17 AM on July 11, 2015

Ever drive down the street looking at all the store fronts, possibly a few houses have their name on a cutesy plaque (Jones with a ship icon), then further in the industrial district a few doors have smaller signs. Then there will be a door with just the number of the address. What's behind that door? Evil? Possibly but more likely just someone that does not want to advertise. Perhaps it's where Taylor Swift's business manager works, now he really does not want fans showing up at the door. But evil? Hmmm

That's the physical analogy of the dark/deep web. There's bad stuff behind a few of those unadvertised doors. All the addresses are registered in city hall records, and inspectors and firemen can check if the building is up to code so there are some restrictions to nuke factories but should the government put a web cam inside every single door? Every room of every warehouse? Makes no sense, the terrors of the dark web are along those lines. The biggest winner of the Memex dark search will be the contractor that gets to build out big insanely-massive-huge server farms with a wide open government account.
posted by sammyo at 2:21 PM on July 11, 2015 [3 favorites]

I dunno, the SciAm article linked above suggests that it's an actual thing used in production:
The NYDA says that its new Human Trafficking Response Unit now uses DARPA’s Memex search tool in every human trafficking case it pursues. Memex has played a role in generating at least 20 active sex trafficking investigations and has been applied to eight open indictments in addition to the Gaston conviction, according to the NYDA’s Office.
posted by RobotVoodooPower at 6:14 PM on July 11, 2015

I dunno, the SciAm article linked above suggests that it's an actual thing used in production:

Yeah, it's really confusing, isn't it, given the grant timelines? But I think part of the explanation might lie in this quote from the same article:

Traffic Jam, developed independently of DARPA in 2011 by Carnegie Mellon University researchers and later spun off into a company called Marinus Analytics, enabled investigators to gather evidence by quickly reviewing ads the trafficker posted for several locales.

DARPA has since awarded Carnegie Mellon a three-year, $3.6-million contract to enhance Traffic Jam’s basic search capabilities as part of Memex, with machine-learning algorithms that can analyze results in depth, according to the university.

I can't spot very much of this particular piece of software in the github beyond something described as "Regex based information extractor for online advertisements", so maybe it isn't fully public yet, but 2011 is pretty old by these standards and so it seems like there's something there that could have been useful to law enforcement for quite some time under one name or the other. I also noticed a project that suggests that JPL has the contract to do a unified interface, so this might be what you need to install to get a search box. (Initial commit was 9 months ago, pretty continuously under development since then up through the present.) I still think it's pretty safe to say that the marketing for this program is one thing, the actuality of it, another, regardless of the potential.
posted by advil at 7:16 PM on July 11, 2015 [2 favorites]
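
For a rough sense of what a "regex based information extractor for online advertisements" might involve, here is a minimal sketch in Python. It is not the code from the Memex github repositories: the pattern, the function name extract_phones, and the sample ad text are all hypothetical, and it assumes US-style 10-digit phone numbers written in digits.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not taken from any Memex project):
# an optional +1 / 1 country prefix, then 3-3-4 digits with optional
# spaces, dots, hyphens, or parentheses around the area code.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def extract_phones(ad_text: str) -> List[str]:
    """Return normalized 10-digit numbers found in ad_text.

    Numbers are returned in order of first appearance, without duplicates.
    """
    found: List[str] = []
    for match in PHONE_RE.finditer(ad_text):
        number = "".join(match.groups())  # area code + exchange + line number
        if number not in found:
            found.append(number)
    return found


if __name__ == "__main__":
    # Made-up ad text for illustration only.
    sample = "Call (212) 555-0147 or 212.555.0147 today, also 646 555 0199"
    print(extract_phones(sample))
```

Normalizing every match to bare digits is what lets the same number be recognized across ads even when the formatting differs. A pattern like this would miss numbers spelled out in words or embedded in images, which is the harder case the articles describe.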

