Terabytes by mail--Interview with Jim Gray
July 11, 2003 6:41 AM

Interview with Jim Gray, head of Microsoft's Bay Area Research Center. "Clear your schedule, because once you've started reading this interview, you won't be able to put it down until you've finished it. Who would ever, in this time of the greatest interconnectivity in human history, go back to shipping bytes around via snail mail as a preferred means of data transfer? (Really, just what type of throughput does the USPS offer?) Jim Gray would do it, that's who. And we're not just talking about Zip disks, no sir; we're talking about shipping entire hard drives, or even complete computer systems, packed full of disks."
posted by mooncrow (23 comments total)
 
I see this was also just posted on Slashdot, with almost the same quotation. Weird. But it is a cool interview.
posted by mooncrow at 6:46 AM on July 11, 2003


Good article.
posted by signal at 7:02 AM on July 11, 2003


"Just" ???

This article gets the over-hype of the year award. Why would an author start an article by telling you how incredible their article is? It's just a clueless interviewer being amazed by the whole Internet/computer thing.
posted by y6y6y6 at 7:06 AM on July 11, 2003


Umm...dude...Gray is being interviewed by David Patterson--you don't know who that is, do you? "Shooting questions at Gray on such topics as open-source databases and smart disks is David Patterson, who holds the Pardee Chair of Computer Science at the University of California at Berkeley. Patterson headed up the design and implementation of RISC I, which laid the foundations for Sun's SPARC architecture. Along with Randy Katz, Patterson also helped pioneer redundant arrays of independent disks—yes, RAID"
Sheesh. Kids these days.
posted by mooncrow at 7:10 AM on July 11, 2003


On preview - what mooncrow said. I think the puff at the front was put on by a copy editor somewhere.

I liked the bit where he compares the relatively high cost of database administration (i.e. humans) with the relatively low cost of storage, and he says "Our chore is to figure out how to waste storage space to save administration." Built-in apps for self-organising combined drives/databases sound interesting. The idea that we could (potentially) swap drives and somehow get those drives to format the huge amounts of data on them in ways useful to us is neat. Although, this might be nothing more than the latest in a long line of predictions that technology can provide us not only with the means to store/access/transfer data but also the means to organise it in ways useful to us.
posted by carter at 7:23 AM on July 11, 2003


I don't know where the rest of the world has been, but programmer types have been making jokes like "never underestimate the bandwidth of a station wagon filled with magtape" for decades now. I guess the modern version would be "an SUV full of CDRs"...
posted by Mars Saxman at 7:24 AM on July 11, 2003


Bill Gates in 1981: "640k should be enough for anybody"

Jim Gray in 2003: "What do you do with a 200-gig disk drive?"
posted by magullo at 7:42 AM on July 11, 2003


Slightly off-topic: I see at the end of the article that relational-database pioneer Ted Codd died in April. I can't believe I missed that. C.J. Date talks about Codd's contributions to the computer industry.
posted by F Mackenzie at 7:49 AM on July 11, 2003


At my current employer, we routinely have to send 3-7 TB of data from one place to another, with not just speed, but insurability, so they bought 3 Sun RAIDs and ship them via FedEx back and forth.
posted by nomisxid at 8:07 AM on July 11, 2003
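
A rough way to see why shipping drives can compete with a network link is to divide the payload by the door-to-door time. The sketch below assumes a 24-hour overnight delivery and decimal terabytes; neither figure comes from the comment above, and `effective_mbit_per_s` is a hypothetical helper name.

```python
def effective_mbit_per_s(terabytes: float, transit_hours: float) -> float:
    """Average throughput of a shipment, in megabits per second.

    Assumes decimal units (1 TB = 10**12 bytes) and ignores the time
    needed to copy data onto and off the disks.
    """
    bits = terabytes * 10**12 * 8
    seconds = transit_hours * 3600
    return bits / seconds / 10**6


# Assumed 24-hour transit for the 3-7 TB range mentioned in the thread.
for tb in (3, 5, 7):
    print(f"{tb} TB in 24 h is about {effective_mbit_per_s(tb, 24):.0f} Mbit/s")
```

Copy time at each end is left out, so the real end-to-end figure would be lower than what this prints.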


"What do you do with a 200-gig disk drive?"

I think he may have misspoken. I think he's asking what do you do with a 20 terabyte disk. That's the only way the 10,000 movies line that comes a bit later makes sense. We'll still find all kinds of ways to fill that space (which he acknowledges), but nobody really knows what that's going to be yet.

Me, I want my petabyte drive now damn it.
posted by willnot at 9:19 AM on July 11, 2003


Yes, I was saddened to see that "relational-database pioneer Ted Codd died in April. I can't believe I missed that." I'm not sure how I did either, considering his importance to so many things with which I'm involved. I'm sorry that he is gone.
posted by mooncrow at 9:51 AM on July 11, 2003


Correct me if I'm wrong, but I don't think FedEx will insure the data on the disks. Or is it one of those things where they'll insure for however much money you put on the line and charge you commensurately?
posted by Wood at 10:12 AM on July 11, 2003


nobody really knows what that's going to be yet.

Voxels for one. When we have fully realized interactive 3D without any rendering tricks, without constraining what the subject can look at and how closely they can look, the storage need will be ginormous.
posted by badstone at 10:14 AM on July 11, 2003


There's some mindblowing stuff in that article. It doesn't have anything to do with shipping data by mail.

Disks will replace tapes, and disks will have infinite capacity....gradually, all the processors will migrate to the transducers: displays, network interfaces, cameras, disks, and other devices....in that world, all the stuff about interfaces of SCSI and IDE and so on disappears. It's IP.
posted by lbergstr at 10:41 AM on July 11, 2003


"What do you do with a 200-gig disk drive?"

porn. movies. music.
posted by birdherder at 11:20 AM on July 11, 2003


Exactly what I was thinking, lbergstr. Also the potential from components being more intelligent with their own full operating systems, network stacks and services seems so obvious as stated but it's not something I'd really considered.
posted by vbfg at 12:07 PM on July 11, 2003


nomisxid: At my current employer, we routinely have to send 3-7 TB of data from one place to another, with not just speed, but insurability, so they bought 3 Sun RAIDs and ship them via FedEx back and forth.

wow. that's a scary thought. you never have disk failure?
posted by shadow45 at 12:52 PM on July 11, 2003


Shadow, I believe that fear is why the company uses RAID drives. (Written with a question in my mind whether or not you understand what RAID is...)
posted by billsaysthis at 2:00 PM on July 11, 2003


This was a great interview, I thought. One thing that I know will be done with peta- and exabit storage capabilities is to begin exploring the human genome further. The idea is that soon enough (within a few years) scientists will be able to sequence an individual's entire genome rapidly for under a thousand dollars. Many people will have their genome sequenced to give their doctor a complete (understatement!) picture of the patient. Researchers will then take genotypic information from many humans to begin correlating genes with diseases (to a much greater degree of accuracy than is possible today).
posted by topherbecker at 2:16 PM on July 11, 2003


topherbecker, agreed that this is a very interesting link (though perhaps slightly more suited to /.) but I don't quite see how petabyte storage helps with genetic research when these guys (high-level scientists) already have access to just about any hardware they need. For the analysis you point out, I think the greater need is huge amounts of faster processor time. Maybe faster/smarter disk i/o helps some...

The other issue here is privacy, which will probably smack large scale genetic analytical projects right in the face. Are you going to give a copy of your 'hard disk' to some anonymous researcher and not worry where that data ends up?
posted by billsaysthis at 2:29 PM on July 11, 2003


Yes, I was saddened to see that "relational-database pioneer Ted Codd died in April. I can't believe I missed that." I'm not sure how I did either, considering his importance to so many things with which I'm involved. I'm sorry that he is gone.

I noticed, I told the rest of our programming department that:

"Codd is dead!"

...but no-one cared. I guess he's only really important if you've ever held relational databases close to your heart.

...and it was a 20 terabyte disk, as the disk guys reckon they have a factor of 100 left, before they run out of ideas again, and we're already on the 200 gigabyte disk.
posted by inpHilltr8r at 2:34 PM on July 11, 2003
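
The arithmetic behind the 20-terabyte reading can be checked quickly. The sketch uses only the numbers quoted in the thread (a 200 GB drive, a factor of 100, 10,000 movies) and assumes decimal units; the variable names are illustrative.

```python
GB = 10**9  # decimal gigabyte, an assumption about the units intended

current_disk_bytes = 200 * GB
growth_factor = 100          # "a factor of 100 left"
movie_count = 10_000         # the "10,000 movies" line

future_disk_bytes = current_disk_bytes * growth_factor
bytes_per_movie = future_disk_bytes // movie_count

print(future_disk_bytes / 10**12, "TB")      # 20.0 TB
print(bytes_per_movie / GB, "GB per movie")  # 2.0 GB per movie
```

By the same arithmetic, 10,000 movies on a 200 GB drive would leave only 20 MB per movie, which is why the 20-terabyte reading fits the quote better.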


David Patterson, who holds the Pardee Chair of Computer Science at the University of California at Berkeley. Patterson headed up the design and implementation of RISC I, which laid the foundations for Sun's SPARC architecture. Along with Randy Katz, Patterson also helped pioneer redundant arrays of independent disks—yes, RAID"
Sheesh. Kids these days.


Kids today would probably better know Patterson as half of the duo of Patterson and Hennessy.
posted by gyc at 2:47 PM on July 11, 2003


Just starting to read the article. The bottleneck is throughput between disk and CPU, i.e. memory. If storage is as fast as or faster than memory, it keeps the CPU race alive. So it's not only the amount of storage that matters, but the speed of storage. Just as network performance is throughput plus latency, the same goes for storage. It's like the difference between a 10Mb satellite modem and a 10Mb cable modem.
posted by stbalbach at 5:15 PM on July 12, 2003
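
The satellite-versus-cable comparison can be made concrete with a simple model: transfer time is roughly one round trip of latency plus size divided by bandwidth. The latency figures below (600 ms for a geostationary satellite link, 20 ms for cable) are illustrative assumptions, not measurements, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes: int, bandwidth_bits_per_s: float,
                     latency_s: float) -> float:
    """Simplified model: one round trip of latency, then streaming at full rate."""
    return latency_s + size_bytes * 8 / bandwidth_bits_per_s


LINK_RATE = 10 * 10**6   # 10 Mbit/s for both links, as in the comment
SATELLITE_RTT = 0.600    # assumed round-trip latency, seconds
CABLE_RTT = 0.020        # assumed round-trip latency, seconds

for label, size in (("10 KB request", 10_000), ("100 MB file", 100_000_000)):
    sat = transfer_seconds(size, LINK_RATE, SATELLITE_RTT)
    cable = transfer_seconds(size, LINK_RATE, CABLE_RTT)
    print(f"{label}: satellite {sat:.3f} s, cable {cable:.3f} s")
```

Under these assumptions the small request is dominated by latency and the large file by bandwidth, which is the same trade-off a disk faces between seek time and sequential transfer rate.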

