DNA data storage
November 25, 2012 11:13 AM

Just think about it for a moment: One gram of DNA can store 700 terabytes of data. That’s 14,000 50-gigabyte Blu-ray discs… in a droplet of DNA that would fit on the tip of your pinky. To store the same kind of data on hard drives — the densest storage medium in use today — you’d need 233 3TB drives, weighing a total of 151 kilos. In Church and Kosuri’s case, they have successfully stored around 700 kilobytes of data in DNA — Church’s latest book, in fact — and proceeded to make 70 billion copies (which they claim, jokingly, makes it the best-selling book of all time!) totaling 44 petabytes of data stored.

via

posted by latkes (71 comments total) 28 users marked this as a favorite

So the Cloud has been replaced by the Slime?
posted by RobotVoodooPower at 11:24 AM on November 25, 2012 [6 favorites]

Yeah but how do the seek times compare to SSD?
posted by Jimbob at 11:39 AM on November 25, 2012 [17 favorites]

While this is beautiful work, it ain't replacing magneto or magneto-optical domains any time soon. Tell me when synthesis (write) rates progress above a few hundred Hz and nanopore sequencing's REAL output rates are in the MHz or GHz rather than kHz range and then we can talk about storage....
posted by lalochezia at 11:39 AM on November 25, 2012 [1 favorite]

And this way, hackers could use *real* viruses to infect the data! In all seriousness though, a cool concept.
posted by smirkette at 11:44 AM on November 25, 2012 [2 favorites]

lalochezia, read the article. That point is talked about extensively. The upside to DNA storage is obviously not the read/write speeds, but instead the stability. They talk about a lump of DNA staying stable in your garage for centuries.
posted by Inkoate at 11:46 AM on November 25, 2012 [3 favorites]

Storage density is just one aspect one must consider. As Jimbob rightly points out, seek time is a huge factor that may be against this type of storage (at least at this stage). Not all data needs to be accessed fast, such as backups. You can have a faster buffer layer that writes out to DNA in its own time. But of even more importance these days is the energy consumption. If the energy cost of storing it is essentially zero (does it need a specific temperature?) after it's been written and until it needs retrieval, that's a huge win for a lot of uses.
posted by jeffamaphone at 11:49 AM on November 25, 2012 [1 favorite]

In every step of this chain, you lose speed but gain stability and capacity: CPU register, volatile memory, hard-drive, tape... and now DNA.
posted by phrontist at 11:49 AM on November 25, 2012 [1 favorite]

Church's latest book, in fact

I'm going to need stronger reading glasses for that one.
posted by Egg Shen at 11:57 AM on November 25, 2012

Yeah but how do the seek times compare to SSD?

Terrible. Think tape backup at the very best.
posted by Tell Me No Lies at 11:58 AM on November 25, 2012

The thing I like most about the idea of DNA encoding is the idea that using this method, all human knowledge could survive catastrophic world events. Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet, our progress will have been recorded and live on. Something about that is comforting. I don't know why.
posted by lazaruslong at 12:01 PM on November 25, 2012 [5 favorites]

Synthetic biotech is the new nanotech, is the new everything. We're just beginning to realize and understand the utterly ludicrous evolution-grown reality-hacking abilities of DNA and the myriad systems it manages.

I'd highly recommend spending the 5-6 hours to watch George Church & Craig Venter talk in-depth to a lay audience about the future of synthetic biotech. It's a few years old, but seems prescient if slightly optimistic about the incredible implications of this technology.

Using it for data storage is the tip of the iceberg. Wait until we're using it for programming.
posted by crayz at 12:04 PM on November 25, 2012 [10 favorites]

I seem to remember the fastest copy-and-write DNA synthesis in nature is about 2000 bases per second in some bacteria or other. SSDs are what - 200 million bits/second? 5 orders of magnitude leaves quite a lot of work to do.
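For anyone who wants to check the arithmetic, here is a rough Python sketch of that comparison. Both rates are the recollected figures from the comment, not measurements, and the 2 bits per base is an assumption (four possible bases).

```python
import math

# Figures quoted in the comment above (recollections, not measured values).
BASES_PER_SECOND = 2000        # fastest natural DNA copying, per the comment
SSD_BITS_PER_SECOND = 200e6    # rough SSD throughput, per the comment

# Assumption: each base carries log2(4) = 2 bits.
BITS_PER_BASE = math.log2(4)


def orders_of_magnitude_gap(bases_per_s: float, ssd_bits_per_s: float,
                            bits_per_base: float = BITS_PER_BASE) -> float:
    """Return log10 of the ratio between the SSD and DNA-copy bit rates."""
    dna_bits_per_s = bases_per_s * bits_per_base
    return math.log10(ssd_bits_per_s / dna_bits_per_s)


gap = orders_of_magnitude_gap(BASES_PER_SECOND, SSD_BITS_PER_SECOND)
print(f"DNA copy rate: {BASES_PER_SECOND * BITS_PER_BASE:.0f} bits/s")
print(f"Gap: about {gap:.1f} orders of magnitude")
```

Under those assumptions the gap comes out a little under five orders of magnitude, so "5 orders" is a fair rounding.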
posted by cromagnon at 12:07 PM on November 25, 2012 [1 favorite]

The thing I like most about the idea of DNA encoding is the idea that using this method, all human knowledge could survive catastrophic world events. Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet, our progress will have been recorded and live on. Something about that is comforting. I don't know why.

But how will the aliens know how to read it?
posted by showbiz_liz at 12:10 PM on November 25, 2012 [2 favorites]

Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet, our progress will have been recorded and live on

Probably not. DNA repurposed for storage is unlikely to be assimilated into living creatures in any heritable way. Which, if you think about it for a minute, is a good thing. Because otherwise, what these guys are making is a global pandemic with a cute side-purpose.
posted by lumpenprole at 12:11 PM on November 25, 2012 [3 favorites]

The thing I like most about the idea of DNA encoding is the idea that using this method, all human knowledge could survive catastrophic world events. Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet, our progress will have been recorded and live on.

By the way, you know those immense stretches of noncoding DNA that make up 90% of the genome in some species...
posted by XMLicious at 12:22 PM on November 25, 2012 [13 favorites]

The spec for SDXC is 2TB in a micro-SD card. A micro-SD card weighs .5 grams. So, 4TB a gram storage will be available in your retail store in the next few years for under $100. That's CONSUMER storage. Sony and IBM and Intel most likely have something many magnitudes more dense currently available in a lab somewhere.

Moore's law, etc., etc., etc.....

(yes, I know I'm missing the point a bit)
posted by lattiboy at 12:23 PM on November 25, 2012 [1 favorite]

Everything is data storage. That is, everything has the potential to store data. What matters is how you interpret it, how, or if, you can read it, and how reliable it is.
posted by JHarris at 12:25 PM on November 25, 2012 [4 favorites]

Holographic storage sounds like a cool near term thing.

Ok, DNA will not be running Quake 4 or linux tomorrow... but Genution is on the horizon, and Genuters, as well as intelligently designed organisms will be part of ethics debates in the coming decade.

Who among us has not had DVDs&CDs come out of brand new packaging scratched, leaving forever corrupted media --yet people spend billions of dollars furthering this industry despite the obvious downsides and flaws of reliance on optical media. So, before saying "this isn't perfect", it is worth looking closely at the world we have. Sometimes data stability is worth far more than speed (and vice versa, of course), not all use functions are the same (there is tendency in topics like this, and "quantum computing" to raise objections because it isn't going to immediately "replace the PC used to write the comment", or to "play games on").

The roaches will be the aliens (but really, the user likely meant "encode", -or something like that- the roaches, not inject -same fish for the fry, far as this discussion is concerned though).

Glowing mice can reproduce, no? Well the fish seem to be able, anyway, it would seem entirely possible to create transgenic organisms, which can reproduce, so, maybe the user's terminology was off, but it seems possible today, to add lines to "natures code", in a rudimentary manner, but I don't see how some sort of "living storage" system is impossible.
posted by infinite intimation at 12:49 PM on November 25, 2012

It's going to be awfully interesting to see if anybody can ever engineer a denser storage medium than DNA. It's a mere ~50 atoms/bit and naturally compacts down to extremely dense 3D structures, which is always going to trump the 2D layouts of silicon chips or magnetic planes. It doesn't require wiring or static structures for retrieval, because you just dilute the DNA in water and let thermodynamic base-pairing against DNA-labeled beads pull out the bits that you're interested in. It's far more stable than magnetic or electrical states, because covalent chemical bonds are always going to beat your electron or magnetic pole trapping schemes. We were able to sequence 40,000-year-old Neanderthal DNA that was just lying in bones in caves. With our understanding of the degradation process we can engineer codes that would make DNA storage efficient for 40k years. Try getting data off that hard disk in a mere 100 years.

It's an almost perfect archival medium. However, it's not a good archival medium when in living cells, because it's copied more frequently than it has to be, and exposed to far more mutational processes and selective pressure. There are all sorts of hidden messages encoded in our genomes, scars of past mutations and infections, but we see only the traces. Genomes are living documents; trying to keep them static is futile.
posted by Llama-Lime at 1:07 PM on November 25, 2012 [5 favorites]

Here is the actual paper, which I think is free access but I'm not sure from my institutional internet connection. If it isn't feel free to MeMail me with an email address I can send a PDF to - for the purposes of this academic discussion we are currently having of course.
posted by Blasdelb at 1:27 PM on November 25, 2012 [3 favorites]

But how will the aliens know how to read it?

Probably if we encode it with some bootstrapping information, similarly to the method given in this Guy Steele talk. (He starts to give away the joke at about 7:13 in and fully explains it at about 8:30, but it's really worth it to watch up to that part.)
posted by A dead Quaker at 1:36 PM on November 25, 2012 [3 favorites]

Also, as long as we're talking about interesting storage devices, there is a new storage medium based on phase changes in tiny amounts of salt hydrates that could very well overtake SSDs in a few years, for several reasons (faster and more stable).
posted by A dead Quaker at 1:43 PM on November 25, 2012 [2 favorites]

showbiz_liz: "But how will the aliens know how to read it?"

No idea. But I always liked that saying about how if all of science was lost, and then rediscovered, it'd be the same stuff being re-learned. The nature of the universe doesn't really change in that way. So one day if another higher-order species advanced really far, maybe they would find our books and plays and music. I'd like to think that, at least.
posted by lazaruslong at 1:52 PM on November 25, 2012

By the way, you know those immense stretches of noncoding DNA that make up 90% of the genome in some species...

Non-coding != non-functional.
posted by kersplunk at 1:57 PM on November 25, 2012 [3 favorites]

The thing I like most about the idea of DNA encoding is the idea that using this method, all human knowledge could survive catastrophic world events. Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet, our progress will have been recorded and live on.

This is basically the plot of Philip K. Dick's 'The Preserving Machine'. It doesn't end well.
posted by verstegan at 1:59 PM on November 25, 2012 [4 favorites]

It's going to be awfully interesting to see if anybody can ever engineer a denser storage medium than DNA.

Synthesize bases with various combinations of stable nitrogen, carbon, and oxygen isotopes. There are two stable isotopes of nitrogen, two of carbon, and three of oxygen. I'm too lazy to do the math, but that gives you a huge number of subtypes for each base—different masses, but more or less identical chemical behavior.

You'll need a Mass Spec for data retrieval, and writing will be a nightmare, but the potential information density would exceed that of natural DNA by many orders of magnitude.
posted by dephlogisticated at 2:01 PM on November 25, 2012

Maybe the Internet has ruined everything for me but I just imagine a regular SATA drive filled with jizz. Even worse, having it ooze out like it was overfilled.

Or for the [H]ard/overclocker crowd imagine that dude from 4chan that posted pics of 2 liter soda bottles filled with his years worth of saved cum, but with neon lights on the side.
posted by wcfields at 2:18 PM on November 25, 2012 [2 favorites]

By the way, you know those immense stretches of noncoding DNA that make up 90% of the genome in some species...

Just because it's not coding doesn't mean it's not being used for something. The latest result from the ENCODE project is that about 80% of the human genome is involved in significant biological activity - you can argue about how you define significant biological activity, but it's not totally useless and you can't ditch it all without consequence. Previous discussion of ENCODE.

Encoding information in "junk DNA" is also a plot point in Ian McDonald's otherwise very good The Dervish House.
posted by penguinliz at 2:25 PM on November 25, 2012 [6 favorites]

"Synthesize bases with various combinations of stable nitrogen, carbon, and oxygen isotopes. There are two stable isotopes of nitrogen, two of carbon, and three of oxygen. I'm too lazy to do the math, but that gives you a huge number of subtypes for each base—different masses, but more or less identical chemical behavior."

Also, so long as you didn't care about ever actually using the data, there are many more than just four bases. Each of the four has a variety of analogs that have the same width and the same Watson-Crick face, which means that they could be substituted without disrupting the basic secondary structure. Substitutions of hydroxymethyluracil for thymine that then get modified to alpha-PutrescinylThymine even allow for denser packaging of DNA.¹

¹Miller PB, Scraba DG, Leyritz-Wills M, Maltman KL, and Warren RAJ. 1983. Formation and possible functions of alpha-putrescinylthymine in bacteriophage phi W-14 DNA: analysis of bacteriophage mutants with decreased levels of alpha-putrescinylthymine in their DNAs. J. Virol. 43:399-405.
posted by Blasdelb at 2:27 PM on November 25, 2012 [3 favorites]

Just because it's not coding doesn't mean it's not being used for something.

[I would hazard a guess that they were making a joke (hopefully), slyly suggesting the "ancient aliens" dude, who probably says that "our DNA is already storing some 'secret' ancient history lessons", or something like this, the joke being a fact free assertion that the non-coding region was already an in-place, 'ancient' form of 'biological memory', waiting for us to decode it, and use the enclosed instructions to grow fleets of perfect interstellar vessels biologically; permanently Tun-staged, geneered giant Water Bears.]

Presumably all put in place by Austr-aliens
posted by infinite intimation at 2:51 PM on November 25, 2012 [1 favorite]

Anonymous has hacked my toenails.
posted by arcticseal at 3:12 PM on November 25, 2012 [1 favorite]

It is also important to keep in mind that while Dr. Church is a pretty amazingly talented engineer, his efforts have been central to a lot of the genomics revolution, and he is pretty intimately tied up in just about everything with venture capital, he is also a first class media troll who regularly says crazy things he can't back up.
posted by Blasdelb at 3:21 PM on November 25, 2012

That is at least in terms of what the future might hold, this specific project is indeed published in Science for a reason.
posted by Blasdelb at 3:22 PM on November 25, 2012

DNA's cool and all, but the fact that you basically need to destroy it to read it is not something to be ignored.

My money's on etching into stone, perhaps with an epoxy layer.
posted by effugas at 3:29 PM on November 25, 2012

But I always liked that saying about how if all of science was lost, and then rediscovered, it'd be the same stuff being re-learned.

Some species after us might discover reading and writing DNA, but the odds that they'd find our language or our encoding comprehensible are pretty nonexistent.
posted by Pope Guilty at 3:31 PM on November 25, 2012

Hey Blasdelb, can you elaborate?
posted by latkes at 3:31 PM on November 25, 2012

Well, in 2009 he told the New York Times that he could clone a Neanderthal for $30 million with then-current technology, raising all manner of ruckus. His work, published now, is a big piece of what would have been necessary but nothing like the whole pie, and it's clear now that there weren't other secret projects working on the other huge technical limitations that would stop such an aim.
posted by Blasdelb at 3:44 PM on November 25, 2012 [1 favorite]

The fidelity of reproduction is also bound to be unacceptable with this kind of storage medium. Even if there's an error in strand synthesis once every 100,000 bases, that's going to mean that several errors creep in to every megabyte of information. With no checksum capacity, you have to double-check every new base addition as it's added to prevent accretion of errors.
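To put a number on "several errors", here is a quick back-of-the-envelope sketch. The 1-in-100,000 error rate is the hypothetical from the comment; the bits-per-base values are assumptions (one bit per base and two bits per base are both plausible encodings), and a megabyte is taken as 8 million bits.

```python
ERROR_RATE_PER_BASE = 1 / 100_000   # hypothetical rate from the comment above
BITS_PER_MEGABYTE = 8 * 10**6       # decimal megabyte


def expected_errors_per_megabyte(bits_per_base: int,
                                 error_rate: float = ERROR_RATE_PER_BASE) -> float:
    """Expected number of miswritten bases when storing one megabyte."""
    bases_needed = BITS_PER_MEGABYTE / bits_per_base
    return bases_needed * error_rate


# Assumed encodings: 1 bit per base, or 2 bits per base.
one_bit = expected_errors_per_megabyte(1)
two_bit = expected_errors_per_megabyte(2)
print(f"1 bit/base:  about {one_bit:.0f} expected errors per MB")
print(f"2 bits/base: about {two_bit:.0f} expected errors per MB")
```

So at that rate it is more like dozens of errors per megabyte than a handful, which is why some form of redundancy in the encoding matters.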
posted by yellowcandy at 3:50 PM on November 25, 2012 [1 favorite]

wcfields: "pics of 2 liter soda bottles filled with his years worth of saved cum"

Bottles? As in the plural of bottle? Like more than two liters of ejaculate? That is mind blowing on so many levels.
posted by wierdo at 3:51 PM on November 25, 2012

George Church is rather promiscuous on joining Scientific Advisory Boards, but IMHO the VC smear is a bit out of place. He's a true scientist, and it's eminently clear that he's devoted to the advancement of the field and the science itself, from the Personal Genome Project to his open source sequencing machine to the huge number of other projects that you never hear about. He's completely open, and I've never seen anything the least bit untoward in any of his dealings with the commercial side of science. Compared to the more commonly referenced genomics luminaries, who are far more sharky in their dealings (according to rumor and innuendo, not personal experience), I'd take George any day. I do agree about the outlandish claims about Neanderthal cloning though.
posted by Llama-Lime at 3:56 PM on November 25, 2012 [1 favorite]

wierdo: "That is mind blowing on so many levels."

Ahem.
posted by boo_radley at 4:03 PM on November 25, 2012 [1 favorite]

yellowcandy, that's a problem that current media also have to deal with, as the string of bits stored is not translated with every 1 as North and 0 as South, or vice versa. Information theory has great solutions for this, and Reed-Solomon or low-density parity-check codes will work as well on DNA as they do on other information channels. So whatever the writing and reading error rates happen to be, it will be possible to use DNA storage with an encoding and decoding at arbitrarily small error rates.
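As a toy illustration of the idea, here is a minimal sketch that uses a Hamming(7,4) code, which is far simpler than the Reed-Solomon or LDPC codes named above and is swapped in only because it fits in a few lines. The one-bit-per-base mapping (A or C reads as 0, G or T reads as 1) and the helper names `write_dna` and `read_dna` are assumptions made up for this example.

```python
# Hypothetical one-bit-per-base mapping, assumed for illustration only:
# A or C reads as 0, G or T reads as 1.
BASE_TO_BIT = {"A": 0, "C": 0, "G": 1, "T": 1}
BIT_TO_BASE = {0: "A", 1: "G"}


def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4   # 1-based position of the flip, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def write_dna(data_bits):
    """Turn 4 data bits into a 7-base strand (hypothetical helper)."""
    return "".join(BIT_TO_BASE[b] for b in hamming74_encode(data_bits))


def read_dna(strand):
    """Read a 7-base strand back into 4 data bits (hypothetical helper)."""
    return hamming74_decode([BASE_TO_BIT[base] for base in strand])


data = [1, 0, 1, 1]
strand = write_dna(data)

# Simulate a single substitution error at index 4 of the strand.
flipped = "A" if strand[4] == "G" else "G"
damaged = strand[:4] + flipped + strand[5:]

recovered = read_dna(damaged)
print(strand, damaged, recovered)
```

The cost is seven bases for every four data bits, and this particular code only survives one bad base per block; real schemes trade overhead against much stronger guarantees.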
posted by Llama-Lime at 4:14 PM on November 25, 2012

Bottles? As in the plural of bottle? Like more than two liters of ejaculate? That is mind blowing on so many levels.

On a total derail of my own post, this comment reminds me of when, as a young lesbian, I was given a jar containing my forthcoming child's sperm donor's donation. Having learned much of what I know about ejaculate from watching gay male porn, I expected to receive a large, overflowing jug of the stuff, and was very disappointed by the couple millimeters I found in the bottom of the jam jar. Oh well, I guess more proof of how much genetic information you can pack into a small space!
posted by latkes at 4:18 PM on November 25, 2012

Promiscuous is putting it mildly, but I didn't intend my mention of his extensive involvement in industry as a smear. While he is certainly no Venter, neglecting to extort the Federal Government Dr. Evil style by threatening to extinguish entire scientific disciplines is an awfully low bar to set.

George Church is many things but open is not really one of them. When I was a doe-eyed undergrad I gave him way too much credit for the access to unpublished information he has got to have being on so many boards, but I'm never going to forget the rank bullshit that was pretty much everything he said about Neanderthal genome synthesis or a bunch of other things.
posted by Blasdelb at 4:20 PM on November 25, 2012 [1 favorite]

Interim Apple Chief Under Fire After Unveiling Grotesque New MacBook

"Oh, my sweet God," Apple employee Kurt Starfeldt said after viewing the MacBook up close. "It appeared to be discharging some sort of mucus-type substance from the headphone jack and making these weird murmuring sounds. And then it started quivering at one point when Tim was demonstrating how to use the touch pad. It was quite upsetting, actually."
posted by Blazecock Pileon at 5:12 PM on November 25, 2012 [2 favorites]

Maybe we're not the first ones to think of this?
posted by blue_beetle at 5:28 PM on November 25, 2012 [1 favorite]

wierdo: "wcfields: "pics of 2 liter soda bottles filled with his years worth of saved cum"

Bottles? As in the plural of bottle? Like more than two liters of ejaculate? That is mind blowing on so many levels"

4 years worth, he posts his collection every December 26th: *NSFW* *NSFL*.

So expect a few bottles filled with Petabytes of DNA in about a month.
posted by wcfields at 5:52 PM on November 25, 2012

Hoarding, do you know no bounds?
posted by wierdo at 6:24 PM on November 25, 2012

That's disgusting.
posted by flippant at 6:56 PM on November 25, 2012

Interim Apple Chief Under Fire After Unveiling Grotesque New MacBook

"Finally I hit the eject button and a tray popped open and spit out a bunch of teeth. Why does it have teeth?"
posted by lodurr at 6:58 PM on November 25, 2012 [1 favorite]

The thing I like most about the idea of DNA encoding is the idea that using this method, all human knowledge could survive catastrophic world events. Inject a few million roaches or termites and even if we all perish and the human race is wiped from the face of the planet

The scale is way off here. First of all, "all human knowledge" is way more than 700 petabytes at this point. Secondly, animal cells do not have a gram of DNA in them. At most, you could store a few gigabytes of stuff on the amount of DNA in a typical eukaryotic (animal/plant/fungi) cell. Of course, you'd need to pack that DNA alongside a cell that already exists.

And then, of course, you have the problem of mutation. DNA sitting in a garage for a few hundred years is fine, but if you try to replicate and re-replicate it inside of a cell it will degrade if not evolutionarily conserved (i.e. if the gene mutates, the animal dies).

As far as I know, no one has really figured out how to add 'error correction' to DNA copies. I think I read about a bacterium that has a sort of error correction using multiple copies of its genome (and thus is super radiation resistant). But that bacterium probably has a much shorter genome.

I doubt DNA will ever be more practical than electromagnetic/quantum storage.

Synthesize bases with various combinations of stable nitrogen, carbon, and oxygen isotopes. There are two stable isotopes of nitrogen, two of carbon, and three of oxygen. I'm too lazy to do the math, but that gives you a huge number of subtypes for each base—different masses, but more or less identical chemical behavior.

You'll need a Mass Spec for data retrieval, and writing will be a nightmare, but the potential information density would exceed that of natural DNA by many orders of magnitude.

That's interesting. Probably the best molecule for this would be a carbon nanotube using different isotopes of carbon. A carbon nanotube is actually about ½ as thick as a DNA double helix, and has about 10 atoms per loop, so you could get 10 bits per loop, and about 100 bits per nanometer.

From what I can tell DNA has 20 base pairs per two turns, and a turn is 3.4 nanometers. Each pair is two bits, so 6.8 nm = 40 bits. Vs. 680 for carbon nanotube isotope storage.
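Spelling that comparison out as a rough sketch, using only the per-turn and per-nanometer estimates from this comment (they are the commenter's ballpark figures, not reference values):

```python
# Ballpark figures from the comment above; none of these are measured here.
TURN_LENGTH_NM = 3.4          # length of one turn of the double helix
BASE_PAIRS_PER_TURN = 10      # i.e. 20 base pairs per two turns
BITS_PER_BASE_PAIR = 2        # four possible bases per position
NANOTUBE_BITS_PER_NM = 100    # the isotope-nanotube estimate

span_nm = 2 * TURN_LENGTH_NM                                   # two turns
dna_bits = 2 * BASE_PAIRS_PER_TURN * BITS_PER_BASE_PAIR         # bits in that span
nanotube_bits = NANOTUBE_BITS_PER_NM * span_nm                  # same span of nanotube
ratio = nanotube_bits / dna_bits

print(f"Over {span_nm:.1f} nm: DNA {dna_bits} bits, nanotube {nanotube_bits:.0f} bits")
print(f"Nanotube advantage: roughly {ratio:.0f}x")
```

With those inputs the nanotube comes out at about 680 bits over 6.8 nm against 40 for DNA, a factor of roughly 17 in linear density.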

And of course, a carbon nanotube would be far, far, far stronger than a strand of DNA.

Now, a slightly more practical variation: Pack various non-reactive atoms (like noble gasses) large enough not to be able to move around into the tube. Then you should be able to figure out which atom is which by blasting them with photons to see which ones get absorbed. So long as you can aim your photons precisely enough, it should work.

4 years worth, he posts his collection every December 26th: *NSFW* *NSFL*.

Yeah... I'm not just going to go ahead and not click that link.
posted by delmoi at 8:09 PM on November 25, 2012 [6 favorites]

This reminds me; I've always wondered - wouldn't organic computers be susceptible to an actual virus?
posted by eustacescrubb at 8:55 PM on November 25, 2012

wouldn't organic computers be susceptible to an actual virus?

Assuming an organic computer has biochemistry similar to that of regular life — cell membranes, metabolic pathways, protein signaling, etc. — there's no reason it couldn't be diverted to making virus instead of doing calculations. But as attempts at DNA computing are not currently usable for solving real world-scale problems, an organic computer is some ways away. We can only barely and clumsily engineer de novo proteins to do useful things, by jury-rigging functional elements together and seeing if it works. We're not even very good at predicting a protein's structure accurately!
posted by Blazecock Pileon at 9:32 PM on November 25, 2012

Regarding the potential for copy errors, actual biological life gets around this by, essentially, storing data in blocks of three bases rather than each single base. And there's a lot of redundancy even in this three-base code: there are multiple different sets of three-base sequences that can be translated to each particular amino acid when cells actually "read" DNA to (create RNA to) create proteins. The exact number of three-base sequences corresponding to each amino acid varies, but they are grouped in sets that have similar structure, and thus small mutations or transcription errors at the DNA to RNA stage tend to get corrected for by the final protein synthesis stage. This probably ends up being too restrictive for humans' computational purposes, but there are a variety of options for DNA-based information encoding schemes that would allow for verifying data integrity and recovering from some number of random errors without loss of information.

The original interest in DNA computing stems from a paper of Adleman in 1994 (at the top of the links here) where he used DNA to solve an instance of the traveling salesman problem (recently on Metafilter) as a proof-of-concept. I haven't kept up with developments in the field, but one of the original ideas is that it could be useful for solving different types of problems than traditional computers, where the extra write and read time for DNA computation is made up for by the computational time savings for very large search problems that can't really be simplified.
posted by eviemath at 9:56 PM on November 25, 2012 [1 favorite]

Someone needs to link Peter Watts to this; he's gonna trip.
posted by dethb0y at 10:31 PM on November 25, 2012

Now, a slightly more practical variation: Pack various non-reactive atoms (like noble gasses) large enough not to be able to move around into the tube. Then you should be able to figure out which atom is which by blasting them with photons to see which ones get absorbed. So long as you can aim your photons precisely enough, it should work.

You're talking angstrom level differences while the wavelength of visible light is on the order of hundreds of nanometers, you'd need to be making gamma rays before you got precise enough, which would have their own issues.
posted by Blasdelb at 12:02 AM on November 26, 2012 [1 favorite]

we've gotten used to thinking of data storage as something that has to be done with both great accuracy and great precision. but that's an artifact of the type of data we've heretofore been interested in storing. Think about the type of data that DNA stores now. For that application, it's been pretty good. What I'm struggling with is how to extend that analogy in a way that's useful without continually coming back to "well we can use it to store information about genomes...."
posted by lodurr at 3:18 AM on November 26, 2012

biomimicry thinking can go too far. There seems to be a widely held assumption that evolution will produce a better result than we can think of, just by its nature, and that's just not the case. Evolution is good at continually correcting solutions to match the current conditions, but it knows nothing about the future and can make no reference to any part of the past that it's discarded, so it's never producing an optimal solution -- just a solution that's optimal for that situation, within a reproductive time frame.

That having been said, I have to think that any real work on this as an actual data storage concept would leave behind conventional notions about what DNA needs to do or how it should work pretty quickly.
posted by lodurr at 3:19 AM on November 26, 2012

I'm not just going to go ahead and not click that link.

Just pretend it's cake frosting.
posted by laconic skeuomorph at 5:24 AM on November 26, 2012

...and that would make it better how?
posted by lodurr at 8:58 AM on November 26, 2012

DNA's cool and all, but the fact that you basically need to destroy it to read it is not something to be ignored.

DNA can also be copied easily. The fact that reading a magnetic core bit erases it didn't keep core memory from being the state of the art for many years.
posted by localroger at 9:44 AM on November 26, 2012

Eviemath, what you're saying is absolutely true for biological systems. But it looks from this paper as if the authors are using bases, not codons, to code for 0/1.

If they want to move to a codon-level approach, they'll need to increase the number of bases needed for every bit from one to three. Hence an even longer write-process.
posted by yellowcandy at 9:54 AM on November 26, 2012

DNA's cool and all, but the fact that you basically need to destroy it to read it is not something to be ignored.

To be correct, you are not "destroying" but digesting or cleaving the original DNA into many small fragments. With PCR, you can exponentially amplify these fragments of your original DNA in a few hours. It seems unlikely one would ever store the original DNA in toto, but instead keep pieces of it in a (clone) library, prepared for sequencing, in order to facilitate reassembling the fragments and reading the original. The library and assembly portions are the rubber-meets-the-road parts of high-throughput sequencing that underpin Church and Kosuri's work.
posted by Blazecock Pileon at 11:40 AM on November 26, 2012 [1 favorite]

> Why does it have teeth?

iCronenberg?
posted by ostranenie at 12:11 PM on November 26, 2012

Now, a slightly more practical variation: Pack various non-reactive atoms (like noble gasses) large enough not to be able to move around into the tube.

A few guys at Berkeley wrote a paper about something similar a few years back.
posted by Dr. Zachary Smith at 1:53 PM on November 26, 2012 [1 favorite]

You're talking angstrom level differences while the wavelength of visible light is on the order of hundreds of nanometers, you'd need to be making gamma rays before you got precise enough, which would have their own issues.

Hmm, good point. There is probably some other way to do it, though. Magnetic resonance, maybe? Could it be done by electron interaction? Hmm...
posted by delmoi at 2:45 PM on November 26, 2012 [1 favorite]

BP is illustrating the kind of different thinking you'd need to do to make really effective use of this technique. As a data storage mechanism serving current paradigms, DNA is wildly unsuitable; but if we think of some new paradigms, it might well be fantastic. It's going to have to be pretty different, though.
posted by lodurr at 3:41 PM on November 26, 2012

Sequencing is still expensive, so while I think the idea of using DNA for storage is awesome, it seems easy until someone has to pay for reading out data. To give some idea:

454 Sequencing: ~$85/Mbase
Illumina paired-end: ~$1/Mbase
SOLiD: ~$6/Mbase

If we take the cheapest of these three popular options and the 5.27-megabit data stream from the paper (which is ~2.6 Mbase, each base taking two bits to represent) this wasn't an expensive experiment, even with deep sequencing coverage (the authors seem to use 100x read coverage).

But if you think about the authors' implication about the storage density of DNA — petabytes in a test tube — the sequencing cost to read that scale of data via this mechanism seems like it would get prohibitive very quickly.
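A back-of-the-envelope version of that arithmetic, using only the prices listed above and the two-bits-per-base and 100x coverage figures from this comment. The decimal petabyte (10^15 bytes) and the function names are assumptions for illustration:

```python
# Prices quoted above, in US dollars per megabase sequenced.
COST_PER_MBASE = {"454": 85.0, "Illumina paired-end": 1.0, "SOLiD": 6.0}


def payload_mbase(n_bytes, bits_per_base=2):
    """Megabases needed to hold n_bytes at the assumed bits-per-base density."""
    return n_bytes * 8 / bits_per_base / 1e6


def readout_cost(mbase, coverage, cost_per_mbase):
    """Dollars to sequence `mbase` megabases at the given read coverage."""
    return mbase * coverage * cost_per_mbase


cheapest = min(COST_PER_MBASE.values())

# The experiment: ~2.6 Mbase read at 100x coverage.
experiment_cost = readout_cost(2.6, 100, cheapest)

# One decimal petabyte (assumed 10**15 bytes) read the same way.
petabyte_cost = readout_cost(payload_mbase(1e15), 100, cheapest)

print(f"experiment:   ${experiment_cost:,.0f}")
print(f"one petabyte: ${petabyte_cost:,.0f}")
```

Under these assumptions the experiment comes to a few hundred dollars, while a single petabyte lands in the hundreds of billions.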

Currently, I think sequencing a human genome (~3 Gbase) costs about $3000 and takes about a week. The goal is a "$1000/day/genome", which would make sequencing more feasible for clinical work. So improvements in sequencing technology are ongoing. At the moment, however, using it for data retrieval with DNA as the storage medium does not seem practical outside of the laboratory. Still, an excellent idea is an excellent idea and I'd be excited to see where this might be in a decade.
posted by Blazecock Pileon at 5:34 PM on November 26, 2012 [1 favorite]

Eviemath, what you're saying is absolutely true for biological systems. But it looks from this paper as if the authors are using bases, not codons, to code for 0/1.

Yep, absolutely. Just trying to point out that using bases rather than traditional computers does not preclude building some redundancy and error-correction into your coding scheme. I guess it would have been more to the point to note that, since they are using DNA just to store binary information, regular binary error-correcting codes would suffice. But I think someone else mentioned that upthread already :)
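As one concrete example of a regular binary code, here is a minimal Hamming(7,4) sketch. It is my own pick for illustration, not something the paper is known to use; it adds three parity bits to every four data bits and corrects any single flipped bit in the block:

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits, laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
sent = hamming74_encode(word)
garbled = sent.copy()
garbled[4] ^= 1  # simulate one misread base, i.e. one flipped bit
print(hamming74_decode(garbled) == word)
```

With one bit per base, each corrected bit flip corresponds to one misread base; insertions and deletions would need a different kind of code.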
posted by eviemath at 8:12 PM on November 26, 2012 [1 favorite]

This sounds really neat: "Chemical biology: DNA's new alphabet. DNA has been around for billions of years — but that doesn't mean scientists can't make it better". I'm going to have to read up on this more. The article states "researchers haven't shown that polymerases can copy more than four of the paired bases in a row"; it sounds like an interesting line of inquiry though (not obviously related to "DNA data storage", but it sort of dovetails with the ideas Blasdelb was mentioning about alternate base pairings, and it is always interesting to introduce novel properties to known systems).

Eric Kool, a chemist now at Stanford University in California, wondered whether his team could develop unnatural bases with fixed hydrogen-bonding arrangements. He and his colleagues made a base similar to the natural base T, but with fluorine in place of the oxygen atoms (see 'Designer DNA'), among other differences⁵. The structure of the new base, called difluorotoluene (designated F), mimicked T's shape almost exactly but discouraged hydrogen from jumping.

The team soon discovered that F was actually terrible at hydrogen bonding⁵, but polymerases still treated it like a T: during DNA copying, they faithfully inserted A opposite F (ref. 6) and vice versa⁷. The work suggested that as long as the base had the right shape, a polymerase could slot it in correctly. “If the key fits, it works,” says Kool.
....
Floyd Romesberg, a chemical biologist at the Scripps Research Institute, has expanded the repertoire of hydrophobic bases. Starting with molecules such as benzene and naphthalene, his team built “every imaginable derivative”, he says. “It drove us very much away from anything that looked like a natural base pair at all.” But while testing steps in the replication process, the researchers found two contradictory requirements. A crucial position in the base had to be hydrophobic for enzymes to insert the base into DNA, yet it also had to accept hydrogen bonds if enzymes were to continue with copying the strand.

Romesberg's team screened 3,600 combinations of 60 bases for the pair that was copied the most efficiently and accurately⁸. The two that won, MMO2 and SICS, “walk a thin line” between being hydrophobic and hydrophilic at the key position, Romesberg says.

posted by infinite intimation at 7:12 PM on November 27, 2012 [2 favorites]

II, I am super late to this thread but that reminds me that incredibly, people are also doing this at the protein level, constructing orthogonal versions of the ribosome that read 4 bases at a time instead of 3 and insert bizarre amino acids with alkyne or azide groups hanging off of them. (more about Jason Chin's lab - I think other groups are doing this but his is the one I know best)
posted by en forme de poire at 8:30 PM on December 1, 2012

