How to Sequence a Genome
August 23, 2007 5:16 PM

How to Sequence a Genome [Flash. H/T to Jay]. Visualization of the process of genetic sequencing. Posted on the Nova website in conjunction with their show, Cracking the Code of Life, hosted by Robert Krulwich [Wiki].
posted by McLir (14 comments total)
Cool little Flash app. Of course, nowadays the state of the art is Illumina and 454 sequencing instead of Sanger sequencing. They can sequence a massive chunk of DNA in a single run. Sequencing a genome keeps getting cheaper and cheaper. I hope the day of the $1000 genome soon arrives.
posted by grouse at 5:30 PM on August 23, 2007

Wow, that's clever.
posted by smackfu at 5:44 PM on August 23, 2007

The new techniques pioneered by 454 and Solexa sequencing are cheaper and faster, but they only give reads of 200 and 25 base pairs, respectively. Why does this matter, you may ask?

Well, think of that last step as a giant jigsaw puzzle. We have to take all of the reads and piece them back together into one complete genome. When we have 200 base pair reads instead of 900 base pair reads (Sanger), the puzzle is that much harder.

In fact, when reads are small, there's sometimes no way to tell exactly where they go, because a large fraction of our genome consists of repetitive sequence, where the 'pieces' look exactly the same!
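
To make the jigsaw problem concrete, here is a minimal Python sketch. The toy genome, the repeat, and the helper `find_positions` are all made up for illustration, and real assemblers work very differently. It only shows that a read lying entirely inside a repeat matches in more than one place, while a longer read that reaches into unique flanking sequence matches in exactly one.

```python
def find_positions(genome: str, read: str) -> list[int]:
    """Return every start index where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome (invented): the same 8-base repeat appears twice,
# each time surrounded by different flanking sequence.
REPEAT = "ACGTACGT"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT                 # lies entirely inside the repeat
long_read = "GCA" + REPEAT + "GG"   # repeat plus unique flanking bases

short_hits = find_positions(genome, short_read)
long_hits = find_positions(genome, long_read)

print("short read fits at:", short_hits)  # more than one candidate spot
print("long read fits at: ", long_hits)   # a single spot
```

In this toy case the short read has two possible homes and the long read has one, which is the whole argument for longer reads in miniature.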

So, the new technologies are great for some kinds of projects, but Sanger sequencing, as described in the flash app, isn't going anywhere for a while.
posted by chrisamiller at 5:57 PM on August 23, 2007

Back up. Why can only a certain number of pairs be read at a time? Brain gets full?
posted by DU at 6:06 PM on August 23, 2007

DU: it has to do with limitations of the particular chemical method being used. For example, with Sanger sequencing, as the length of a sequence increases, it gets harder to accurately separate it by size from other sequences. Sanger sequencing relies entirely on creating sequences of different lengths, separating them, and then measuring what the last nucleotide of the sequence is. So this means that you can only get reads of up to a certain length.
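
A toy simulation of that idea, as a rough sketch only. The function `sanger_read`, the stop probability, and the template are assumptions for illustration. It also ignores that the real copy is the complementary strand, and it ignores every limit of the chemistry and of the size separation.

```python
import random


def sanger_read(template: str, copies: int = 5000,
                stop_prob: float = 0.1, seed: int = 0) -> str:
    """Toy Sanger run: rebuild `template` from randomly terminated copies."""
    rng = random.Random(seed)
    last_base_by_length: dict[int, str] = {}
    for _ in range(copies):
        # Copy base by base; each base has a small chance of being a
        # labelled terminator that stops this copy right there.
        for length, base in enumerate(template, start=1):
            if rng.random() < stop_prob:
                last_base_by_length[length] = base
                break
    # "Separate by size": go from the shortest fragment to the longest
    # and read off the labelled final base of each length.
    return "".join(last_base_by_length[n] for n in sorted(last_base_by_length))


template = "GATTACAGATTACACCGT"  # made-up sequence
read = sanger_read(template)
print(read)
```

With enough copies, every fragment length is represented, so the bases read in order of fragment length spell out the template. In this toy nothing limits the read length; the real limits come from the separation and signal problems described in this thread.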
posted by grouse at 6:26 PM on August 23, 2007

Incidentally, I hear it's commonly up to about 1000 bases these days. I know of at least two technical reasons why there's a limit, but I've never actually done sequencing, so there may be more reasons.

When they insert the fluorescent bases, what's actually going on is that they're doing normal copying, and a small percentage of the bases are fluorescent. Once a fluorescent base has been inserted, the works are gummed up and no more copying happens. Basic statistics says that there will be a geometric decrease in the length of these copies, so once you get to a certain distance there's not enough fluorescent signal to read.
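
The geometric decrease can be sketched in a few lines of Python. The per-base terminator probability below (`P_TERMINATOR`) is an assumed number chosen only to show the shape of the curve, not a real reagent ratio, and `fraction_ending_at` is a made-up helper name.

```python
def fraction_ending_at(n: int, p: float) -> float:
    """Fraction of copies that stop exactly at base n, if every base
    independently has probability p of being a fluorescent terminator."""
    return (1 - p) ** (n - 1) * p


P_TERMINATOR = 0.005  # assumed value, for illustration only

signal = {n: fraction_ending_at(n, P_TERMINATOR) for n in (50, 500, 1000, 2000)}
for n, frac in signal.items():
    relative = frac / signal[50]
    print(f"length {n:>4}: {relative:.4f} of the signal at length 50")
```

Each extra base multiplies the expected signal by the same factor (1 - p), so the longer the fragment, the fewer labelled copies there are to detect.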

Second, the techniques for separating molecules by their size have kind of unfavorable properties for getting equal separation along the entire length. There's much more separation for the lengths 50 and 51 than for 500 and 501, so you end up not being able to separate the peaks in addition to having much less signal.
posted by Llama-Lime at 6:28 PM on August 23, 2007

/me invents mechanical nanomachine to circumvent this problem
posted by DU at 7:01 PM on August 23, 2007

Krulwich is awesome at what he does. Whenever I hear him do something for National Public Radio I'm always left impressed and awestruck, but Brave New World with Krulwich and Koppel (TMBG weighed in as well which is tres sweet!) still echoes in my head and heart years later. I wish I could find it on DVD. Even though time may not have been kind to it, how the information is presented is just as fascinating as what they're disseminating.

Krulwich has a way of taking the mundane and making it palatable - even paramount to understanding one's own worldview. Why should a lack of beeswax in Nova Scotia matter to someone in Athens, Georgia? Give Krulwich enough time and resources and he'll explain why for ya. And you'll believe it.

That may not make him a great news guy, but it makes him a fantastic storyteller. I find more truth nowadays in stories than I do news anyway. I don't understand why he's always doing pieces for other people's shows. Shouldn't a person with his sense of vision be heading up a program? Regularly shooting stuff out into the stratosphere, or do they only give that kind of job to people with no vision?
posted by ZachsMind at 7:30 PM on August 23, 2007

ZachsMind, I also have great respect for Krulwich and I've been following him since Brave New World. While sometimes he lets gimmicks override the message, he is still one of the best translators of scientific knowledge in the media. (Sagan, Gleick, Ferris, Burke and Suzuki are guideposts of popular explainers.) Radio Lab is probably the best example of Krulwich's work.
posted by McLir at 7:43 PM on August 23, 2007

I thought the key to cracking a genome was to have an adult pretend to be a 14 year old girl, get the genome to engage in racy conversations online, invite the genome over, and then confront it with transcripts of its online sex chat. Then I realized I was not watching Nova.
posted by Astro Zombie at 8:07 PM on August 23, 2007

1000 bp of highly-reliable trace off a single sequencing run is from plasmid DNA, not genomic. Maybe proprietary technology and "ideal" handling procedures can make 1kb reads off of genomic DNA. I'm skeptical from my understanding of modern commercial technology, but it's not out of the question.

But, yeah, the longer the reliable runs that you can sequence, the more confident one can be on the sequence.

Compare a run that only goes to 200bp (bp=base pairs) with the size of a gene: the "average" size of a human gene [the entire length of nucleotides that encodes the gene, not counting all the nucleotides that don't get turned into protein or are there to govern when/how/why the gene is expressed] is about 1,000 - 3,000 bp, with lots of exceptions and differences between species.

There are *lots* of DNA sequences that are duplicated in chunks smaller, around, and larger than 200bp. If you can only sequence 200bp at a time, there's not a good way to tell if this 200bp fragment only happens once or happens twice or happens 34 times in a row.
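
A small Python sketch of why the copy count gets lost. The flanks, the repeat unit, and the read lengths are invented, and `all_reads` is a hypothetical helper that collects every distinct read of a given length. It deliberately ignores how many times each read is seen (read depth), which real methods can use as a hint about copy number.

```python
def all_reads(genome: str, read_len: int) -> set[str]:
    """Every distinct substring of length `read_len` (an idealised read set)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


LEFT, RIGHT = "TTGACCAT", "GGCTTAAC"  # invented unique flanks
UNIT = "CAG"                          # invented repeat unit

genome_a = LEFT + UNIT * 4 + RIGHT    # 4 copies in a row
genome_b = LEFT + UNIT * 8 + RIGHT    # 8 copies in a row

# Reads shorter than the repeat array never see both flanks at once.
short_same = all_reads(genome_a, 6) == all_reads(genome_b, 6)

# Reads long enough to span the 4-copy array plus both flanks do.
long_same = all_reads(genome_a, 16) == all_reads(genome_b, 16)

print("6-base reads identical for 4 vs 8 copies: ", short_same)
print("16-base reads identical for 4 vs 8 copies:", long_same)
```

With the short reads, the two genomes produce exactly the same set of pieces, so nothing in the pieces themselves says whether there are 4 copies or 8. A read that spans the whole array and touches both flanks breaks the tie.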

With respect to fairly common diseases like Parkinson's and Alzheimer's, with identified candidate risk-related genes where the length of the duplication mutation is quite tightly correlated with age of onset and disease severity, the "extra repetitive additional mutations" may well not be detectable by a genomic DNA sequencing method that only reads 200 contiguous base pairs.
posted by porpoise at 10:24 PM on August 23, 2007

If you can only sequence 200bp at a time, there's not a good way to tell if this 200bp fragment only happens once or happens twice or happens 34 times in a row.

One trick that is used is that one might read 200 bp of sequence from either end of a larger sequence that you know the exact length of. To take chrisamiller's jigsaw analogy, this means that all of your jigsaw pieces are paired. You may only have a small puzzle piece, but you know that it must be exactly 50 pieces away from another particular puzzle piece. You can use this extra information to fit all of the reads together and it makes it easier to deal with some repetitive sequence.
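
Here is a minimal Python sketch of the paired-read trick. The toy genome, the reads, and the helpers `find_positions` and `place_pair` are all made up. It also ignores that real paired reads come from opposite strands and that fragment lengths are only known approximately.

```python
def find_positions(genome: str, read: str) -> list[int]:
    """Return every start index where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def place_pair(genome: str, left_read: str, right_read: str,
               fragment_len: int) -> list[tuple[int, int]]:
    """Keep only placements where the two reads sit at opposite ends
    of a fragment of the known length."""
    placements = []
    for left in find_positions(genome, left_read):
        for right in find_positions(genome, right_read):
            if right + len(right_read) - left == fragment_len:
                placements.append((left, right))
    return placements


# Toy genome (invented): one 8-base repeat that occurs twice.
REPEAT = "ACGTACGT"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

# On its own, the repeat read could go in two places...
alone = find_positions(genome, REPEAT)

# ...but its mate "GGATC" is unique, and the fragment is known to be 13 bases.
placements = place_pair(genome, "GGATC", REPEAT, fragment_len=13)

print("repeat read alone:", alone)
print("as part of a pair:", placements)
```

The repeat read by itself fits in two spots, but only one of them is the right distance from its uniquely placed partner.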

And once you have a reference sequence there are easier ways to examine copy number variation than resequencing everything.
posted by grouse at 2:23 AM on August 24, 2007

Nice presentation, I wish we had stuff like that in undergrad. If anyone is interested in what the resulting data looks like, here are some pics of a recent run I did - start, middle and end.

The pics are made by separating the different length fragments produced as shown in the video through a capillary based on size, and measuring the fluorescence at different wavelengths, where each different color means a different base. In the old days, this was done by hand and eye on an agarose gel, like this. As you can see from the pics, the sequencing takes a bit to get accurate at the start, due to a lack of short fragments, unincorporated dye and other stuff. The middle gives nice tight peaks, showing clearly which base is present. When you get nearer to the end, the peaks get fatter and start to overlap, making it hard to tell if a large peak is one base, or two adjacent copies of the same one. In my end pic, the fragment I was sequencing was only about 800 bases long, so there wasn't too much degradation, and you can see right where the DNA runs out.

Sanger sequencing is not going away anytime soon, not only because of the problem of the repeats mentioned above, but because it makes it much easier to sequence a single gene of interest. Most of the new high-throughput machines (while incredibly awesome) only take random samples from the genome and sequence those, whereas many people only need to sequence one small part, which can be done much more quickly and easily with the good old Sanger method. Of course, Fred Sanger won the Nobel Prize in Chemistry in 1980 (his second one) for coming up with this ingenious method.
posted by scodger at 3:13 AM on August 24, 2007

A friend who is in bioinformatics was telling me that you can order custom-made strands of specific sequences of DNA up to about 2000 bp. (The smallpox virus has about 200,000 bp).
posted by neuron at 9:02 AM on August 25, 2007
