

Mad scientist in your own basement?
October 17, 2012 12:20 AM

The Genome Compiler is an IDE for DNA projects for all you DIYbio enthusiasts. Previously. Previously.
posted by lipsum (24 comments total) 16 users marked this as a favorite

 
I was actually planning on some mad science once I graduated and landed a job, and this looks to be a great resource, both for schemata and inspiration (Dremels as motors for a centrifuge? I never would have thought of that, given my fear of sharp rotating bits of metal that I had no intention of touching, let alone disassembling).
posted by Slackermagee at 12:36 AM on October 17, 2012


You had me at 'IDE for DNA'.
posted by anaximander at 2:19 AM on October 17, 2012 [2 favorites]


So, ultimately, DNA is now making an IDE to make more DNA.
posted by Malor at 2:38 AM on October 17, 2012


This program is designed to help with something very much like what I do for a living, namely genome annotation and manipulation. If you remember the big genome sequencing projects of the '90s and early aughts, they have continued, and the amount of raw data they give back to us has rapidly accelerated. However, those of us trying to understand the biological realities of what all of those sequences actually mean were very quickly left behind, and we have been falling further and further behind as sequencing technology advances faster than we could ever hope to keep up with. The central problem is that while it turns out we can get computers to do our pipetting for us if we pay engineers enough, we can’t get computers to do our thinking for us. Like mathematicians with some of the fanciest calculators imaginable, we can get the tools NCBI gives us to show us amazing things in amazing ways, but they can’t tell us what it all means. For the genomes we get to make any kind of sense, a human being has to abstract meaning from them and communicate that meaning in understandable language. There is no way around that limitation; there will only ever be ways to optimize it.

If there is a God of creation that went around designing the genomes of all of the living things on Earth, they are the sloppiest, most frustrating, terrible programmer you could possibly imagine. The Intelligent Design nuts are particularly frustrating to me, having seen how fundamentally stupid the design of living critters actually is when you get down to the real moving parts. Looking at life through the lens of Max Delbrück’s slowly fulfilled dream of a science of molecular genetics to replace the stamp collecting of Drosophila genetics¹, the organization of information, regulation, and function in genomes makes precious little intuitive sense in terms of human logic. When you think about it, the dumb shit you find is really what you would expect from systems designed exclusively by entropic trial and error: fundamentally unrelated systems piled on top of each other such that one can’t be manipulated without fucking up the other, necessitating otherwise functionless patches to the paired system whenever the other is modified; Rube Goldberg-esque, fragile systems of regulation that respond to all kinds of wrong stimuli; systems of global regulation that are pretty analogous to reading the same giant program as either Python or C++ to produce one of two desired results; and the kinds of systems that you can just tell are 99.9% amateur patch jobs.

The end goal of the folks behind iGEM is pretty simple on the face of it. They want to turn biological systems into abstractions that can be manipulated by people who don’t understand the lower parts. While this might seem like a trivial goal, when you really understand what it means it becomes clear that it has the potential to change the world, and the very nature of life itself, in intensely profound ways. At the moment genomes can only really be meaningfully understood or manipulated by folks like me with expensive (and fucking difficult) educations. This is because, at the moment, in order to build from scratch anything like a solid grasp of how something like the lac operon works in E. coli, one needs a pretty good understanding of how DNA-binding proteins work, how the structure of DNA relates to its function, how ligand binding works, how transcription initiation works, and how enzymes do their thing. Similarly, to have any hope of understanding how one would manipulate a system like that, you’d need a good understanding of how cell competency works and can be created, how to manipulate plasmid vectors, the anti-parallel nature of DNA, how to use antibiotics and resistance cassettes to select for desired strains, what TATA boxes do, how Shine-Dalgarno sequences work, where RNA polymerases like to bind, and how to choose which regulation mechanism to use, and that doesn’t even include the technical skills necessary to actually do it yourself. Their idea is to turn genes, gene cassettes, and genetic systems into 'BioBricks' that their manipulators don’t need to understand in order to use (in a way analogous to how Perl programmers and sysadmins don’t need to understand assembly language to be useful), and that they can pay to have manipulated in industrially mechanized ways.

At the moment the iGEM folks are using the levels of abstraction they can already create to harness the creativity of undergrads with their competition, but what may lie ahead is much, much cooler.
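To make the BioBricks idea concrete for the programmers here, the abstraction can be sketched in Python as parts that a user snaps together by name and role without ever reading the sequence inside. This is a minimal illustration, not iGEM's or the Registry's actual data model; the `Part` class, the `assemble` function, and every sequence below are hypothetical placeholders, not real functional parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """A hypothetical BioBrick-style part: the user sees a name and a role, not the DNA."""
    name: str
    role: str       # "promoter", "rbs", "cds", or "terminator"
    sequence: str   # 5'->3' DNA; made-up placeholders below, NOT real functional parts


# Placeholder sequences for illustration only.
PROMOTER = Part("pDemo", "promoter", "TTGACAGGCTTACGCATCGATCTATAAT")
RBS = Part("rbsDemo", "rbs", "AGGAGG")
CDS = Part("cdsDemo", "cds", "ATGGCTAAATAA")
TERMINATOR = Part("termDemo", "terminator", "CCCGCCTAATGAGCGGGCTTTTTT")

# The one rule this toy abstraction enforces: a simple expression-cassette order.
EXPECTED_ROLES = ["promoter", "rbs", "cds", "terminator"]


def assemble(parts: list[Part]) -> str:
    """Concatenate parts into one construct after checking that their roles are in order."""
    roles = [p.role for p in parts]
    if roles != EXPECTED_ROLES:
        raise ValueError(f"expected roles {EXPECTED_ROLES}, got {roles}")
    return "".join(p.sequence for p in parts)


construct = assemble([PROMOTER, RBS, CDS, TERMINATOR])
print(len(construct), construct)
```

The only thing `assemble` can check is the order of the roles; whether the joined sequence actually does anything inside a cell is exactly the part this abstraction hides.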

Until this summer I made fun of the nascent science of ‘Synthetic Biology’, which this program is, in theory, intended to support. When I made a post a while ago on what is still iGEM’s most exciting project, mayr was absolutely right when she went all Mol Bio hipster and declared, “I heard of iGEM before it was cool. BioBricks is for people who can't handle real cloning.” I indeed LOLled more than a little. BioBrick really is just a new name for gene cassettes, things that have been actively studied and manipulated since the '60s. What convinced me that this could actually be really amazingly cool was a talk Drew Endy gave at the most recent Bacteriophage conference in Brussels about the research going on in his lab, the parts he needed from us, and why (37:38). [Don’t be intimidated by the technical nature of the talk; even if you zone out during the technical bits you can totally still get the point.] In it, he describes his lab’s quest to create what amounts to a living computer: programmable systems architecture within E. coli. The current project uses the architecture he is building to create a trivially readable clock, reading out in binary, that would track the number of generations a culture of bacteria has gone through, which would itself be amazingly useful. However, if created, this kind of systems architecture, combined with sensor proteins, enzymes, and regulator molecules understood as BioBricks, could make life understandable by people who are to us as programmers are to hardware engineers.

While I was sitting in that talk, knowing that the phage community does indeed have all of the parts he wants and then some, I couldn’t help but get goose bumps recalling one of my favorite stories from Science Fiction: The Nine Billion Names of God (part 2) by Arthur C. Clarke. Suddenly I was, by way of analogy, a monk in his lamasery, slowly going about the task of annotating out the 10,000,000,000,000,000,000,000,000,000,000 (10³¹) names of creation. If we really can systematize the genome of a living organism into neat little boxes, like a well-designed program built according to the sensibilities and biases of human logic, that would, in a very real and profound way, give us the ability to remake life in our own image, very much evoking the line in Genesis that phrase comes from.

How cool is that?


¹ From the 1920s to the 1930s there was a mass migration to biology of out-of-work physicists, who had suddenly run out of things to do once we had figured out so much of physics. They brought with them a mechanistic view of how the universe works, which they used to cause massive transformations in how we understand and interact with biology. One of the most influential of these scientific interlopers was Max Delbrück, who quickly reasoned that if we were ever going to understand how life works, we would need to start with the simplest organism possible and work our way up. He isolated seven bacteriophages against E. coli B, originally just his lab strain, and named them in a series, T1 (previously) through T7. The central idea was that he and his growing number of colleagues would focus on truly understanding how these phages worked and use that knowledge to generalize to Escherichia coli, then the mouse, and then the elephant and us. An essential component of this was the "Phage Treaty" among researchers in the field, which Delbrück organized to limit the number of model phages and hosts so that folks could meaningfully compare results. What came out of their original focus, in many respects encapsulated in Erwin Schrödinger's What Is Life?, has shed light on so much as to truly redefine our self-understanding, to say nothing of medicine:
  • The Luria–Delbrück experiment elegantly demonstrated that in bacteria, and by extension in all of life, genetic mutations arise in the absence of selection rather than as a response to it.
  • The Hershey–Chase experiment showed once and for all that nucleic acids were in fact the heritable molecule in not just T2 phage and E. coli, but indeed all of life.
  • Easily the snarkiest, most badass, and likely most important scientific paper ever published, written as an accessible single page, about the double helix structure of DNA. Jim Watson changed majors from ornithology to genetics after reading What Is Life? and became Luria's graduate student, while Crick was an older former physicist who also claimed inspiration from Schrödinger. The structure of DNA they discovered, and its relationship to function, holds true for all of life.
  • Soon afterwards came the adapter hypothesis and the central dogma, both of which are (at least simplistically) true for all of life.
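For the programmers, the logic of the Luria–Delbrück fluctuation test can be sketched as a toy simulation in Python. If resistance mutations arise at random during growth, a mutation early in one culture leaves a "jackpot" of resistant descendants, so counts vary far more between parallel cultures than they would if resistance were only induced at the moment of selection. The parameters (12 generations, a mutation rate of 1e-3 per division, 200 cultures) are arbitrary assumptions chosen to run quickly, not values from the 1943 experiment, which scored E. coli for resistance to phage T1.

```python
import random
from statistics import mean, variance


def grow_with_random_mutation(generations: int, mu: float, rng: random.Random) -> int:
    """Mutations arise during growth, before any selection; returns resistant cells at plating."""
    sensitive, resistant = 1, 0
    for _ in range(generations):
        # Every cell divides once; each sensitive cell's new daughter mutates with probability mu.
        new_mutants = sum(1 for _ in range(sensitive) if rng.random() < mu)
        sensitive = 2 * sensitive - new_mutants
        resistant = 2 * resistant + new_mutants  # mutant clones keep doubling
    return resistant


def grow_with_induced_resistance(generations: int, p: float, rng: random.Random) -> int:
    """Alternative hypothesis: each cell independently becomes resistant only when selected."""
    population = 2 ** generations
    return sum(1 for _ in range(population) if rng.random() < p)


def fano_factor(counts: list[int]) -> float:
    """Variance divided by mean across parallel cultures (about 1 for a Poisson-like process)."""
    return variance(counts) / mean(counts)


rng = random.Random(1)
GENERATIONS, MU, CULTURES = 12, 1e-3, 200  # arbitrary toy parameters

darwinian = [grow_with_random_mutation(GENERATIONS, MU, rng) for _ in range(CULTURES)]
# Match the induced model's per-cell probability so both models have the same average count.
p_matched = mean(darwinian) / 2 ** GENERATIONS
induced = [grow_with_induced_resistance(GENERATIONS, p_matched, rng) for _ in range(CULTURES)]

fano_random = fano_factor(darwinian)
fano_induced = fano_factor(induced)
print(f"random mutation:    mean {mean(darwinian):.1f}, Fano factor {fano_random:.1f}")
print(f"induced resistance: mean {mean(induced):.1f}, Fano factor {fano_induced:.1f}")
```

The variance-to-mean ratio should sit near 1 for the induced model and come out much larger for the random-mutation model, which is the signature Luria and Delbrück saw.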
    posted by Blasdelb at 4:53 AM on October 17, 2012 [35 favorites]


    Oh right the program.

    I just played with it a bunch, and it is pretty much totally analogous to the 'crapware' found on just about any PC you buy with Windows installed. While it is very pretty, actually using it for anything productive would be incredibly difficult due to its generally poor design. Its purpose appears to be to sell DNA oligos. If you want to really sink your teeth into this kind of thing, I would recommend using Artemis, free courtesy of the Sanger Institute, for annotation, and Amplify, free courtesy of the University of Wisconsin, for manipulating and designing sequences.

    For the moment anyway BioBricks still really are for people who can't handle real cloning.
    posted by Blasdelb at 5:05 AM on October 17, 2012 [4 favorites]


    Also, here is another, less recent talk given by Drew Endy that is aimed more at regular folks, and another, more detailed talk aimed more at computer folks than biologists.
    posted by Blasdelb at 5:39 AM on October 17, 2012 [1 favorite]


    Very interesting, Blasdelb, thanks!
    posted by jeffburdges at 6:23 AM on October 17, 2012 [1 favorite]


    One of the nice parts of having moved to Europe is that I get the time to write science explanations like this on FPPs posted in the middle of the night without them getting buried in the extreme tail ends of threads.
    posted by Blasdelb at 7:30 AM on October 17, 2012


    Hi Blasdelb, my name is Omri Amirav-Drory and I'm the founder of Genome Compiler. I really liked your first post (especially "If there is a God of creation that went around designing the genomes of all of the living things on Earth, they are the sloppiest, most frustrating, terrible programmer you could possibly imagine.") but was sorry to read the second post.

    We would like to make our software useful for you - we would like to know what we need to build next. We only raised our money and started work at the beginning of the year, and we're working hard on more features and enhancements.

    I encourage you and anyone else here who's interested in this project to write to me at omri@genomecompiler.com. I can show you what we're building now over Google Hangouts/Skype/GoToMeeting and chat about what you think is important.

    Thanks!

    Omri
    posted by corwinbad at 7:31 AM on October 17, 2012


    Looks complicated. Is anyone working on a python interpreter to run on top of this DNA language?
    posted by mccarty.tim at 7:59 AM on October 17, 2012 [1 favorite]


    Is this something that would be of use for those who have had their DNA tested at genealogical and health DNA companies like 23andMe, Family Tree DNA, or Ancestry.com? I have had my autosomal DNA tested at 23andMe. I believe FTDNA and Ancestry.com both use a different method for their DNA testing.
    posted by jkafka at 8:48 AM on October 17, 2012


    Hi Omri,

    Welcome to MetaFilter!

    After playing with your software I did find a number of specific things that particularly frustrated me.
  • I like the intuitive and pretty library interface, and that iGEM's stuff comes included, but that one can only import sequence files saved in either your own format, .xml, or .gb files but not more accessible text file formats is pretty darn limiting for PhD biologists, much less undergrads or amateurs. My old PI still imports things saved in Microsoft Word 2003 files because that is what they know.
  • Can I really not make folders to organize 'projects' in the library? My Amplify folder has thousands of sequences in it and I haven't even been at this that long.
  • I really like the intuitive and recognisable buttons at the top of the project screen for things like cut and paste, but I can't imagine what I'd use them for. The interface doesn't present nearly enough data at once for me to really feel comfortable doing something like primer or vector design, and I can't seem to shrink the text size, reduce the spacing between the lines of DNA, or get it to display only the top (5'→3') strand to fit more on screen. I also really liked the very cute genetic code wheel at the amino acid zoom level, but similarly can't imagine what functional purpose it would serve.
  • Can I really not click and drag a selection of DNA with my mouse? Using the arrow keys got old real fast.
  • Can I really not change the project window to scroll left to right rather than up and down?
  • What is up with the shopping cart anyway? Is it a way to buy oligonucleotides? Vectors? If so, where do the Oligos/Vectors come from? Why buy them through you guys?
  • To check it out, all I did was try to design a vector insert of a gene from one of my phages for one of the preloaded plasmids, but I didn't really get very far. I'm really curious whether you could explain what kinds of specific tasks you guys intend it for, in proper nomenclature I can understand. I'd also be happy to schedule a Skype conference to look at what you guys are working on, and, of course, to report back to the thread if you'd like.
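As a stopgap for the import problem in the first point, sequence text pasted from almost anywhere (a Word document, the numbered ORIGIN block of a GenBank record) can be turned into a plain FASTA file with a few lines of Python. This is a hypothetical helper, not a Genome Compiler feature, and it assumes the pasted text contains only the sequence plus numbering and whitespace, since any A, C, G, T, or N letters in a stray word would be kept as sequence.

```python
import re


def raw_text_to_fasta(raw: str, name: str, width: int = 60) -> str:
    """Hypothetical helper: turn pasted sequence text into a single FASTA record.

    Assumes `raw` holds only sequence letters plus numbering and whitespace;
    A/C/G/T/N letters inside any stray word would be kept as sequence.
    """
    # Keep only nucleotide letters (including N for unknown bases), uppercased.
    seq = re.sub(r"[^ACGTNacgtn]", "", raw).upper()
    if not seq:
        raise ValueError("no nucleotide characters found")
    # FASTA is a '>' header line followed by the sequence, conventionally wrapped.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


pasted = """
        1 atgcgtacgt tagc
"""
print(raw_text_to_fasta(pasted, "my_insert"), end="")
```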
    posted by Blasdelb at 8:50 AM on October 17, 2012 [1 favorite]


    My yellow-eyed friend is wondering when there will be an option to import Android projects.
    posted by CynicalKnight at 9:52 AM on October 17, 2012


    If there is a God of creation that went around designing the genomes of all of the living things on Earth, they are the sloppiest, most frustrating, terrible programmer you could possibly imagine.

    I'm just a layman, so I may not have the direct experience that would give me this point of view, but if there is a creator, they did actually create life that did end up with you, who is now only beginning to understand their own origins and composition as a form of life at the level of DNA. What have you created that is more impressive than that? What does a well-designed genome look like?

    I'm not a creationist by any means, but I do find the existence of life simply amazing - especially life which has the ability to find understanding of its origins and gain comprehension about its existence in however small a way. I'm sure you must have some respect for biological lifeforms yourself, given your field, so why the dismissive attitude? Does it not also provide you with a sense of wonder and amazement?

    They want to turn biological systems into abstractions that can be manipulated by people who don’t understand the lower parts.

    As a person who works with computers running software that is programmed according to these principles and breaks constantly for poorly-understood reasons, I'm not sure how I feel about that. What happens when the "code" does the biological equivalent of "throwing an exception" or the like?

    Thanks for your contribution, it's really interesting to hear firsthand from people who work in genetics.
    posted by nTeleKy at 11:05 AM on October 17, 2012


    "Is this something that would be of use for those who have had their DNA tested at genealogical and health DNA companies like 23andMe, Family Tree DNA, or Ancestry.com? I have had my autosomal DNA tested at 23andMe. I believe FTDNA and Ancestry.com both use a different method for their DNA testing."

    Neither this nor other annotation software like it would be particularly useful for a layman with information received from 23andMe or Ancestry.com. They provide a very specific kind of genotype based on Single Nucleotide Polymorphisms (SNPs), not an intact sequence, which is what this would be useful for. It looks like one of the products Family Tree DNA offers involves sequencing a specific region of mitochondrial DNA, which could be viewed in this software if Family Tree DNA gives you the actual sequence and you manage to get it into the right format. However, even if you did get the sequence into this program, it would show up as a bare series of A, T, G, and C that, as far as I can tell, you wouldn't really be able to turn into the pretty pictures. To do so you would need features (available elsewhere for free) that predict Open Reading Frames (ORFs), and ideally features (also available elsewhere for free) that compare those predicted ORFs to confirmed ones in previously annotated sequences posted to GenBank; and to get meaningful information even then, you would need to have some idea of what to look for. It'd look pretty cool though.
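The ORF prediction step mentioned above is simple to sketch, even if real gene finders are far more sophisticated. The Python below scans all six reading frames for stretches that begin with ATG and end at the first in-frame stop codon, using the standard genetic code. The `min_codons` cutoff is an arbitrary assumption, and the sketch ignores alternative start codons and nested ORFs; positions on the minus strand are given in reverse-complement coordinates.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int, str]]:
    """Return (strand, frame, start, end, protein) for ATG-to-stop ORFs in all six frames.

    Minus-strand positions are in reverse-complement coordinates; `end` includes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first ATG in this frame opens a candidate ORF
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if (pos - start) // 3 >= min_codons:
                        protein = "".join(
                            CODON_TABLE.get(s[i:i + 3], "X") for i in range(start, pos, 3)
                        )
                        orfs.append((strand, frame, start, pos + 3, protein))
                    start = None  # the stop closes the ORF; look for the next ATG
    return orfs


print(find_orfs("CCATGGCTGCTGCTTAAGG", min_codons=1))  # [('+', 2, 2, 17, 'MAAA')]
```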

    "My yellow-eyed friend is wondering when there will be an option to import Android projects."

    Wanting broadly accessible and functional software for my community doesn't make me a type of Kiwi penguin, well, at least that's what I hope you're calling me. Otherwise it's the bean, right? Tell me it's the bean.
    posted by Blasdelb at 11:57 AM on October 17, 2012


    "Rather, threads of execution are intertwined in a rather “higgledy-piggledy” fashion, very much like what would be described as a sloppy, unstructured computer program code with lots of GOTO statements zipping in and out of loops and other constructs."

    Hi Omri:

    As a practicing bioinformaticist, I really dislike the oversimplifications presented in your video. "Compilation" of a genetic code is pasting together strings and writing them to disk; compilation is actually inserting the code into an organism and seeing if the new sequence does what you intended. In that sense, the program is not a compiler but a Gene Construction Kit (name already taken). The hard part is not plopping modules of DNA next to each other; it's verifying that they work physically the way you hope in the context of a messy organism. For instance, luciferase, used in your oak tree example, does not emit light on its own; it emits light by using luciferin as a substrate, "the biosynthesis of which is not completely understood". One would have to figure out the biosynthesis, get all the genes, transduce them into the plant genome, and make sure they were all expressed in the proper subcellular compartment and cell types. There is a place for modular BioBricks-style design, but overselling the simplicity of this approach will do you no favors in the long run.

    nTeleKy asks a good question: What happens when the "code" does the biological equivalent of "throwing an exception" or the like?

    The organism dies, or the tissue becomes cancerous, or the plant starts eating the product you were hoping to purify from it. Or the RNA message doesn't even get made because some unknown part of the cellular machinery didn't like one sequence being next to another. Experiments have to be done to verify that every single part of the chain is working if anything goes wrong. Good experiments (like good code) contain enough test cases to determine not only that the whole system is failing, but which pieces are failing and how.
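In the spirit of "good experiments (like good code) contain enough test cases", the cheapest checks can at least be run in silico before anything is ordered. This hypothetical Python helper flags sequence-level problems in a coding sequence, one named test per problem; it assumes the standard genetic code and, as the comment above stresses, says nothing about whether the part will actually be expressed or behave in a living cell.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_cds(name: str, seq: str) -> list[str]:
    """Hypothetical in-silico test suite for one coding part; an empty list means it passed."""
    seq = seq.upper()
    problems = []
    if set(seq) - set("ACGT"):
        problems.append(f"{name}: contains characters other than A, C, G, T")
    if len(seq) % 3:
        problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append(f"{name}: does not start with ATG")
    # Split into complete codons only; a trailing partial codon is already reported above.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append(f"{name}: does not end with a stop codon")
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if premature:
        problems.append(f"{name}: premature stop codon(s) at codon index {premature}")
    return problems


for part_name, part_seq in [("good_cds", "ATGGCTTAA"), ("broken_cds", "ATGTAAGCTTAA")]:
    print(part_name, check_cds(part_name, part_seq) or "passed")
```

Passing these checks only rules out the dumbest failures; everything benzenedream lists (death, cancer, missing transcripts) still has to be caught at the bench.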
    posted by benzenedream at 12:56 PM on October 17, 2012 [2 favorites]


    This is Omri again (founder of Genome Compiler) - our software wasn't designed for people to read their genomes. It was designed to democratize the tools of creation - to allow people (from research scientists to DIYbio guys) to design DNA code and create new and useful living things.
    posted by corwinbad at 12:57 PM on October 17, 2012


    For the moment anyway BioBricks still really are for people who can't handle real cloning.


    I saw Endy speak at a conference this summer; the speed with which he was running from BioBricks was astounding. It's an interesting demo project, but it doesn't work for getting real engineering done. It turns out that biological systems interact in complex ways and the idea of a mix-and-match "universal" architecture is rather naive.
    posted by mr_roboto at 12:59 PM on October 17, 2012 [1 favorite]


    The first link mentions Plasmids... DIY BioShock? I for one welcome our new splicer overlords.
    posted by Hairy Lobster at 1:24 PM on October 17, 2012


    "This is Omri again (founder of Genome Compiler) - our software wasn't designed for people to read their genomes. It was designed to democratize the tools of creation - to allow people (from research scientists to DIYbio guys) to design DNA code and create new and useful living things."

    I only just watched the video that benzenedream mentioned and have to agree with them. It communicates fundamental and serious misunderstandings of the nature and capability of what molecular genetics, much less your program, can accomplish. The luciferase example that benzenedream already mentioned is particularly damning. I have yet to find a single task among those involved in designing DNA code to create new and useful living things that your program could actually contribute to productively. Clicking on abstract representations of open reading frames or gene cassettes and moving them around an abstract representation of the oak genome does not a glowy tree make, Photoshop notwithstanding. Even if garage hobbyists discovering PCR for the first time managed to elucidate the luciferin synthesis pathway, a problem that has been worked on since the '40s, and those hobbyists somehow learned the skills to troubleshoot the kinds of things that always go wrong in critters as simple and well understood as E. coli, getting germline plant cells to incorporate any DNA is fantastically complicated and expensive, even in well-studied organisms like corn. Figuring out how to do the same for oaks would be something only half a dozen organizations on the planet have the capability to even consider, and then dismiss as not even close to worth it.

    It might help your case if you could point to one actual engineering task your program could actually be useful for, with bonus points for something that neither Artemis nor Amplify (as free examples) does better. If this program has a purely educational or inspirational purpose, you are doing a really poor job of presenting it that way.
    posted by Blasdelb at 1:41 PM on October 17, 2012


    Hi Blasdelb, here are your comments with my responses below:


    "Hi Omri,

    Welcome to MetaFilter! [OMRI - Thanks!]

    After playing with your software I did find a number of specific things that particularly frustrated me.

    I like the intuitive and pretty library interface, and that iGEM's stuff comes included, but that one can only import sequence files saved in either your own format, .xml, or .gb files but not more accessible text file formats is pretty darn limiting for PhD biologists, much less undergrads or amateurs. My old PI still imports things saved in Microsoft Word 2003 files because that is what they know.

    [Great feedback - currently we support the file formats that you mentioned and also copy/paste of DNA into new "bricks" - one of the biggest user demands, and something we should be releasing soon, is an easy way to import many files in many formats directly into GC]

    Can I really not make folders to organize 'projects' in the library? My Amplify folder has thousands of sequences in it and I haven't even been at this that long.

    [Actually we already have this working in an internal build - should be out no later than 15th Nov - can send you this build for feedback next week]


    I really like the intuitive and recognisable buttons at the top of the project screen for things like cut and paste, but I can't imagine what I'd use them for. The interface doesn't present nearly enough data at once for me to really feel comfortable doing something like primer or vector design, and I can't seem to shrink the text size, reduce the spacing between the lines of DNA, or get it to display only the top (5'→3') strand to fit more on screen. I also really liked the very cute genetic code wheel at the amino acid zoom level, but similarly can't imagine what functional purpose it would serve.

    [we're changing the whole window management to a tabbed environment where you can open several abstraction layers at the same time in the same project, and two tabbed projects side by side (more like an IDE) - also a way to minimize the material box to have more real estate. I would love to show you what we're building to get your feedback]


    Can I really not click and drag a selection of DNA with my mouse? Using the arrow keys got old real fast.

    [we know we know - it's an enhancement we should have before the beta is out]

    Can I really not change the project window to scroll left to right rather than up and down?
    What is up with the shopping cart anyway? Is it a way to buy oligonucleotides? Vectors? If so, where do the Oligos/Vectors come from? Why buy them through you guys?

    [shopping cart allows you to get price quotes from synthesis vendors for your designed construct - from oligos to metabolic pathways - we can deliver anything]

    To check it out, all I did was try to design a vector insert of a gene from one of my phages for one of the preloaded plasmids, but I didn't really get very far. I'm really curious whether you could explain what kinds of specific tasks you guys intend it for, in proper nomenclature I can understand. I'd also be happy to schedule a Skype conference to look at what you guys are working on, and, of course, to report back to the thread if you'd like."

    [please do - will give you our contact info via email - would love it if you could sit with our UX guy and help us out]
    posted by corwinbad at 1:46 PM on October 17, 2012


    Hi Blasdelb, just finished writing my comment and saw your new one.

    The Solve for was a mockup - a vision of the future, not what's possible today. I got my Ph.D. in structural biology (protein crystallography of membrane proteins: hard-core molecular biology/biochemistry/structural biology) plus 4 years of postdoc at Stanford School of Medicine. I UNDERSTAND how hard biology is - I did A LOT of wet work. When one speaks to a general audience about a vision of the future (I stand by the vision!), one needs to explain why they should care and give people hope (which I believe in!) that they should and can do genetic engineering using the new methods to produce things of note.

    I know my biology - I know what's possible. We started building our new software only 9 months ago - we have A TON more to build and improve (undo not working yet!) - please help us out with requirements/feedback - we're at least trying :-)

    posted by corwinbad at 1:53 PM on October 17, 2012


    "When one speaks to a general audience about a vision of the future (I stand by the vision!), one needs to explain why they should care and give people hope (which I believe in!) that they should and can do genetic engineering using the new methods to produce things of note."

    But the hard truth is that these are only visions of a possible future; trained scientists in well-equipped labs cannot currently expect to produce useful things of note along this model, much less home tinkerers. A healthy discipline needs critical people with dreams, not true believers with delusions.

    "There is a place for modular BioBricks style design but overselling the simplicity of this approach will do you no favors in the long run."

    Omri, if you honestly care about what promise the Synthetic Biology model does hold, I would strongly encourage you to take this advice very seriously. It is wildly understated in that viciously dry way that does not always communicate well to non-native speakers of English. My own discipline violently imploded and was nearly annihilated in the 1930s by commercial entities overselling the modest yet vitally important promise it honestly had (PDF). I've dedicated my short career to that promise and even I will tell you that it was right for it to implode then; true believers have no place near science or medicine.

    "I UNDERSTAND how hard biology is - I did A LOT of wet work."

    We are having this conversation because, 12 hours ago, I found your unique name easy to search for on PubMed, and Fulbright postdoc scholarships are both no joke and published online.

    "[shopping cart allows you to get price quotes from synthesis vendors for your designed construct - from oligos to metabolic pathways - we can deliver anything]"

    Could you elaborate on this point? What do you mean by buying metabolic pathways? Do you have some commercial source of custom ultramer oligos that is cheaper or larger than these guys? Is there some aspect of existing online oligo purchasing interfaces you feel is inadequate?

    So far you haven't mentioned in any of your comments how you feel about existing software for annotating and manipulating genomes, could you speak to that here?

    "please help us out with requirements/feedback - we're at least trying :-)"

    I'm an academic, free labor is like the distilled essence of my job description.
    posted by Blasdelb at 3:22 PM on October 17, 2012 [1 favorite]


    "Compilation" of a genetic code is pasting together strings and writing them to disk, compilation is actually inserting the code into an organism and seeing if the new sequence does what you intended.

    No, if you're making an analogy with software, a "compiler" simply works out the sequence you want to insert. Actually making a transformed organism and seeing what your sequence does in there would be "running" and "debugging".

    TGC is closer to a slick IDE for a genome macro-assembler. A compiler has a more complex job: it takes something that humans make sense of and converts it into godawful unmaintainable hacky (but efficient) machine code, and it does it well and reliably enough that it can be used by someone who doesn't really understand the underlying machinery. That's clearly the goal of TGC, but (from what I understand— I'm just a spectator to the whole field) there's a fair amount of pretty basic science to be done before something like that is realistic.
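In the sense of this analogy, the most trivial possible "compiler" would take a protein sequence that a human can reason about and emit DNA that encodes it. The Python sketch below does just that, with one arbitrarily chosen codon per amino acid, and then checks its output by translating it back. It is a hypothetical illustration only: it makes no attempt at codon optimization, mRNA structure, or any of the expression problems that keep a real compiler of this kind out of reach.

```python
# One arbitrarily chosen valid codon per amino acid ("*" = stop); NOT codon-optimized.
BACK_TABLE = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}

# Standard genetic code (NCBI table 1) for checking the output.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def compile_protein(protein: str) -> str:
    """Toy 'compiler': protein (human-readable) -> DNA coding sequence, stop codon appended."""
    protein = protein.upper()
    if not protein.endswith("*"):
        protein += "*"
    unknown = set(protein) - set(BACK_TABLE)
    if unknown:
        raise ValueError(f"not standard amino acids: {sorted(unknown)}")
    return "".join(BACK_TABLE[aa] for aa in protein)


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna = compile_protein("MKV")
print(dna, translate(dna))  # prints: ATGAAAGTGTAA MKV*
```

Getting a string that translates back correctly is the easy, reliable part; whether the organism actually makes and folds the protein is the "running and debugging" step that no such tool can yet promise.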
    posted by hattifattener at 8:21 PM on October 17, 2012



