
iBOL iWish there were an app for that already
July 14, 2010 8:40 PM

What is DNA barcoding?

tl;Edward Norton

Barcode of Life: Global Biodiversity Challenge
Earth is home to an estimated 10 to 100 million species but 250 years of study has formally described fewer than two million of them. The International Barcode of Life (iBOL) project includes biodiversity scientists from 26 countries working to identify every species on the planet using short snippets of DNA. Over the next 5 years, they are hoping to process 5 million specimens and they need your help! Students and youth from around the world, along with local scientists at colleges and universities, will participate in a collaborative social networking educational game in a race to identify and assign unique DNA barcodes to as many species as possible.

Student Sleuths Using DNA Reveal Zoo of 95 Species in NYC Homes -- And New Evidence of Food Fraud
Tan and Cost plan to pursue biology and music respectively at university next fall.

What is DNA barcoding?
In 2003, Paul Hebert, researcher at the University of Guelph in Ontario, Canada, proposed “DNA barcoding” as a way to identify species. Barcoding uses a very short genetic sequence from a standard part of the genome the way a supermarket scanner distinguishes products using the black stripes of the Universal Product Code (UPC). Two items may look very similar to the untrained eye, but in both cases the barcodes are distinct.

Until now, biological specimens were identified using morphological features like the shape, size and color of body parts. In some cases a trained technician could make routine identifications using morphological “keys” (step-by-step instructions of what to look for), but in most cases an experienced professional taxonomist is needed. If a specimen is damaged or is in an immature stage of development, even specialists may be unable to make identifications. Barcoding solves these problems because even non-specialists can obtain barcodes from tiny amounts of tissue. This is not to say that traditional taxonomy has become less important. Rather, DNA barcoding can serve a dual purpose as a new tool in the taxonomist's toolbox, supplementing their knowledge, as well as being an innovative device for non-experts who need to make a quick identification.

The gene region that is being used as the standard barcode for almost all animal groups is a 648 base-pair region in the mitochondrial cytochrome c oxidase 1 gene (“CO1”, also written “COI”). COI is proving highly effective in identifying birds, butterflies, fish, flies and many other animal groups. COI is not an effective barcode region in plants because it evolves too slowly, but two gene regions in the chloroplast, matK and rbcL, have been approved as the barcode regions for land plants.

Barcoding projects have four components:

1 The Specimens: Natural history museums, herbaria, zoos, aquaria, frozen tissue collections, seed banks, type culture collections and other repositories of biological materials are treasure troves of identified specimens.
2 The Laboratory Analysis: Barcoding protocols can be followed to obtain DNA barcode sequences from these specimens. The best equipped molecular biology labs can produce a DNA barcode sequence in a few hours. The data are then placed in a database for subsequent analysis.
3 The Database: One of the most important components of the Barcode Initiative is the construction of a public reference library of species identifiers which could be used to assign unknown specimens to known species. There are currently two main barcode databases that fill this role:
a) The International Nucleotide Sequence Database Collaborative is a partnership among GenBank in the U.S., the Nucleotide Sequence Database of the European Molecular Biology Lab in Europe, and the DNA Data Bank of Japan. They have agreed to CBOL's data standards (pdf; 30Kb) for barcode records.
b) Barcode of Life Database (BOLD) was created and is maintained by University of Guelph in Ontario. It offers researchers a way to collect, manage, and analyze DNA barcode data.
4 The Data Analysis: Specimens are identified by finding the closest matching reference record in the database. CBOL's Data Analysis Working Group has created the Barcode of Life Data Portal which offers researchers new and more flexible ways to store, manage, analyze and display their barcode data.
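To make step 4 concrete, here is a minimal sketch of "find the closest matching reference record". It is not how BOLD or GenBank actually do it; it assumes the sequences are already aligned to the same length, uses a plain proportion-of-differences distance, and the reference library, the species names and the `max_distance` cutoff are all invented for illustration.

```python
from typing import Dict, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ.

    Positions where either sequence has something other than A/C/G/T
    (gaps, ambiguity codes) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def identify(
    query: str,
    reference: Dict[str, str],
    max_distance: float = 0.1,  # arbitrary cutoff, an assumption of this sketch
) -> Tuple[Optional[str], Optional[float]]:
    """Return (best matching name, distance), or (None, distance) if the
    closest record is still further away than max_distance."""
    best_name: Optional[str] = None
    best_dist: Optional[float] = None
    for name, seq in reference.items():
        d = p_distance(query, seq)
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_dist is not None and best_dist <= max_distance:
        return best_name, best_dist
    return None, best_dist


# Toy 20-base "barcodes" (made up; real COI barcodes are about 648 bases).
REFERENCE = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "ACGTTCGTACCTACGAACGT",
    "species_C": "TTGTACGAACGTACCTACGG",
}

if __name__ == "__main__":
    print(identify("ACGTACGTACGTACGTACGA", REFERENCE))  # one base off species_A
    print(identify("GGGGGGGGGGGGGGGGGGGG", REFERENCE))  # nothing close enough
```

The cutoff matters: without it every query gets assigned to something, which is exactly the situation where a specimen from an unbarcoded species would be mislabeled as its nearest neighbor.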

Welcome to ITIS, the Integrated Taxonomic Information System! Here you will find authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. We are a partnership of U.S., Canadian, and Mexican agencies (ITIS-North America); other organizations; and taxonomic specialists. ITIS is also a partner of Species 2000 and the Global Biodiversity Information Facility (GBIF). The ITIS and Species 2000 Catalogue of Life (CoL) partnership is proud to provide the taxonomic backbone to the Encyclopedia of Life (EOL).

Species identifications underpin all of biological research. Existing morphologically-based diagnostic approaches are often both cumbersome to use and are effective only for certain life stages. DNA-based systems promise to revolutionize the task of identification by providing reliable, inexpensive and rapid diagnosis of species identity.

Imagine a day when any living thing can be identified accurately and rapidly to the species level using a hand-held device the size of a cellular phone. A day when the biodiversity of an entire nation can be inventoried and monitored, and thereby better protected. When invasive species, disease agents and their vectors, and agricultural pests can all be identified and tracked with ease, thus saving millions of dollars and improving both human health and that of the natural environment. When pollen grains at a crime scene can be linked to those on a suspect’s shoes, the quality of water analyzed in terms of its living inhabitants as well as its chemical constituents, and endangered or dangerous species crossing national borders immediately recognized – not just by highly trained professional taxonomists, but by anyone. A day when bio-prospectors will be able to collect and rapidly identify thousands of species that may yield new lifesaving drugs, and when the plant and animal ingredients in food products can be assessed with certainty even after processing. Imagine a day when every curious mind, from professional biologists to schoolchildren, will have immediate access to the names and biological attributes of any species on the planet.

To be sure, this still rings of science fiction. But thanks to an ambitious effort by a growing consortium of scientists, it is poised to become reality. The method that will enable this advance is "DNA barcoding", an approach that employs a small fragment of DNA, a portion of a single gene, to provide a unique identifier – a "DNA barcode" – for each living species on Earth. Using these DNA barcodes, it will be possible to identify any organism, be it juvenile or adult, male or female, large or small, from only a tiny piece of tissue. This is vastly more efficient than traditional approaches which are often based on the detailed examination of specific body parts and which typically require interpretation by trained experts. In addition, because DNA barcoding quickly distinguishes new species, it will greatly accelerate the rate of their discovery. Given that it has taken 250 years to describe roughly 15% of life’s estimated diversity and that this diversity is now being lost at an alarming rate, the taxonomic revolution incited by DNA barcoding arrives at a critical time.

international Barcode Of Life
iBOL: About
How DNA Barcoding Works and What it Will Do? How does iBOL work?

Massive partnership

The Core Partners of DNA barcoding are:
iBOL, the largest consortium of projects, labs, and networks,
CBOL, the global initiative to promote barcoding,
BOLD, the global workbench for assembly, analysis and curation of barcode data, and use of DNA barcodes. It consists of 3 components (MAS, IDS, and ECS) that each address the needs of various groups in the barcoding community.

GenBank, the public archival repository for barcode data.
These Core Partners work with the wide range of Featured Partners shown below. Together the Core and Featured Partners make up the Barcoding Landscape.
Project Site

Canadian Barcode of Life Network
The Canadian Barcode of Life Network, made up of nearly 50 researchers from across the country, represents the first national network dedicated to large-scale DNA barcoding. The goal of this network is to make important contributions to biodiversity research, and to maintain Canada's place as a leader in the development of DNA barcoding.
Project Site

Catalogue of Life (CoL)
The Catalogue of Life (CoL) is a joint effort between the Species 2000 and ITIS organizations aimed at completing coverage for all 1.75 million known species by 2011.
Project Site

Encyclopedia of Life (EOL)
The Encyclopedia of Life (EOL) is an online reference and database for all 1.9 million species currently known to science, and will stay current by capturing information on newly discovered and formally described species. EOL aims to help all of us better understand life on our planet.
Project Site
EOL Previously More previously (check out EOL’s Photosynth library)

Bee Barcode of Life Initiative (Bee-BOL)
Bee-BOL, the Bee Barcode of Life Initiative, is a global effort to coordinate the assembly of a standardized reference sequence library for all ~20,000 bee species. Bee-BOL is creating a valuable public resource in the form of an electronic database containing DNA barcodes, images, and geospatial coordinates of examined specimens. The database contains linkages to voucher specimens, information on species distributions, nomenclature, authoritative taxonomic information, collateral natural history information and literature citations.

Coral Reef Barcode of Life
The Coral Reef Barcode of Life campaign is a detailed barcode study of fishes at one site in the Great Barrier Reef to generate a barcode library that will aid taxonomic work by clarifying species boundaries and by revealing cryptic taxa.

European Consortium for the Barcode of Life (ECBOL)
ECBOL is an information and coordination hub on DNA barcoding in Europe organized within EDIT, the European Institute of Taxonomy, and maintained by CBS, the Centraalbureau voor Schimmelcultures in Utrecht. The ECBOL initiative (Calibrating European Biodiversity using DNA Barcodes) is a network of European researchers and is seeking to obtain funding for the coordination and maintenance of a Network of European Leading Labs.

Fish Barcode of Life Campaign (FISH-BOL)
FISH-BOL, the Fish Barcode of Life campaign, is collecting barcodes from at least five specimens representing the 30,000+ species of marine, freshwater and estuarine fish of the world. Like ABBI, FISH-BOL has a central Steering Committee and Regional Working Groups.

HealthBOL coordinates initiatives to barcode vectors, pathogens, and parasites for the betterment of human health around the world.

Lepidoptera Barcode of Life
The aim of the Lepidoptera Barcode of Life campaign is to build a COI barcode library for all butterfly and moth species. This library will permit the rapid, reliable identification of Lepidoptera at any stage of their development (egg, caterpillar, pupa or adult) and will facilitate the discovery and description of new species. Such barcode libraries, in combination with the ones built for other groups of terrestrial animals, will make possible the detailed biodiversity maps required to guide the positioning of protected areas and to monitor the status of terrestrial life.

Mammalia Barcode of Life Campaign
The Mammalia Barcode of Life campaign is a part of the larger effort encompassing all vertebrates, and aims to build a comprehensive reference library of DNA barcodes for the global mammal fauna. The campaign seeks to assemble a broad global coalition of leading researchers, museums, and other institutions with interest in mammal taxonomy and biodiversity.

Marine Barcode of Life (MarBOL)
MarBOL is an international campaign to obtain at least 50,000 barcode records of marine species by October 2010. MarBOL is led by an international Steering Committee and is an affiliated project of the Census of Marine Life (CoML).

Mosquito Barcode Initiative (MBI)
MBI, the Mosquito Barcode Initiative, is another "demonstration project" aimed at producing a global operational system for identifying mosquitoes in two years. MBI plans to barcode at least five specimens from 80% of the 3200 known mosquito species. Disease-bearing species and their closest relatives will be the highest priority.

Polar Barcode of Life (PolarBOL)
The Polar Barcode of Life campaign coordinates barcoding efforts in ongoing bioinventory projects in Arctic and Antarctic marine, freshwater and terrestrial ecosystems.

Quarantine Barcode of Life (QBOL)
QBOL is a project financed by the 7th Framework Program of the European Union that makes collections harboring plant-pathogenic quarantine organisms available. Informative genes from selected species on the EU Directive and EPPO lists are DNA barcoded from vouchered specimens. Over the next 3 years the sequences, together with taxonomic features, will be included in an internet-based database system.

Shark Barcode of Life (SharkBOL)
The Shark Barcode of Life project aims to barcode the 1,000 marine and 100 freshwater shark species.

Sponge Barcoding Project (SpongeBOL)
The Sponge Barcoding Project is the first global barcoding project on any diploblast taxon and covers the complete taxonomic range of Porifera.

Tephritid Barcode Initiative (TBI)
TBI, the Tephritid Barcode Initiative, is a two-year "demonstration project" that will create an operational system for identifying fruit flies around the world. TBI will barcode at least five representatives of all tephritid fruit flies that are (1) agricultural pests, (2) beneficial species used for biological control of other pests, (3) closely related to pests or beneficial species, or (4) representative species from other families of tephritids. TBI plans to obtain barcodes from approximately 2000 species of the estimated 4500 known tephritid species.

Trichoptera Barcode of Life (TrichopteraBOL)
Trichoptera Barcode of Life is a long-term project to barcode the world’s approximately 13,000 species of caddisflies.

The All Birds Barcoding Initiative (ABBI), launched in September 2005, aims to collect standardized genetic data in the form of DNA barcodes from the approximately 10,000 known species of world birds. Despite several hundred years of careful study, genetic surveys including those with DNA barcoding suggest there are hundreds of as yet undescribed avian species. ABBI aims to help speed discovery of new species, provide a practical tool for specimen identification, and open new avenues for scientific investigation.

National GEO focus on Paul Hebert's 'big idea': barcoding life.
If you turn on a light at night in the mountains of Papua New Guinea, says Paul Hebert, you will collect some 2,000 species of moth. Moving up the mountain a bit will net you a different but equally daunting crowd. As a young postdoc in the 1970s, Hebert, now an evolutionary biologist at the University of Guelph in Ontario, spent five years trying to make sense of that fluttering confusion, before finally deciding it was beyond his or any human’s capacity. For two decades after that he retreated to water fleas, of which there are only 200 species. Then in 2003 he did something new. In a paper that year he began by describing the diversity of life as a “harsh burden” for biologists, and proceeded to suggest some relief: Every species on Earth could be assigned a simple DNA bar code, Hebert wrote, so it would be easy to tell them apart.

The bar code Hebert suggested is part of a gene called CO1, which helps produce the energy-carrying molecule ATP. CO1 is so essential that every multicellular organism has it. But there is enough variation in its sequence—each of the 600-odd spots in the bar code region can be filled by any of four different DNA bases—that two species rarely have the exact same one. Such differences in a gene are readily scanned by machine even when the animals themselves might confound an expert; Hebert’s group is now sequencing a thousand specimens a day. They’ve bar coded nearly 40,000 species of moth and butterfly already. The technique has commercial as well as scientific promise. Mislabeling of fish on menus is rampant, it turns out.
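A quick back-of-the-envelope check on why "two species rarely have the exact same one": with four possible bases at each position, the number of distinct sequences grows as 4 to the power of the length. The sketch below assumes the 648 base-pair figure quoted earlier for the CO1 region and the upper 100 million species estimate from the iBOL blurb. It is only a combinatorial upper bound, since real CO1 sequences are constrained by having to encode a working protein, so the space actually explored by evolution is far smaller.

```python
import math

BARCODE_LENGTH = 648                  # base pairs, figure quoted for the CO1 barcode region
BASES = 4                             # A, C, G, T
SPECIES_UPPER_ESTIMATE = 100_000_000  # upper end of the 10 to 100 million estimate


def barcode_space(length: int = BARCODE_LENGTH, bases: int = BASES) -> int:
    """Number of distinct sequences of the given length (exact integer)."""
    return bases ** length


def information_bits(length: int = BARCODE_LENGTH, bases: int = BASES) -> float:
    """Maximum information content of such a sequence, in bits."""
    return length * math.log2(bases)


if __name__ == "__main__":
    space = barcode_space()
    print("decimal digits in 4**648:", len(str(space)))
    print("bits per barcode (upper bound):", information_bits())
    print("more sequences than species estimate:", space > SPECIES_UPPER_ESTIMATE)
```

For comparison, a 12-digit UPC has 10**12 possible values, so even a short stretch of DNA has vastly more room than a supermarket barcode.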

Welcome to the Biodiversity Institute of Ontario, where Paul D.N. Hebert is Director, Canada Research Chair, and Scientific Director
Biodiversity measures the variation of life shaped through ecology and evolution from genes to species and ecosystems. Genetic variation plays a critical role in the ability of individuals and species to respond and adapt to environmental change while the diversity of species within and between ecosystems provides significant advantages to ecosystem function and resilience. One of the ironies of biological research is that after more than 250 years of dedicated biological science, the total number of species within any country or region remains unknown. While we often have a good idea of the identities and ecological roles of the larger, more charismatic animals (birds, mammals) the truth of the matter is that most of life is small (insects, bacteria, fungi) and currently undescribed. Shedding light onto these lesser known groups is important because all of the larger groups, including ourselves, depend on these smaller organisms for some part of their daily natural history. In order to protect and understand the diversity of life in Ontario, we must be able to know the species and ecosystems upon which humans, and our industry and lifestyle, depend.

Google Tech Talks:

Current Issues in Computational Biology and Bioinformatics
Gary Bader is Assistant Professor at the Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR) at the University of Toronto.

Jonathan Rosenberg introduces; Drs. Paul Hebert and Dan Janzen discuss their transformative (and very Googley) International Barcode of Life project.
iBOL's goal is to capture, using a handheld device, the unique "DNA barcode" of each and every species on earth, and organize that information to be accessible and useful for everyone (sound familiar?). A DNA barcode is a gene sequence that uniquely identifies any species, and iBOL has already barcoded 35,000 of them. There are approximately 10M species on the planet (half of which have yet to be discovered), so there's a long way to go, but the components for success are in place.

During my recent family vacation to Costa Rica I hiked the rain forest, and by the end of the trip could easily identify a toucan, eyelash viper, and three types of monkeys (howling, spider, and Rosenberg offspring). Pretty impressive, right? Then Dr. Janzen showed me a photo of that same rain forest and told me that there were approximately 400 species of animals and plants in that picture, and not a toucan or monkey among them. So it turns out that I'm just as bio-illiterate as everyone else, but Google can do something about this. When we talk about organizing all the world's information, a blueprint of the world's natural biodiversity should be part of it.

Is IT ready for the Dreaded DNA Data Deluge? -Dr. Andras Pellionisz

In 18 months full human genome sequences will be available under $100 - and in minutes. The $5,000 full human genome was announced to come in 9 months. Is "Big IT" ready for the avalanche of data, to be obtained and processed e.g. while the patient is still on the operating table, to be diagnosed, and how the genomics glitch, that caused a benign or malign tumor, could be compensated for?

Algorithmic approaches are needed to better understand genome regulation, even for the simple reason to deploy most effective data retrieval, data storage and computational means, via both parallel hardware and software, but more importantly for opening entirely new perspectives.

Genomics, now 100+ years old, for over half a century had us resigned to the fatalistic gloom that we are stuck with any glitches in our inherited genome. Is it true that genomic glitches doom one to "incurable" hereditary diseases?

No longer. Genomics now considers the DNA-RNA-Protein chain not as a thermodynamically closed system, where entropy increases, but as an open system that can be interfered with. There is theoretically sound hope that you are not stuck with your genomic glitches.

After half a Century of sticking to two mistaken axioms of Genomics, the paradigm of recursive genome function must quickly make up for lost time for those (potentially) inflicted with formerly "incurable" diseases. "The Genome baby is left on the doorsteps of Information Technology".

Doctors sent those inflicted with fleece for "debugging". Debugging genome information (by Genome Computers) would be much harder without understanding the algorithms that our natural genome computing operates with.

Third International Barcode of Life Conference, Mexico City:
dnabarcodes Past Meetings and Conferences...
Third International Barcode of Life Conference
Conference program overview

To review all the files in one place, try downloading a copy of the hyperlinked Table of Contents in PDF format (this is really the best way to see how many talks, papers, and videos [Many Videos] are available from this conference [PDF]).

Poster Presentations
Barcode of life publications

Session 1 (Tuesday, 10-November) - Welcome and Introduction
Session 2 (Tuesday, 10-November) - Lessons learned from 2004-2009
Parallel Technical Session A (Tuesday, 10-November) - Plant Working Group
Parallel Technical Session A (Tuesday, 10-November) - Pathogens, Disease Vectors & Parasites
Parallel Technical Session A (Tuesday, 10-November) - FISH-BOL
Parallel Technical Session A (Tuesday, 10-November) - Barcoding Species for Quarantine/Plant Protection
Parallel Technical Session A (Tuesday, 10-November) - Marine Barcoding
Parallel Technical Session A (Tuesday, 10-November) - All Bird Barcoding initiative & Vertebrates
Parallel Technical Session B (Tuesday, 10-November) - Plant Working Group
Parallel Technical Session B (Tuesday, 10-November) - Pathogens, Disease Vectors & Parasites
Parallel Technical Session B (Tuesday, 10-November) - FISH-BOL
Parallel Technical Session B (Tuesday, 10-November) - Barcoding Species for Quarantine/Plant Protection
Parallel Technical Session B (Tuesday, 10-November) - Marine Barcoding
Parallel Technical Session B (Tuesday, 10-November) - All Bird Barcoding Initiative & Vertebrates
Session 3 (Wednesday, 11-November) - Case Studies: Impact of barcode data in research areas beyond taxonomy
Session 4 (Wednesday, 11-November) - Informatics and Data Analysis
Parallel Technical Session C (Wednesday, 11-November) - Plant Working Group
Parallel Technical Session C (Wednesday, 11-November) - Insects/Terrestrial Arthropods: Utility & Alternative Approaches
Parallel Technical Session C (Wednesday, 11-November) - Fish-BOL
Parallel Technical Session C (Wednesday, 11-November) - Large-Scale Initiatives
Parallel Technical Session C (Wednesday, 11-November) - Fungi, Algae, Protists & New Groups
Parallel Technical Session C (Wednesday, 11-November) - Data Analysis Working Group (DAWG)
Parallel Technical Session C (Wednesday, 11-November) - BeeBOL Symposium
Parallel Technical Session D (Wednesday, 11-November) - Barcoding the Trees of Africa
Parallel Technical Session D (Wednesday, 11-November) - Insects/Terrestrial Arthropods: Biodiversity Studies
Parallel Technical Session D (Wednesday, 11-November) - FISH-BOL
Parallel Technical Session D (Wednesday, 11-November) - Barcoding Databases, Protocols, and Education
Parallel Technical Session D (Wednesday, 11-November) - Fungi, Algae, Protists & New Groups
Parallel Technical Session D (Wednesday, 11-November) - Data Analysis Working Group (DAWG)
Parallel Technical Session D (Wednesday, 11-November) - BeeBOL Symposium
Session 5 (Thursday, 12-November) - Case Studies of Applications

Session 6 (Thursday, 12-November) - Barcoding and Next Generation Sequencing Technologies
Parallel Technical Session E (Thursday, 12-November) - Meso-American Symposium
Parallel Technical Session F (Thursday, 12-November) - Meso-American Symposium
Barcoding bushmeat: molecular identification of Central African and South American harvested vertebrates

Pre-Conference Workshop, Botanical Garden Auditorium: Dr. Paul Hebert
Professor Hebert is Canada Research Chair in Molecular Biodiversity at the University of Guelph and directs the Biodiversity Institute of Ontario. Previously, he was Chair of the Department of Zoology, Board Chair at the Huntsman Marine Science Centre at Guelph, and Director of the Great Lakes Institute at the University of Windsor. Professor Hebert is best known for founding the concept of DNA barcoding, and has published more than 270 papers employing molecules to probe biological diversity. Over the past triennium, he has raised more than $30M to construct the world's first barcode 'factory' and the informatics platform needed to support the barcode registration of all multi-cellular life. Together with a few colleagues, Professor Hebert is now leading efforts to establish the $150M International Barcode of Life Project that will barcode 500K species within 5 years. He is a Fellow of the Royal Society of Canada and has received several national and international scientific awards. Professor Hebert completed a BSc at Queen's University, a PhD in genetics at Cambridge University and postdoctoral fellowships at the University of Sydney and the Natural History Museum (London).

Garden Auditorium: Junko Shimura
Pre-Conference Workshop, Botanical Garden Auditorium: Planning and Funding DNA Barcoding Projects
Global Taxonomy Initiative and Implementation of the CBD

The Convention on Biological Diversity (CBD) entered into force on 29 December 1993. It has 3 main objectives:

1. The conservation of biological diversity
2. The sustainable use of the components of biological diversity
3. The fair and equitable sharing of the benefits arising out of the utilization of genetic resources

These countries are involved in the CBD
(very interesting) CBD Fact Sheets;

But tougher regulation could come with a cost [PDF], warns David Schindel, an invertebrate palaeontologist and executive secretary of the Consortium for the Barcode of Life, an international initiative to identify species using short genetic sequences, based at the Smithsonian Institution in Washington DC. “We are very concerned that it will become more restrictive,” he says. In some cases, it can already take at least two years and reams of paperwork to agree the terms on which research can be conducted, specimens exported and profits shared. “You could go through a field season collecting specimens and then the government says they are going to hold on to them because you don’t have the right permission,” he says.
posted by infinite intimation (34 comments total) 13 users marked this as a favorite

March/April 2007 issue CANADIAN GEOGRAPHIC [PDF]...

The broader applications of DNA barcoding, however, are undeniable. The Feather Identification Lab at the Smithsonian Institution, working with the Federal Aviation Administration (FAA) and the U.S. Air Force, is using the tool to identify bird species that collide with aircraft, a phenomenon the FAA estimates costs about US$345 million every year. But “snarge,” a Feather Lab term for the goop wiped from an aircraft following a bird strike, is often anatomically unrecognizable. Being able to identify a species allows airfields to implement appropriate habitat management programs, as well as warn crews of particular bird dangers and help engineers design better airplanes.

(Bonus Snarge) and Recently.

in battle on birds, air force deploys a secret weapon
identifying the bird when not much bird is left
audubon snarge
CSI for Birds: Scientists Use Forensic Techniques to Improve Airport Safety

Taxonomy isn't black and white (PDF), by Nick Atkinson

Why DNA barcoding works

Rob DeSalle [PDF]
Curator of Entomology
Division of Invertebrate Zoology, American Museum of Natural History,
Central Park West @ 79th Street, New York, NY 10024, USA
Recent excitement over the development of an initiative to generate DNA sequences for all named species on the planet has generated two major areas of contention as to how this ‘DNA barcoding’ initiative should proceed. It is critical that these two issues are clarified and resolved, before the use of DNA as a tool for taxonomy and species delimitation can be universalized. The first issue concerns how DNA data are to be used in the context of this initiative; this is the DNA barcode reader problem (or barcoder problem). Currently, many of the published studies under this initiative have used tree building methods and more precisely distance approaches to the construction of the trees that are used to place certain DNA sequences into a taxonomic context. The second problem involves the reaction of the taxonomic community to the directives of the ‘DNA barcoding’ initiative.
This issue is extremely important in that the classical taxonomic approach and the DNA approach will need to be reconciled in order for the ‘DNA barcoding’ initiative to proceed with any kind of community acceptance. In fact, we feel that DNA barcoding is a misnomer. Our preference is for the title of the London meetings—Barcoding Life. In this paper we discuss these two concerns generated around the DNA barcoding initiative and attempt to present a phylogenetic systematic framework for an improved barcoder as well as a taxonomic framework for interweaving classical taxonomy with the goals of ‘DNA barcoding’.
An excellent review of the involved issues, topics, considerations, applications, and understandings

Coupling non-destructive DNA extraction and voucher retrieval for small soft-bodied Arthropods in a high-throughput context: the example of Collembola

Using DNA barcodes to connect adults and early life stages of marine fishes from the Yucatan Peninsula, Mexico: potential in fisheries management

Morphological, ecological, reproductive and molecular evidence for Leptodiaptomus garciai (Osorio-Tafall 1942) as a valid endemic species

A new cryptic species of Leberis Smirnov, 1989 (Crustacea, Cladocera, Chydoridae) from the Mexican semi-desert region, highlighted by DNA barcoding

Barcoding endangered Sea Turtles

15 New Bird Species-SciAm

Hundreds of new sea creatures found on Australian reefs

The Walter Reed Biosystematics Unit (WRBU) is a unique national resource. Its mission is to conduct systematics research on medically important arthropods and to maintain the U.S. mosquito collection.

Freaky New Bats Found by DNA Barcoding

Teenage DNA sleuths expose New York fish fraud

Mercury in Tuna Sushi Higher at Restaurants than Groceries

DNA 'Barcodes' Surface Fishy Impostors on Menus

Previously Mefi: Apples

Third International Barcode of Life Conference, Session 2A: Karen James-

This has very real uses.

Next-gen barcode application

A Canadian study shows even if you don’t swallow the worm at the bottom of a bottle of the Mexican liquor mescal, you may not have avoided the worm’s DNA.

In these studies, COI sequences accurately identify bdelloid rotifer species, further demonstrating the robustness of DNA barcoding.

-“this refutes the idea that sex is necessary for diversification into evolutionary species.”

Early in Michael Crichton’s 1990 novel Jurassic Park, Dr. Henry Wu, chief scientist at Jurassic Park Research Institute, showing visitors around his facility, displays “the actual structure of a small fragment of dinosaur DNA”. Astute readers pointed out Dr. Wu’s dinosaur genetic resuscitation project was unlikely to succeed, as the sequence in Crichton’s novel was a fragment of the bacterial plasmid pBR322.

DNA Barcoding of Bushmeat - Cameroon

Life Behind Bars, UCSanDiego

Rockefeller Barcode Blog
posted by infinite intimation at 8:42 PM on July 14, 2010

In terms of seeing this all come to fruition, one area to watch may be nanopores

Harvard University and Oxford Nanopore Technologies Announce Licence Agreement to
Advance Nanopore DNA Sequencing and other Applications

Oxford Nanopore: BASE DNA Sequencing with voiceover

Explaining DNA analysis on a chip

IBM "smarter Planet" projects, dna Transistor

IBM dna transistor project page:

IBM DNA Transistor
A DNA molecule consists of millions of different nucleotides that make up the human genome; the blueprint of living organisms.

Next, single strands of DNA molecules that are floating above the microchip are threaded or pulled through the nanopore by an electrical field, which begins the process of reading and sequencing the molecules.

The DNA Transistor device consists of alternating nanometer-sized layers of metal and dielectric. Discrete charges located along the backbone of a DNA molecule get trapped by electrical fields inside the nanopore. By trapping the DNA molecule, scientists will have ample time to measure the molecule structure.

By cyclically turning on and off these gate voltages, researchers have shown theoretically and computationally, and expect to be able to prove experimentally, the plausibility of moving DNA through the nanopore at a rate of one nucleotide per cycle, a rate that IBM scientists believe would make DNA readable.

Low-cost, yet efficient analysis of DNA data promises to help facilitate the discovery of new healthcare products, and help determine an individual's predisposition to a particular disease or condition.

IBM Archives
WATSON LABS, IBM (Watson Previously)

The IBM Thomas J. Watson Research Center is the headquarters for IBM Research -- the largest industrial research organization in the world, with eight labs in six countries. Established in 1961, the Watson Research Center is located in Westchester County, New York and Cambridge, Massachusetts and spans three sites and four buildings -- the main laboratory in Yorktown Heights, two buildings in Hawthorne, and one building in Cambridge.

The research focuses primarily on IT hardware (ranging from exploratory work in the physical sciences to semiconductors and systems technology); software (including areas as diverse as security, programming, mathematics and speech technologies); and services, with a focus on applying them to transform businesses in a wide range of industries.

Thomas J. Watson Sr. (1874 - 1956)
Thomas J. Watson Jr. (1914 - 1993)

Watson Facility History:

On February 6, 1945, Nicholas Murray Butler, president of Columbia University, and Thomas J. Watson Sr., president of IBM, jointly announced the formation of the Watson Scientific Computing Laboratory at Columbia University.

The Computational Genomics Research Group is focussed on addressing genomics related questions through mathematical & statistical modeling, combinatorics and algorithmics.

Cacao Genome
Cacao production is important! Not only does it provide a livelihood for over 6.5 million farmers in Africa, South America and Asia, but it is the basic ingredient in the world's favorite confection, chocolate. Historically, cocoa production has been plagued by serious global losses from pests and diseases. Brazil, for instance, was the world's second largest cocoa producer during the 1980's, producing over 400,000 tons/year. After a fungal pathogen, M. perniciosa, infected almost all of its cacao growing regions it now produces less than 100,000 tons/year. So how can we best secure cocoa bean production?

The Genographic Project is seeking to chart new knowledge about the migratory history of the human species by using sophisticated laboratory and computer analysis of DNA contributed by hundreds of thousands of people from around the world. In this unprecedented, real-time research effort, the Genographic Project is closing the gaps of what science knows today about humankind's ancient migration stories.
The Genographic Project: Our history within

IBM Genographic Project

Previously Genographic Project; Pre-Previously:

IBM Research Computational Biology Center

Ajay Royyuru heads the Computational Biology Center at IBM Research, with research groups engaged in various projects including bioinformatics, protein science, functional genomics, systems biology, and computational neuroscience. Ajay joined IBM Research in 1998, initiating research in structural biology.

Ajay leads the IBM Research team working with National Geographic Society on the Genographic Project.
posted by infinite intimation at 8:45 PM on July 14, 2010 [1 favorite]

Some worry to an intense degree about cheap DNA sequencing, but a quietly passed piece of legislation might cause others to consider otherwise:

Title: Genetic Information Nondiscrimination Act of 2008

Signing the ‘Genetic Information Nondiscrimination Act of 2008’ into law

Genetic Discrimination Fact Sheet

Understanding Genetic Discrimination

Time still thinks we are all fools.

Several nifty interactive 'Trees of Life':
Two (this one is particularly well done with the interactivity, and can be saved to explore offline)
posted by infinite intimation at 8:46 PM on July 14, 2010

At the center of DNA barcoding’s rising stock lies the promise of many practical applications. Customs officials could easily identify trafficked, endangered, invasive, or other important species. Animal and plant ingredients in food products could be pinpointed. Biodiversity could be catalogued over large areas to readily identify diversity hotspots for conservation.

Not everyone shares the vision. A growing opposition to DNA barcoding has emerged from within the ranks of evolutionary biology. And many of the sharp volleys in the debate are sounding from labs here at UC Berkeley.

Asked whether species can be said to have barcodes, Berkeley integrative biology professor, Brent Mishler was unequivocal. “No.” Mishler, an evolutionary biologist who studies mosses and serves as director of the University and Jepson Herbaria on campus, argues that DNA barcoding borrows from a dangerously false view of biological diversity. Barcoding “makes some sense for some systems, like screws in a warehouse,” he says, “but not for an evolving system.”

Products in a warehouse maintain a uniform level of difference from one another that can accurately be tracked by a barcode. But species are not discrete and unchanging units. They are aggregations of individual organisms that collectively constitute the present-day representatives of an evolving lineage. The conditions that drive the evolution of these lineages act on organisms, not on singular characteristics like fur color or particular genes. A single trait, like a gene, might distinguish species within a group of organisms with a common evolutionary history. But any single characteristic is insufficient a priori and on its own to diagnose and classify the whole of biotic diversity. “Contrary to their posturing as cutting-edge,” Mishler argues, “DNA barcoders are returning to an ancient, typological, single-character approach, and are maintaining a pre-Darwinian view of species.”

In fact, studies have shown that the family trees of particular genes do not always correspond to the family trees of the organisms that harbor those genes. And empirical studies of barcoding projects have used the same data as barcode proponents to reveal the absence of a one-to-one relationship between mitochondrial barcode variation and species variation. For Daniel Rubinoff, associate professor of Entomology at the University of Hawaii and former UC Berkeley PhD student and postdoc, the reasons for these shortcomings come down to a simple fact. “A dynamic process like speciation can’t be modeled by a procedure as rigid as barcoding.”
posted by barnacles at 8:47 PM on July 14, 2010 [4 favorites]

I remember the first time I took speed to finish a paper too.
(Jesus on toast man, that's an exhaustive post)
posted by now i'm piste at 8:48 PM on July 14, 2010 [16 favorites]

Thanks for all this information! My best friend is using DNA barcoding in her PhD work, and is really excited about it. Now I have a lot of resources to figure out what the heck she's talking about.
posted by lriG rorriM at 8:53 PM on July 14, 2010


posted by Skygazer at 8:55 PM on July 14, 2010 [4 favorites]

posted by Iridic at 9:00 PM on July 14, 2010

I'll be coming back in a year to hear what people who read through this have to say.
posted by resiny at 9:01 PM on July 14, 2010 [2 favorites]

I misread your username as Infinite Information at first.

Related: Holy Crap.

Looking forward to poking through all of this and attempting to absorb even a fraction of it.
posted by Stunt at 9:03 PM on July 14, 2010

Ah. Sorry. Read that as "infinite intimidation."
posted by Iridic at 9:06 PM on July 14, 2010

I'd have a lot more faith in computational biology and medicine if I didn't find myself having to fight to get people to believe that Le Chatelier's Principle still applied to biological systems.

I kind of agree with everything in the quote barnacles posted, too, but my real issue is, once you're in the neighborhood, if we're really getting close to these super cheap whole genome sequences, why not do the whole thing?
posted by Kid Charlemagne at 9:07 PM on July 14, 2010

Forward slash.
posted by Deathalicious at 9:21 PM on July 14, 2010 [2 favorites]

...once you're in the neighborhood, if we're really getting close to these super cheap whole genome sequences, why not do the whole thing.
Yeah, endeavors like Genome 10K (a plan to sequence over 10,000 vertebrate genomes in five years) will largely take the place of barcoding, I predict.
posted by bergeycm at 9:29 PM on July 14, 2010

Do I get college credit for reading this post?
posted by Edgewise at 9:44 PM on July 14, 2010 [3 favorites]

I saw this in the RSS feed, it exceeded the typical length of a WEEK'S feed. I thought it was a program error that dumped some other site's database.

Quantity does not equal quality. This is not a quality post.
posted by charlie don't surf at 9:57 PM on July 14, 2010 [5 favorites]

bergeycm: "Yeah, endeavors like Genome 10K (a plan to sequence over 10,000 vertebrate genomes in five years) will largely take the place of barcoding, I predict."

I think the purpose here is to improve citations. For example, for a small programming project, I decided to look up a few original citations on dotplot visualization, and let's just say... the allowed citation style of the European Journal of Biochemistry leaves a lot to the imagination. For example, they use "tuna fish" as a species. They reference a 1969 "Atlas of Protein Sequences and Biological Structures", but I'm not about to hunt that crap down. I've pulled modern sequences for this and the comparisons mostly match up.

The sequences are short enough that had it been published today, I suppose they could have put in a few QR codes (or whatever floats biologists' boats) on a page. But attaching a multi-gigabyte genome for every species cited is not feasible today, and even assuming exponential growth in technology, probably won't be feasible before I die.
posted by pwnguin at 10:37 PM on July 14, 2010

Oh sorry, intended to provide a citation. Gibbs & McIntyre, "The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences."
posted by pwnguin at 10:39 PM on July 14, 2010

Huh, I had no idea they hadn't done something like this already. I guess the sequencing costs would have been too high, since the costs are dropping precipitously.
posted by delmoi at 10:41 PM on July 14, 2010

Sequencing costs have been plummeting for years, especially once you own the equipment. Older equipment works just fine, as long as you can pay techs to use it, but the costs of reagents have gone down due to competition. The computation side of things has been getting better, too (better algorithms, faster cheaper consumer-level computers to crunch and match the sequences).

I'm not entirely convinced that this (DNA barcoding) is a benefit-effective way of spending science dollars, especially if this "sexy" "new thing" is taking funding away from more traditional ecologists who look towards a more systems approach which can generate data on which to base strategies for managing said ecosystems. This seems more like just cataloguing everything - almost Google-like in scope - which is admirable (especially to future genestorians), but otoh, if the barcoding people end up working with the ecology people, good things could happen.
posted by porpoise at 11:06 PM on July 14, 2010

They reference a 1969 "Atlas of Protein Sequences and Biological Structures", but I'm not about to hunt that crap down.

You're kidding, right? "That crap" is one of the seminal works in bioinformatics, the direct predecessor of the electronic biomolecule databases we use today. Anyway, here's your protein.
posted by grouse at 11:07 PM on July 14, 2010

An expensive specialty "sheep's milk" cheese made in fact from cow's milk

Great. I'm allergic to cow's milk, and buy goat and sheep cheese for that reason. Now I have to worry about yet another source of allergic reaction. At least there's a tool for enforcement now, assuming the FDA bothers to use it.
posted by Jimmy Havok at 11:20 PM on July 14, 2010 [1 favorite]

Way too much cut and paste in this post - you could have just left one word links for the Fish-Sponge-Bird-Shark-Unicorn projects.

A short sentence or two about the major challenges with relevant hyperlinks would have had just as much information content for the curious without tldr'ing everyone else.

For a good read on the history of sequence databases, check out the Jan 2000 issue of Bioinformatics, which was devoted to historical articles by pioneers in the field (link).
posted by benzenedream at 11:33 PM on July 14, 2010 [3 favorites]

My best friend is using DNA barcoding in her PhD work, and is really excited about it. Now I have a lot of resources to figure out what the heck she's talking about.

She might not be doing quite the same thing. "DNA barcoding" can also mean inserting your own, artificial barcode sequences when you're generating a large number of mutants or doing a variety of other things that involve keeping track of a large number of molecules/samples/genes, some of which may have been generated partly randomly (which means you might not even know their sequences.) (An example - a paper where they're double-checking to make sure that this system works for keeping track of yeast mutants.)

[A bunch of stuff about nanopores and IBM's venture into DNA sequencing.]

The "DNA Transistor" thing is interesting, but I'm not sure if it's really more promising than the various other next-generation sequencing methods. The problems that plague them - above all, some issues with accuracy on certain sequences and short read lengths, which mean you have to "read" the sequence many times and piece together these short (30-150 base) reads into a million to billion base genome - seem like they might plague IBM's method too, since the heart of IBM's project is a better way to "thread" the DNA through a nanopore for sequencing, rather than a revolution in detection methods. A lot of the other methods have a sequencing rate limited by sensors/cameras and enzymes (which are, currently, way faster and more awesome than any nanodevices we can build), and are done in huge arrays. Honestly, we're already pretty close to being able to use the next-gen sequencing methods to get genomes (or genome-sized sets of DNA-related information) in a ridiculously short amount of time - weeks for larger, eukaryotic genomes, days for smaller prokaryotic ones. Real-time and in many cases single-molecule sequencing is already possible with several of the next-gen methods. Some of the newer methods (heirs to old school chemical sequencing, which some of us still use for some assays, and Sanger sequencing, which is your basic modern sequencing technology):

-Roche/454 sequencing is currently at 100Mbases/workday, or so; no need to thread DNA through pores, since this method uses a combination of PCR and pyrosequencing (it senses when a dNTP is added to a template DNA fragment), and individual DNA fragments are attached to nanobeads.
-ABI has a similar method (SOLiD) that uses fluorescent dNTPs instead of pyrosequencing. It also has some extra slightly hard-to-explain aspects that try to lower the potential error by binding two bases, not one.
-Illumina/Solexa gets you 1300Mbases over several days - it's got shorter read lengths, but while it, like the Roche/454 method, uses PCR to sequence, the dNTPs have different fluorescent labels. Single DNA fragments are used to seed "polony" spots on a plate, where the fragment is amplified, which makes the sequence easier to read (with our current methods) but takes more time than a purely single-molecule approach.
-Current single-molecule methods - mostly the Helicos and Pacific Biosciences "next next generation" - don't require you to make lots of copies of your fragments before you sequence, and they use fluorescent nucleotides too; again, short read lengths, but 5E7 bases/h, or thereabouts.

To give you an estimate about what this bases per hour stuff means, this chart shows you relative genome sizes; things are measured in "bases" - the A/T/C/G bits of DNA - and a megabase is a million bases, with genomes usually ranging from millions to billions of bases. And, of course, all the sequencing rate numbers I have are probably already out of date, and there are new techniques being pioneered all the time. Which is to say, if you want genomic data, (or other large-scale DNA-related data), you've got it, and while IBM's method is nifty, it's one of a pack of incredibly cool methods - most of which use enzymes to do a lot of the work, since nature's still better than we are at the nanotech game.
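To make the bases-per-hour arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The throughput figures are the ones quoted in the list above; everything else is an assumption for illustration: an 8-hour "workday", three days standing in for "several days", approximate genome sizes (about 4.6 million bases for E. coli, about 3.2 billion for human), and a `coverage` factor for how many times each base is read on average. It ignores assembly, sample prep, and error rates entirely.

```python
# Rough sequencing-time estimates from the throughput figures quoted above.
# Workday length, "several days", genome sizes, and coverage are assumptions.

HOURS_PER_WORKDAY = 8          # assumption: one "workday" = 8 hours
ASSUMED_SEVERAL_DAYS = 3       # assumption: "several days" taken as 3 days

# Raw throughput in bases per hour.
THROUGHPUT_BASES_PER_HOUR = {
    "Roche/454": 100e6 / HOURS_PER_WORKDAY,                # 100 Mbases per workday
    "Illumina/Solexa": 1300e6 / (ASSUMED_SEVERAL_DAYS * 24),  # 1300 Mbases over several days
    "single-molecule": 5e7,                                 # 5E7 bases per hour
}

# Approximate genome sizes in bases (illustrative round numbers).
GENOME_SIZES = {
    "E. coli": 4.6e6,
    "human": 3.2e9,
}


def hours_to_sequence(genome_size, bases_per_hour, coverage=1.0):
    """Hours of raw sequencing needed to read genome_size bases coverage times."""
    if bases_per_hour <= 0:
        raise ValueError("bases_per_hour must be positive")
    return genome_size * coverage / bases_per_hour


if __name__ == "__main__":
    for method, rate in THROUGHPUT_BASES_PER_HOUR.items():
        for organism, size in GENOME_SIZES.items():
            hours = hours_to_sequence(size, rate, coverage=10)
            print(f"{method:16s} {organism:8s} ~{hours:,.1f} h at 10x coverage")
```

The coverage factor matters a lot with short reads: because each stretch has to be read many times to piece the genome back together, raw throughput overstates how fast you get a finished sequence.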

(A bunch of the review articles I linked to are behind paywalls; sorry, Nature is annoying.)
posted by ubersturm at 11:35 PM on July 14, 2010 [2 favorites]

Gene trees do not necessarily equal species trees.

If there is one thing to keep in mind when thinking about the applications and benefits of DNA barcoding, that would be it. DNA barcoding by its very definition uses a single gene (the mitochondrial gene Cytochrome Oxidase I). It is very often the case that the evolution of individual genes does not reflect the evolutionary lineage of the species (for example, due to incomplete lineage sorting). Consequently, the data obtained through barcoding may not actually reflect the relationships amongst actual species.

(Though, hell, if you want to talk about what constitutes a species, we could go on for EVER.)

This is how it goes. Barcoding has its devotees and its detractors. At best, it is an incredibly useful tool for getting a first look at what the relationships amongst species might be. That being said, the resolution of evolutionary trees derived from barcoding (i.e. the timescale at which COI evolves) is focused upon more recent events. Older evolutionary events that occur deeper in the phylogeny (i.e. evolutionary tree) can be missed completely by DNA barcoding.

SO, think of DNA barcoding as just one component of an evolutionary geneticist's toolbox. I'm an evolutionary geneticist by training, but I'm currently obtaining training in traditional taxonomic (taxonomy = the naming and description of species) methods, which means that I'm learning how to use morphology to help define species. I'm doing this because I have worked on spiders where genetics alone is simply not enough to differentiate species. You need morphology too.

As a side note: the idea that sequencing has become so cheap that it is easy to sequence whole genomes has knobs on it. It might be relatively easy if you work on something that is closely related to a model taxon (e.g. Drosophila, the tammar wallaby, the nematodes etc) BUT if you work on anything else (i.e. the vast majority of life) then whole genome sequencing means a lot of time, a lot of money and a lot of trouble.
posted by Alice Russel-Wallace at 2:04 AM on July 15, 2010 [1 favorite]

grouse: "You're kidding, right? "That crap" is one of the seminal works in bioinformatics, the direct predecessor of the electronic biomolecule databases we use today. Anyway, here's your protein."

That's insulin, and I need cytochrome c. Which I've found for all species involved. If you really want to double check my amateur Uniprot digging, here's the tuna sequence I'm using:

>sp|P00025|CYC_KATPE Cytochrome c OS=Katsuwonus pelamis GN=cyc PE=1 SV=2

Double checking the publications, this is too late to be available to the researchers, but is at least in the ballpark. And importantly, if I was going to go to the book vs PDB for accuracy's sake, I'd go for the 1969 version, which I assume raises the effort considerably.

But I'm not gonna proclaim dramatic expertise on the subject of bioinformatics. I took a single grad level course in the topic as a CS student. My goal is to apply the visualization to other stuff, but didn't come up with many good free implementations. I'm just applying my implementation to published examples as a neat historical anecdote and test case.
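For readers who haven't met a dotplot before, a minimal sketch of the general idea might look like the following. This is not the implementation mentioned above, and it is not a reproduction of the method in the original paper; it is just the textbook version, where a cell is marked when a window of one sequence matches a window of the other. The function names, the `window` and `min_matches` parameters, and the toy sequences are all made up for illustration.

```python
# Minimal dotplot sketch: rows index seq_a, columns index seq_b.
# A cell is True when the windows starting at those positions agree
# in at least min_matches places. All names here are hypothetical.

def dotplot(seq_a, seq_b, window=1, min_matches=None):
    """Return a list of rows of booleans comparing every window pair."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if min_matches is None:
        min_matches = window  # default: require an exact window match
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text, '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in grid
    )


if __name__ == "__main__":
    # Toy sequences, not real protein data.
    print(render(dotplot("GDVEKGKKIF", "GDVAKGKKTF", window=3, min_matches=2)))
```

Similar stretches show up as diagonal runs of marks; a larger `window` with a `min_matches` a bit below it filters out most of the chance single-residue hits.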
posted by pwnguin at 2:10 AM on July 15, 2010

A timely post; I have just submitted a paper for review describing software for clustering environmental barcode data. There are two main problems with the "platonic" approach to using DNA barcodes to assign specimens to species (which is the explicit aim of many projects). Firstly, how close does a sample barcode have to be to a reference sequence before we can confidently say, "that's what that is"? The best current answer involves comparing the rate of inter- and intra-specific variation for the gene of interest. Secondly, what does the system do with sample barcodes that don't match any reference sequence well enough?
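Those two questions can be made concrete with a small Python sketch. It is only an illustration, not the software described above: it assumes the barcodes are already aligned to equal length, uses the plain proportion of differing sites as the distance, and takes a placeholder `threshold` of 0.02 that you would in practice derive from the inter- versus intra-specific variation for your gene and group. The species names and sequences are invented.

```python
# Threshold-based barcode assignment sketch. The distance measure, the
# 0.02 threshold, and the reference data are illustrative assumptions.

def p_distance(a, b):
    """Proportion of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_species(sample, references, threshold=0.02):
    """Return (species, distance) for the nearest reference barcode.

    If the nearest reference is further away than threshold, the species
    is None, i.e. the sample stays unassigned. Ties go to whichever
    reference is encountered first.
    """
    best_species, best_distance = None, None
    for species, barcode in references.items():
        distance = p_distance(sample, barcode)
        if best_distance is None or distance < best_distance:
            best_species, best_distance = species, distance
    if best_distance is not None and best_distance <= threshold:
        return best_species, best_distance
    return None, best_distance


if __name__ == "__main__":
    # Hypothetical reference library with toy sequences.
    refs = {"species_a": "ACGTACGTAC", "species_b": "ACGTTTGTAC"}
    print(assign_species("ACGTACGTAC", refs))
    print(assign_species("GGGTACGTCC", refs))
```

Returning `None` for a poor match is one answer to the second question; what to do with those unassigned barcodes afterwards (cluster them, flag them as candidate new species, discard them) is where the real design decisions start.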

The point about the "Dreadful DNA Data Deluge" is well made - largely because of next-gen sequencing technology. An interesting wrinkle with using next-gen sequencing for generating environmental barcode data is that the sequencing method generates errors (at a fairly high rate) that are hard to distinguish from true rare sequences - I have an MSc student working with me right now on exactly that problem.

Here's a paper showing how next-gen sequencing massively overestimates the diversity of a known sample (true number of unique sequences: 90, estimated number of unique sequences: 10,000)!

So the worry is that we might be inferring a hyperdiverse rare biosphere where none actually exists. This ties in interestingly to the "everything is everywhere" hypothesis. It's long been thought that at the small scale (<1mm), all species are essentially dispersed all over the world - i.e. there's no such thing as biogeography - we find the same set of species wherever we look. Large scale environmental barcoding offers us a way to test that idea, but the high error rate is making it difficult.
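A toy simulation shows why sequencing errors inflate diversity estimates so badly. This is a sketch under deliberately crude assumptions: substitution errors only, every base equally likely to be misread, errors independent of each other, and made-up sequences and error rate. Real platforms have messier error profiles, so treat the numbers as qualitative.

```python
# Toy simulation: independent substitution errors turn a handful of
# true sequences into many apparently distinct reads. The error model
# and all parameters are simplifying assumptions.
import random

BASES = "ACGT"


def add_errors(seq, error_rate, rng):
    """Return a copy of seq with each base substituted with prob. error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def count_unique_reads(true_seqs, reads_per_seq, error_rate, seed=0):
    """Sequence each true sequence reads_per_seq times; count distinct reads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    reads = set()
    for seq in true_seqs:
        for _ in range(reads_per_seq):
            reads.add(add_errors(seq, error_rate, rng))
    return len(reads)


if __name__ == "__main__":
    # Three invented 100-base "species".
    truth = ["ACGT" * 25, "TTGA" * 25, "CCAG" * 25]
    for rate in (0.0, 0.001, 0.01):
        unique = count_unique_reads(truth, reads_per_seq=200, error_rate=rate)
        print(f"error rate {rate}: {unique} unique reads from {len(truth)} true")
```

Counting every distinct read as a distinct organism is the naive estimate; the hard part is that a genuinely rare sequence and a one-off error look identical at this level.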
posted by primer_dimer at 3:48 AM on July 15, 2010 [1 favorite]

What was it you were going to tell me?
What did you tell me?
What have you told me?
posted by General Tonic at 7:49 AM on July 15, 2010 [1 favorite]

The current state of sequencing technology does have issues with assessing variation in DNA, but these will not last long. For example, with single molecule machines you can circularize the DNA with a known linker and then sequence the same bit of DNA 2-5 times, drastically reducing the error rate.
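The idea behind re-reading the same molecule can be sketched as a per-position majority vote. This is a simplified illustration, not how any particular instrument's software works: it assumes the repeated passes are already aligned and the same length (real reads have insertions and deletions, so an alignment step comes first), and it marks tied positions as `N` rather than guessing. The function name is made up.

```python
# Majority-vote consensus over repeated reads of one molecule.
# Assumes pre-aligned, equal-length reads; ties are reported as 'N'.
from collections import Counter


def consensus(reads):
    """Return the per-position majority base across the given reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all be the same length")
    result = []
    for column in zip(*reads):
        counts = Counter(column).most_common()
        top_base, top_count = counts[0]
        if len(counts) > 1 and counts[1][1] == top_count:
            result.append("N")  # no clear majority at this position
        else:
            result.append(top_base)
    return "".join(result)


if __name__ == "__main__":
    # Three passes over the same toy fragment, one with a single error.
    print(consensus(["ACGTTGCA", "ACGTAGCA", "ACGTTGCA"]))
```

With an odd number of passes, an error only survives if it turns up at the same position in more than half of them, which is why a few extra reads cut the error rate so sharply when errors are roughly independent.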

It will be interesting to see if barcoding remains more practical than whole-genome sequencing in the future; there will be a tradeoff between the cost of sequencing and the cost of analysis and it's unclear where it will lie.
posted by Llama-Lime at 7:55 AM on July 15, 2010

Yeah, how about that Edward Norton, eh? Looks like he won't be playing Bruce Banner in the Avengers movie after all.
posted by Halloween Jack at 8:50 AM on July 15, 2010

If only there was somewhere on the Internet I could read a concise, informative post about DNA barcoding.
posted by ardgedee at 8:54 AM on July 15, 2010 [3 favorites]

pwnguin, the successor of the Atlas was the PIR. I only found a couple of tuna proteins from early PIR, and cytochrome c wasn't one of them. So yeah, looks like you would have to get the Atlas to get the actual sequence there.
posted by grouse at 10:18 AM on July 15, 2010

Honestly, we're already pretty close to being able to use the next-gen sequencing methods to get genomes (or genome-sized sets of DNA-related information) in a ridiculously short amount of time

That was my thought as well. Why go down the barcode path when you can sequence the whole lot? I was at a seminar the other day where they were predicting whole human genomic tests would be performed for less than $100 a pop within 5 years (they currently cost about $2000 I think). It's seriously impressive how far the field has moved in the last 15 years. Some days I wake up and realise that I'm already living in the future.
posted by kisch mokusch at 12:44 PM on July 15, 2010

Srsly? A bibliography?
posted by mistersquid at 8:44 PM on July 15, 2010

This thread has been archived and is closed to new comments