Of the estimated 10 to 50 million plant and animal species on Earth, fewer than 2 million have been named and formally described over the last 250 years using the Linnaean classification system. The accelerating rate of species extinction due to human activities threatens to outpace the rate of species identification and discovery under this system, which relies on a small number of experts to define and catalog species groups according to shared characteristics.
To accelerate species discovery and develop new tools to preserve and protect Earth’s biodiversity before it is lost, a global alliance of scientists is building a digital genetic registry of eukaryotic life using a DNA barcode system.
A familiar barcode system
The barcode system most familiar to us, the one used by retail stores and supermarkets, consists of two important components. The most obvious component is the UPC (Universal Product Code) barcode that is printed on the packaging of a product (or on a product label). The UPC barcode contains a unique combination of bars and spaces that distinguishes each product sold by a company. No two products share the same barcode.
The second component is less visible. For a barcode to be useful to a company and its consumers, it must be linked to specific information about a product, including its manufacturer and retail price. This information is stored in an electronic database that is maintained by the company and its employees. At some point, information for every product available for purchase was entered into the database and linked to a digital representation of the barcode printed on that product. The digital representation of a barcode and the product information linked to it within the database together constitute a product’s reference barcode record.
When a printed barcode is scanned and digitized at a checkout counter, a search engine compares it to the product reference barcodes contained in the database. Once an exact match is found, the information linked to the barcode is retrieved. The information most relevant to the consumer is the retail price that the database links to the barcode printed on a product. For the company, the information linked to a UPC barcode helps its employees keep track of product inventory.
DNA barcodes are used to catalog and identify species
For animals, a DNA barcode is a short sequence of about 650 base pairs from the cytochrome c oxidase subunit I (COI) gene, a mitochondrial gene that encodes a protein involved in cellular respiration. When the COI gene sequence is compared between members of the same species, few nucleotide differences are observed. That is to say, intraspecies variability in the COI gene sequence is very low. In contrast, a larger number of differences is observed when the COI gene sequence is compared between members of different species. In other words, interspecies variability in the COI gene sequence is higher. Because of this gap, the nucleotide sequence of the COI gene segment (i.e. the DNA barcode) can be used to uniquely identify animal species. Other gene sequences are used to identify different species of plants and fungi.
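The contrast between low intraspecies and higher interspecies variability can be illustrated with a small calculation. The Python sketch below is only an illustration under stated assumptions: the sequences are made-up 20-base fragments rather than real COI barcodes, the species labels are placeholders, and the sequences are assumed to be already aligned to the same length. It measures the fraction of positions at which two sequences differ and averages that value within and between species.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def _mean(values):
    """Arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None


def mean_distances(records):
    """Return (mean within-species, mean between-species) distance.

    `records` is a list of (species_label, sequence) pairs.
    """
    within, between = [], []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        distance = p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            within.append(distance)
        else:
            between.append(distance)
    return _mean(within), _mean(between)


# Invented toy data: NOT real COI sequences, labels are placeholders.
records = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGTTCGTACCTACGAACGG"),
]

within, between = mean_distances(records)
print(f"within species:  {within:.2f}")   # within species:  0.05
print(f"between species: {between:.2f}")  # between species: 0.20
```

In this toy data the two specimens labeled as the same species differ at 1 of 20 positions, while specimens from different species differ at 4 of 20. Real barcode comparisons involve much longer sequences and a prior alignment step, which this sketch leaves out.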
In order to be useful as a species identification tool, a DNA barcode must be linked to a species name and other forms of information. This information is stored in the Barcode of Life Data Systems (BOLD), an electronic database and workbench that resides on the Internet. A global alliance of scientists working under the International Barcode of Life (iBOL) project is currently building the BOLD database by linking DNA barcodes generated from known specimens to a species name and other types of information related to the specimen, including where it was collected. Taken together, this information constitutes a reference DNA barcode record. A DNA barcode obtained from an unknown or unidentified specimen can be compared to reference DNA barcodes contained in the BOLD database by a search engine. When a match is found, a species name will be retrieved from the database and provided to the user.
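The lookup described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the BOLD search engine or its interface: the `ReferenceRecord` class, the `identify` function, the species names, the collection sites, the sequences, and the `max_divergence` cutoff are all assumptions made for the example. The sketch pairs each reference barcode with a species name and collection site, then reports the closest reference record for an unknown sequence, provided the two are similar enough.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReferenceRecord:
    """A toy reference barcode record (hypothetical structure)."""
    species_name: str
    collection_site: str
    barcode: str


def identify(query: str,
             library: Iterable[ReferenceRecord],
             max_divergence: float) -> Optional[ReferenceRecord]:
    """Return the closest reference record, or None if nothing is close enough.

    Divergence is the fraction of positions that differ. Records whose
    barcode length differs from the query are skipped, because this
    sketch assumes pre-aligned sequences of equal length.
    """
    if not query:
        raise ValueError("query sequence must not be empty")
    best_record, best_divergence = None, None
    for record in library:
        if len(record.barcode) != len(query):
            continue
        mismatches = sum(a != b for a, b in zip(query, record.barcode))
        divergence = mismatches / len(query)
        if best_divergence is None or divergence < best_divergence:
            best_record, best_divergence = record, divergence
    if best_record is not None and best_divergence <= max_divergence:
        return best_record
    return None


# Invented toy library: placeholder names and made-up sequences.
library = [
    ReferenceRecord("Species A (placeholder)", "Site 1", "ACGTACGTACGTACGTACGT"),
    ReferenceRecord("Species B (placeholder)", "Site 2", "ACGTTCGTACCTACGAACGG"),
]

unknown = "ACGTACGTACGTACGTACGA"  # one base away from the Species A record
match = identify(unknown, library, max_divergence=0.1)
print(match.species_name if match else "no match in library")
```

When the library holds no sufficiently similar record, `identify` returns `None`, which mirrors the situation where a species has no reference barcode record yet. The cutoff of 0.1 is arbitrary and chosen only to suit the 20-base toy sequences.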
The number of reference DNA barcode records in the BOLD database is growing at an impressive rate. Although this information is already being used to address important real-world problems (e.g. consumer fraud and the arrival of harmful agricultural pests), the database is still far from complete. Indeed, reference barcode records for most animals, plants, and fungi are currently unavailable in BOLD, which limits its usefulness as a species recognition tool for practical applications that will ultimately benefit humanity and the world in which we live.
Join the largest biodiversity genomics initiative ever undertaken
The iBOL project was formally launched in October 2010 with the goal of creating 5 million reference barcode records by 2015. To meet this ambitious goal, scientists from over 150 nations are now working as a community to create reference barcodes from life forms with the highest practical importance to humanity, including vertebrates, land plants, fungi, pollinators, and human pathogens.
As a member of this global scientific community, Coastal Marine Biolabs and its collaborators are leading a student-centered campaign to generate reference barcodes for fish and invertebrate species that provide vital signs of marine ecosystem health. High school students and science teachers are cordially invited to join this scientific campaign as citizen scientists. The data that you generate through this campaign will someday help scientists to better understand how human activities and natural events impact California’s marine ecosystems and their inhabitants.