Vast sections of the human genome that were previously thought to have no useful function and were dismissed as "junk DNA" are in fact involved in key biochemical processes, an international team has found.
The five-year Encyclopedia of DNA Elements, or ENCODE, project has attempted to catalogue the bulk of the genetic material that lies outside protein-coding genes, the building blocks necessary for life, which make up only about two per cent of the human genome.
The results, published Wednesday in a series of papers in the journals Nature, Genome Research and Genome Biology, are being widely seen as the most significant contribution to understanding the human genome since its sequence was completed in 2003.
Genes make up only about two per cent of the human genome and have been well catalogued as part of the Human Genome Project begun in the 1990s, but little has been known about the purpose of the other 98 per cent.
"During the early debates about the Human Genome Project, researchers had predicted that only a few per cent of the human genome sequence encoded proteins, the workhorses of the cell, and that the rest was junk. We now know that this conclusion was wrong," said Eric D. Green, director of the National Human Genome Research Institute in Bethesda, Md., a part of the National Institutes of Health, which funded the ENCODE project to the tune of $288 million since 2003.
"ENCODE has revealed that most of the human genome is involved in the complex molecular choreography required for converting genetic information into living cells and organisms."
Hundreds of scientists at dozens of institutions in the U.S., U.K., Japan, Singapore and Spain took part in the effort to identify the functions of the genetic material that lies between the roughly 20,000 protein-coding genes.
The researchers found that about 80 per cent of the human genome has at least one biochemical activity associated with it. Much of that activity consists of telling protein-coding genes when and where in the body they should turn on and off.
About 18 per cent of the genome is involved in regulating protein-coding genes.
"Every cell in the body has the same genes, but different kinds of cells, such as liver or heart, switch on different combinations of genes," said John A. Stamatoyannopoulos, a professor of genome sciences and medicine at the University of Washington who worked on the ENCODE project, in a press release.
"When cells become unhealthy, these combinations change. Understanding how genes turn on and off is therefore vital to deciphering their role in both normal health and disease.
"The instructions for how genes are controlled are contained in small DNA 'switches' that are scattered around the 98 per cent of the genome that does not contain genes. Mapping and decoding these instructions is a central mission of the ENCODE project."
The researchers located more than four million of these DNA 'switches,' information that should help scientists better understand how to prevent, diagnose and treat disease.
To understand the relationship between disease-associated genetic changes and the gene-controlling switches scattered around the genome, the researchers collected DNA maps from 349 tissue samples covering all major organ systems in adults and all stages of human development. They then compared those maps against genetic studies of more than 400 common diseases and clinical traits.
"They found that most disease-associated genetic changes occurred within gene-regulating switches, often located far away from the genes they control," a University of Washington press release said. "Most changes affected circuits active during early human development, when body tissues are most vulnerable."
The ENCODE scientists made their data publicly available in a vast database that has been posted on the ENCODE project portal as well as on the websites of the University of California, Santa Cruz Genome Browser, the National Center for Biotechnology Information, and the European Bioinformatics Institute.
The journal Nature diverged from its usual practice and presented the findings of the six core papers published by the ENCODE consortium as a series of thematic threads accessible through an interactive online "ENCODE Explorer."
Another 24 associated papers that came out of the research appeared in the journals Genome Research and Genome Biology.
"The ENCODE catalogue is like Google Maps for the human genome," said Elise Feingold, a program director at the National Human Genome Research Institute who helped start the ENCODE Project, in a news release.
"Simply by selecting the magnification in Google Maps, you can see countries, states, cities, streets, even individual intersections, and by selecting different features, you can get directions, see street names and photos, and get information about traffic and even weather.
"The ENCODE maps allow researchers to inspect the chromosomes, genes, functional elements and individual nucleotides in the human genome in much the same way."