Encyclopedia created to detail the inner workings of the human and mouse genomes


Image representing DNA “switches” from human and mouse genomes that appear to regulate when and where genes are activated. Credit: Ernesto Del Aguila III, NHGRI.

The third phase of the ENCODE project offers new knowledge on the organization and regulation of our genes and genome.

The Encyclopedia of DNA Elements (ENCODE) project is a worldwide effort to understand how the human genome works. With the completion of its latest phase, the ENCODE Project has added millions of candidate DNA “switches” from the human and mouse genomes that appear to regulate when and where genes are activated, and a new registry that assigns a portion of these DNA switches to useful biological categories. The project also offers new visualization tools to aid in the use of large ENCODE data sets.

The latest results of the project were published in Nature, accompanied by 13 additional in-depth studies published in other major journals. ENCODE is funded by the National Human Genome Research Institute, part of the National Institutes of Health.

“One of ENCODE 3’s top priorities was to develop ways to share data from the thousands of ENCODE experiments with the research community at large to help broaden our understanding of genome function,” said NHGRI Director Eric Green, M.D., Ph.D. “ENCODE 3 search and visualization tools make this data accessible, thus advancing efforts in open science.”

To assess the potential functions of different regions of DNA, ENCODE researchers studied biochemical processes that are generally associated with the switches that regulate genes. This biochemical approach is an efficient way to scan the entire genome quickly and completely. This method helps locate regions in DNA that are “candidate functional elements”: regions of DNA that are predicted to be functional elements based on these biochemical properties. Candidates can be tested in additional experiments to identify and characterize their functional roles in gene regulation.

“A key challenge in ENCODE is that different genes and functional regions are active in different cell types,” said Elise Feingold, Ph.D., scientific advisor for strategic implementation in the Division of Genome Sciences at NHGRI and a lead for ENCODE at the institute. “This means that we need to analyze a large and diverse number of biological samples to work toward a catalog of candidate functional elements in the genome.”

Significant progress has been made in characterizing protein-coding genes, which comprise less than 2% of the human genome. Researchers know much less about the remaining 98% of the genome, including how much of it, and which parts, perform other functions. ENCODE is helping to fill this significant knowledge gap.

The human body is made up of trillions of cells, with thousands of cell types. While all of these cells share a common set of DNA instructions, the various cell types (e.g., heart, lung, and brain) perform different functions by using the information encoded in the DNA differently. The regions of DNA that act as switches to turn genes on or off, or to tune the exact levels of gene activity, help drive the formation of different types of cells in the body and govern their functioning in health and disease.

During the recently completed third phase of ENCODE, the researchers conducted nearly 6,000 experiments – 4,834 in humans and 1,158 in mice – to illuminate the details of the genes and their potential regulators in their respective genomes.

The ENCODE 3 researchers studied mouse embryonic tissue development to understand the timeline of various genomic and biochemical changes that occur during mouse development. Mice, due to their genomic and biological similarity to humans, can help inform our understanding of human biology and disease.

These experiments in humans and mice were carried out in various biological contexts. The researchers looked at how chemical modifications of DNA, proteins that bind to DNA, and RNA (a sister molecule of DNA) interact to regulate genes. The ENCODE 3 results also help explain how variations in DNA sequences outside protein-coding regions can influence gene expression, even of genes located far from a specific variant.

“The data generated in ENCODE 3 dramatically increases our understanding of the human genome,” said Brenton Graveley, Ph.D., professor and chair of UConn Health’s Department of Genetics and Genome Sciences. “The project has added tremendous resolution and clarity for previous data types, such as DNA-binding proteins and chromatin marks, and new data types, such as long-range DNA interactions and protein-RNA interactions.”

As a new feature, ENCODE 3 researchers created a resource detailing different types of DNA regions and their corresponding candidate functions. A web-based tool called SCREEN allows users to view the data that supports these interpretations.

The ENCODE project began in 2003 and is an extensive collaborative research effort involving groups from across the United States and internationally, comprising more than 500 scientists with diverse expertise. It has benefited from and built upon decades of gene regulation research by independent researchers around the world. ENCODE researchers have created a community resource, ensuring that project data is accessible to any researcher for their studies. These efforts in open science have resulted in more than 2,000 publications from non-ENCODE researchers using data generated by the ENCODE Project.

“This shows that the encyclopedia is widely used, which is what we have always aimed for,” said Dr. Feingold. “Many of these publications are related to human disease, demonstrating the value of the resource in relating basic biological knowledge to health research.”

Reference: Encyclopedia of DNA Elements (ENCODE) Project