First end-to-end DNA sequence of a human chromosome: “New era in genomic research”


First complete sequence of the human X chromosome

Image representing the pieces of the DNA sequence puzzle coming together. Credit: Ernesto del Aguila III (NHGRI)

Researchers have generated the first complete sequence of a human X chromosome.

Researchers at the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), have produced the first end-to-end DNA sequence of a human chromosome. The results, published today (July 14, 2020) in the journal Nature, show that it is now possible to generate a precise, base-by-base sequence of a human chromosome, and they will enable researchers to produce a complete sequence of the human genome.

“This achievement begins a new era in genomic research,” said Eric Green, MD, Ph.D., director of NHGRI. “The ability to generate truly complete chromosome and genome sequences is a technical feat that will help us gain a comprehensive understanding of genome function and inform the use of genomic information in healthcare.”

After nearly two decades of improvements, the human genome reference sequence is the most accurate and complete vertebrate genome sequence ever produced. However, it still contains hundreds of gaps where the DNA sequence is unknown.

These gaps often contain repetitive DNA segments that are exceptionally difficult to sequence. Yet these repetitive segments include genes and other functional elements that may be relevant to human health and disease.

Because a human genome is incredibly long, consisting of approximately 6 billion bases, DNA sequencing machines cannot read all of its bases at once. Instead, researchers cut the genome into smaller pieces, then analyze each piece to produce sequences of a few hundred bases at a time. Those shorter DNA sequences must then be put back together.
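To make the reassembly step concrete, here is a minimal sketch of a textbook greedy overlap assembler, assuming short, error-free reads taken from a tiny invented sequence. It is not the program the researchers developed; the function names `overlap` and `greedy_assemble` and the example reads are hypothetical and chosen only for illustration. The idea is to find where the end of one read matches the start of another and to keep merging the best-overlapping pair.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` bases long; otherwise 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a (from here on) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no pair overlaps by at least min_len bases."""
    # Remove exact duplicates, then reads contained inside other reads.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final pieces
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads cut from the invented sequence ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

On this toy sequence, which has no repeats, greedy merging recovers the original. In repetitive DNA, however, the largest overlap is often not the correct one, so a simple strategy like this can collapse or misorder repeated segments.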

Senior author Adam Phillippy, Ph.D., of NHGRI compared this problem to solving a puzzle.

“Imagine having to piece together a puzzle. If you are working with smaller pieces, each one contains less context to find out where it came from, especially in parts of the puzzle without any unique clues, like a blue sky,” he said. “The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the most difficult parts of the genome puzzle together.”

Of the 24 human chromosomes (including X and Y), study authors Phillippy and Karen Miga, Ph.D., of the University of California, Santa Cruz, chose to complete the X chromosome sequence first because of its link to a myriad of diseases, including hemophilia, chronic granulomatous disease, and Duchenne muscular dystrophy.

Humans have two sets of chromosomes, one from each parent. For example, biologically female humans inherit two X chromosomes, one from their mother and one from their father. However, those two X chromosomes are not identical; their DNA sequences contain many differences.

In this study, the researchers did not sequence the X chromosome from a typical human cell. Instead, they used a special type of cell that has two identical X chromosomes. This cell provides more DNA for sequencing than a male cell, which has only a single copy of the X chromosome, and it avoids the sequence differences encountered when analyzing the two X chromosomes of a typical female cell.

The authors and their colleagues capitalized on new technologies that can sequence long segments of DNA. Instead of preparing and analyzing small pieces of DNA, they used a method that leaves the DNA molecules largely intact. These large DNA molecules were then analyzed on two different instruments, each of which generates very long DNA sequences, something that previous instruments could not achieve.
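Why long reads help with repeats can be illustrated with a toy example, assuming an invented sequence in which a short unit is repeated six times between two unique flanks (a stand-in for the “blue sky” in Phillippy’s puzzle analogy). A read that lies entirely inside the repeat matches several positions, so an assembler cannot tell where it belongs or how many copies of the unit there are. A read long enough to span the whole repeat and reach unique sequence on both sides matches exactly one position. The helper `placements` and all sequences below are hypothetical and serve only as an illustration, not as the researchers’ method.

```python
def placements(read: str, genome: str) -> list[int]:
    """Return every start position at which `read` occurs in `genome`,
    including overlapping occurrences."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Invented toy sequence: unique flanks around a tandem repeat.
left, unit, right = "GATTACA", "ACGT", "TTGCCA"
genome = left + unit * 6 + right

short_read = unit * 2                         # 8 bases, entirely inside the repeat
long_read = left[-3:] + unit * 6 + right[:3]  # 30 bases, spans the repeat and flanks

print(placements(short_read, genome))  # [7, 11, 15, 19, 23]
print(placements(long_read, genome))   # [4]
```

The short read fits at five places and says nothing about the repeat’s total length, whereas the long read anchors the entire repeat between its unique neighbors.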

After analyzing the human X chromosome in this way, Phillippy and his team used their newly developed computer program to assemble the many generated sequence segments. Miga’s group led the effort to close the largest remaining gap on the X chromosome: roughly 3 million bases of repetitive DNA at the centromere, a region near the middle of the chromosome.

There is no “gold standard” that researchers can use to critically evaluate the accuracy of assemblies of such highly repetitive DNA sequences. To help confirm the validity of the generated sequence, Miga and her collaborators performed several validation steps.

“We have never seen these sequences in our genome before, and we don’t have many tools to test whether the predictions we are making are correct. That is why it is important to have specialists in the genomics community who evaluate and guarantee that the final product is of high quality,” said Miga.

The effort is part of a broader initiative by the Telomere-to-Telomere (T2T) consortium, partially funded by NHGRI. The consortium aims to generate a complete reference sequence of the human genome.

The T2T consortium is continuing its work on the remaining human chromosomes, with the goal of generating a complete sequence of the human genome by 2020.

“We still don’t know what we’ll find in the newly discovered sequences. That is the exciting unknown of discovery. This is the era of complete genome sequences, and we are embracing it wholeheartedly,” said Phillippy.

Potential challenges remain. Chromosomes 1 and 9, for example, have repetitive DNA segments that are much larger than those found on the X chromosome.

“We know that these previously unknown sites in our genome are very different between individuals, but it is important to begin to discover how these differences contribute to biology and human disease,” said Miga. Both Phillippy and Miga agree that improving sequencing methods will continue to create new opportunities in human genetics and genomics.

Reference: July 14, 2020, Nature.
DOI: 10.1038/s41586-020-2547-7