Scientists achieve the first complete assembly of the human X chromosome



Although the current human reference genome is the most accurate and complete vertebrate genome ever produced, there are still gaps in the DNA sequence, even after two decades of improvements. Now, for the first time, scientists have determined the complete sequence of a human chromosome from one end to the other (‘telomere to telomere’) with no gaps and an unprecedented level of accuracy.


The publication of the telomere-to-telomere assembly of a complete human X chromosome in Nature on July 14 is a historic achievement for genomics researchers. Lead author Karen Miga, a research scientist at the UC Santa Cruz Genomics Institute, said the project was made possible by new sequencing technologies that enable “ultra-long reads,” such as the nanopore sequencing technology pioneered at UC Santa Cruz.

Repetitive DNA sequences are common across the genome and have always posed a challenge for sequencing because most technologies produce relatively short “reads” of the sequence, which must then be pieced together like a puzzle to assemble the genome. The repetitive sequences produce many short reads that look almost identical, like a large expanse of blue sky in a puzzle, with no clues as to how the pieces fit together or how many repeats there are.

“These repeat-rich sequences were once considered intractable, but we have now made great strides in sequencing technology,” said Miga. “With nanopore sequencing, we get ultra-long reads of hundreds of thousands of base pairs that can span an entire repeat region, so it avoids some of the challenges.”
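The puzzle problem can be illustrated with a toy example. The sketch below is not from the study: the flank and repeat-unit sequences, the function names, and the assumption of error-free reads are all hypothetical. It shows that short reads drawn from a region with five repeat copies are indistinguishable from those drawn from a region with nine, while a single read anchored in the unique sequence on both sides reveals the copy number directly.

```python
def build_region(left, unit, copies, right):
    """Unique left flank + tandem repeat + unique right flank (toy data)."""
    return left + unit * copies + right


def read_set(sequence, read_len):
    """Every distinct substring of length read_len (idealized, error-free reads)."""
    return {sequence[i:i + read_len] for i in range(len(sequence) - read_len + 1)}


def count_copies(read, left, unit, right):
    """Count repeat units in a read anchored by both flanks; None if it does not span."""
    if not (read.startswith(left) and read.endswith(right)):
        return None
    core = read[len(left):len(read) - len(right)]
    copies, remainder = divmod(len(core), len(unit))
    if remainder or core != unit * copies:
        return None
    return copies


# Hypothetical sequences chosen only for illustration.
LEFT, UNIT, RIGHT = "ACGTTGCA", "GATTACA", "TTGGCCAA"
five = build_region(LEFT, UNIT, 5, RIGHT)
nine = build_region(LEFT, UNIT, 9, RIGHT)

# Short reads: the two regions yield exactly the same set of 10-base reads.
short_reads_identical = read_set(five, 10) == read_set(nine, 10)

# Spanning reads: one read covering flank-to-flank gives the copy number.
copies_in_five = count_copies(five, LEFT, UNIT, RIGHT)
copies_in_nine = count_copies(nine, LEFT, UNIT, RIGHT)
```

Here `short_reads_identical` is `True`, which is the "blue sky" ambiguity, while the spanning reads report 5 and 9 copies. Real reads contain errors and real repeats are rarely perfect, so this is only a simplified picture of why read length matters.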

Filling the remaining gaps in the human genome sequence opens up new regions of the genome where researchers can search for associations between sequence variations and disease, and for other clues to important questions about human biology and evolution.

“We are beginning to discover that some of these regions where there were gaps in the reference sequence are actually among the richest in variation in human populations, so we are missing a lot of information that could be important in understanding human disease biology,” Miga said.

Miga and Adam Phillippy of the National Human Genome Research Institute (NHGRI), both corresponding authors of the new paper, co-founded the Telomere-to-Telomere (T2T) consortium to pursue a complete genome assembly after working together on a 2018 paper that demonstrated the potential of nanopore technology to produce a complete sequence of the human genome. That effort used Oxford Nanopore Technologies’ MinION sequencer, which sequences DNA by detecting the change in current flow as individual DNA molecules pass through a small hole (a “nanopore”) in a membrane.

The new project built on that effort, combining nanopore sequencing with other sequencing technologies from PacBio and Illumina, and optical maps from BioNano Genomics. Using these technologies, the team produced a whole-genome assembly that exceeds all previous human genome assemblies in terms of continuity, completeness, and accuracy, even outperforming the current human reference genome by some metrics.

However, there were still multiple breaks in the sequence, Miga said. To finish the X chromosome, the team had to manually resolve several gaps in the sequence. Two segmental duplications were resolved with ultra-long nanopore reads that completely spanned the repeats and were uniquely anchored on each side. The remaining break was in the centromere, a notoriously difficult region of repetitive DNA found on each chromosome.

On the X chromosome, the centromere encompasses a region of highly repetitive DNA spanning 3.1 million base pairs (the bases A, C, T, and G form pairs in the DNA double helix and encode genetic information in their sequence). The team was able to identify variants within the repeat sequence to serve as markers, which they used to align the long reads and connect them to span the entire centromere.
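The idea of using rare variants as markers can be sketched in a few lines. The example below is a simplified illustration, not the team's method: the repeat unit, the two single-base variants, the marker length `K`, and the function names are all hypothetical. It treats any short subsequence (k-mer) that occurs exactly once in a repeat array as a marker, then uses a marker found inside a read to determine where that read belongs.

```python
from collections import Counter


def unique_kmers(sequence, k):
    """Map each k-mer that occurs exactly once in sequence to its position."""
    starts = range(len(sequence) - k + 1)
    counts = Counter(sequence[i:i + k] for i in starts)
    return {sequence[i:i + k]: i for i in starts if counts[sequence[i:i + k]] == 1}


def place_read(read, markers, k):
    """Return the array offset implied by the first marker found in read, else None."""
    for j in range(len(read) - k + 1):
        pos = markers.get(read[j:j + k])
        if pos is not None:
            return pos - j  # where the read's first base sits in the array
    return None


# Toy tandem array: six copies of a hypothetical 10-base unit,
# two of which carry a hypothetical single-base variant.
UNIT = "GATTACAGGC"
units = [UNIT] * 6
units[2] = "GATTTCAGGC"  # variant in the third copy
units[4] = "GATTACAGCC"  # variant in the fifth copy
array = "".join(units)

K = 6
markers = unique_kmers(array, K)

read_a = array[18:38]  # overlaps the variant in the third copy
read_b = array[0:15]   # lies entirely in unmodified copies

offset_a = place_read(read_a, markers, K)
offset_b = place_read(read_b, markers, K)
```

In this toy case `offset_a` is 18, the read's true starting position, while `offset_b` is `None` because the read contains no marker and could fit in several places. Reads of this second kind are the ones that need to be long enough to reach a marker.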

“For me, the idea that we can put together a 3-megabase-size tandem repeat is just mind-boggling. Now we can get to these repeat regions that cover millions of bases that were previously considered intractable,” Miga said.

The next step was a polishing strategy using data from multiple sequencing technologies to ensure the accuracy of each base in the sequence.

“We use an iterative process on three different sequencing platforms to polish the sequence and achieve a high level of accuracy,” Miga explained. “Unique markers provide an anchor system for ultra-long reads, and once you anchor the reads, you can use multiple data sets to call each base.”
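As a rough picture of what calling each base from multiple data sets can mean, the sketch below takes a simple majority vote at each position across three already-aligned reads. This is an assumption-laden simplification, not the consortium's polishing pipeline: the read sequences are made up, the platform names are used only as labels, and real polishing tools model error types and base quality rather than counting votes.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base at each column of equal-length, already-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical reads of the same 8-base stretch, each with one different error.
nanopore_read = "ACGTTAGC"  # error at the fourth base
pacbio_read = "ACGATAGG"    # error at the last base
illumina_read = "ACCATAGC"  # error at the third base

polished = consensus([nanopore_read, pacbio_read, illumina_read])
```

Because each made-up read is wrong at a different position, the other two outvote it every time and `polished` comes out as `ACGATAGC`. The vote only works once the reads are placed correctly, which is why anchoring comes first.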

Nanopore sequencing, in addition to providing ultra-long reads, can also detect bases that have been modified by methylation, an “epigenetic” change that does not alter the sequence but has important effects on DNA structure and gene expression. By mapping methylation patterns on the X chromosome, the team was able to confirm previous observations and reveal some interesting trends in methylation patterns within the centromere.

The new human genome sequence, derived from a human cell line called CHM13, closes many gaps in the current reference genome, known as Genome Reference Consortium build 38 (GRCh38).

The T2T consortium continues to work to complete all of the CHM13 chromosomes. “It is an open consortium, so in many ways this is a community-driven project, with a lot of people dedicating time and resources to it,” Miga said.



More information:
Karen H. Miga et al., Telomere-to-telomere assembly of a complete human X chromosome, Nature (2020). DOI: 10.1038/s41586-020-2547-7

Provided by the University of California – Santa Cruz

Citation: Scientists achieve the first complete assembly of the human X chromosome (2020, July 14) retrieved on July 14, 2020 from https://phys.org/news/2020-07-scientists-human-chromosome.html
