Researchers update an automated computational tool for SARS-CoV-2 genome analysis




Researchers have updated an earlier version of an automated tool so that it can analyze the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Genome analysis with freely available software could help track the evolution of the virus and quickly identify variants that increase its transmissibility or virulence.

Study: Evolution of the knowledge of the SARS-CoV-2 genome of 200,000 patients with COVID-19. Image Credit: Limbitech / Shutterstock

SARS-CoV-2 has been mutating since it was first identified in late 2019. Some of these mutations have increased the fitness of the virus, which can affect COVID-19 outcomes and transmissibility and could eventually reduce the effectiveness of current treatments and vaccines. Complete and continuous sequencing of as many genomes as possible worldwide will therefore be crucial to keeping abreast of the pandemic.

Several national initiatives carry out active genomic surveillance of SARS-CoV-2 and are tasked with identifying new variants, such as the COVID-19 Genomics UK Consortium in the UK and the Indian SARS-CoV-2 Genomics Consortium. Any rise in cases driven by a new variant will require immediate action to contain its spread, which in turn requires automated methods to analyze genomes and identify new strains.

In a study published on the bioRxiv* preprint server, the researchers describe the second generation of their computational tool, Infectious Pathogen Detector (IPD 2.0), which quantifies SARS-CoV-2 abundance and mutations and adds an expanded variant database and a revised clade assessment module.

Finding the frequency of mutations

The authors analyzed 200,865 SARS-CoV-2 genome sequences from 155 countries, which together carried 2.58 million mutations relative to the Wuhan reference strain as of December 28, 2020. About 39% were synonymous mutations, which do not change the encoded amino acid and are usually minor. About 51% were non-synonymous mutations, which do change the amino acid. Approximately 9% of the mutations fell in intergenic regions, including the 5′ and 3′ untranslated regions (UTRs). Among the non-synonymous mutations, about half were nonsense or single-nucleotide mutations.
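Whether a mutation counts as synonymous or non-synonymous comes down to whether the base change alters the amino acid its codon encodes. The minimal sketch below illustrates that logic; it assumes a single-nucleotide substitution inside one codon, written in the DNA alphabet, and the standard genetic code. The function name `classify_substitution` is hypothetical and is not part of IPD.

```python
from itertools import product

# Standard genetic code: codons enumerated TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of a codon.

    Hypothetical helper for illustration. Stop-loss changes (stop -> amino acid)
    are reported as "missense" here for simplicity.
    """
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid
    if alt_aa == "*":
        return "nonsense"  # new premature stop codon
    return "missense"  # different amino acid


print(classify_substitution("GAT", 1, "G"))  # Asp -> Gly, like D614G: missense
print(classify_substitution("TTC", 2, "T"))  # Phe -> Phe: synonymous
print(classify_substitution("TGG", 1, "A"))  # Trp -> stop: nonsense
```

Tallying these labels over every substitution in a data set would yield proportions of the kind reported above.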

The researchers identified 13 hotspot mutations, each present in more than 40,000 samples. The most frequent synonymous mutation, in the NSP3 gene, occurred 186,189 times, followed by a mutation in the RNA-dependent RNA polymerase (RdRp) gene, which occurred 185,945 times. The non-synonymous spike (S) gene mutations D614G and A222V occurred 176,436 and 47,971 times, respectively. The next most common mutation was RG203KR, a two-amino-acid change (R203K/G204R) in the nucleocapsid (N) protein. The A220V mutation in the N gene, seen 48,426 times, was the third most common mutation.
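A hotspot threshold like "more than 40,000 samples" amounts to counting how many samples carry each mutation and keeping those above a cutoff. The sketch below assumes each sample has already been reduced to a set of mutation labels. The function name, the label format, and the toy data are hypothetical, and the sketch is not IPD's implementation.

```python
from collections import Counter


def find_hotspots(samples, min_samples):
    """Return (mutation, sample_count) pairs seen in more than `min_samples` samples.

    `samples` is an iterable of per-sample mutation collections (hypothetical format).
    """
    counts = Counter()
    for mutations in samples:
        counts.update(set(mutations))  # count each mutation at most once per sample
    hotspots = [(m, n) for m, n in counts.items() if n > min_samples]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(hotspots, key=lambda item: (-item[1], item[0]))


# Toy data; on the real data set the cutoff would be min_samples=40_000.
toy = [
    {"S:D614G", "ORF1b:P314L"},
    {"S:D614G", "S:A222V"},
    {"S:D614G", "ORF1b:P314L"},
]
print(find_hotspots(toy, min_samples=1))  # [('S:D614G', 3), ('ORF1b:P314L', 2)]
```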

The D614G mutation has been associated with a higher viral load in the respiratory tract but not with more severe disease. The team found that the other spike protein mutations N439K, S477N, E484K, and N501Y did not occur at significant frequencies. Five of the 13 most common mutations are synonymous; these likely affect mRNA splicing, codon usage bias, mRNA stability, translation, or cotranslational protein folding.

On further analysis, the team found that the S, N, M, ORF7a, and ORF10 genes, which make up about 21% of the genome, account for 54.36% of all non-synonymous mutations in SARS-CoV-2. The S and M genes have the smallest proportion of total variable bases in the viral genome, suggesting strong positive selection for non-synonymous mutations in these genes.
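A quick back-of-the-envelope check shows how over-represented these genes are. Under a naive assumption that non-synonymous mutations are spread evenly along the genome in proportion to length, a region's share of mutations should match its share of the genome, so the ratio of the two is about 1. This is an illustrative calculation, not a statistic reported by the authors, and because the genome share is given as "about 21%" the result is approximate. The function name is hypothetical.

```python
def enrichment_ratio(share_of_mutations, share_of_genome):
    """Observed share of mutations divided by the share expected from region size alone."""
    if not 0 < share_of_genome <= 1:
        raise ValueError("share_of_genome must be in (0, 1]")
    return share_of_mutations / share_of_genome


# Figures from the text: ~21% of the genome carries 54.36% of non-synonymous mutations.
ratio = enrichment_ratio(0.5436, 0.21)
print(f"{ratio:.2f}")  # 2.59
```

A ratio of roughly 2.6 means these five genes carry about two and a half times as many non-synonymous mutations as their length alone would predict.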

Among the newly emerged SARS-CoV-2 variants, the UK B.1.1.7 variant carried 32 mutations, the South African B.1.351 variant 25 mutations, and the Brazilian P.1 variant 25 mutations.

Genome surveillance tool

By comparing the predominant mutations in the three new variants with those in samples from India, the authors found four common hotspot mutations, including D614G. N501Y was shared by all three variants, and the South African and Brazilian variants also carried the E484K mutation in the spike protein.
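Finding mutations shared across variants is a set intersection. The sketch below uses deliberately abbreviated spike-protein mutation lists (each variant carries many more mutations than shown). The dictionary and function names are hypothetical, and this is not the authors' pipeline.

```python
# Abbreviated spike-protein mutation lists, for illustration only.
VARIANT_SPIKE_MUTATIONS = {
    "B.1.1.7": {"D614G", "N501Y", "P681H", "del69-70"},
    "B.1.351": {"D614G", "N501Y", "E484K", "K417N"},
    "P.1": {"D614G", "N501Y", "E484K", "K417T"},
}


def shared_mutations(variant_mutations, names):
    """Return the mutations present in every named variant (set intersection)."""
    sets = [variant_mutations[name] for name in names]
    return set.intersection(*sets)


print(sorted(shared_mutations(VARIANT_SPIKE_MUTATIONS, ["B.1.1.7", "B.1.351", "P.1"])))
# ['D614G', 'N501Y']
print(sorted(shared_mutations(VARIANT_SPIKE_MUTATIONS, ["B.1.351", "P.1"])))
# ['D614G', 'E484K', 'N501Y']
```

Adding a set of mutations observed in Indian samples to the same comparison would show which of these shared changes are absent there.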

Neither of these two mutations was found in the Indian samples, and only 2 of the 3,361 Indian samples carried the S477N mutation. These mutations increase binding affinity for the human angiotensin-converting enzyme 2 (ACE2) receptor, but it is unknown whether their absence could explain the lower transmission in India compared with the United Kingdom, Brazil, and South Africa.

Clade analysis showed that 20E, 20B, and 20A were the dominant clades. All of the resulting variant and clade information was incorporated into the IPD 2.0 database. When tested on a simulated sequence data set generated from the genomes of different clades, IPD 2.0 assigned clades with high precision.
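The article does not describe how IPD 2.0 assigns clades. One common approach is to match each sample against clade-defining marker mutations and then score the calls against the known labels of simulated sequences. The sketch below takes that approach under clearly stated assumptions: the marker sets are simplified illustrations, not the full Nextstrain clade definitions or IPD's rules, and all names are hypothetical.

```python
from collections import Counter

# Simplified, illustrative marker sets (NOT full Nextstrain definitions and
# NOT IPD 2.0's rules). Child clades repeat their parent's markers.
CLADE_MARKERS = {
    "20A": {"S:D614G", "ORF1b:P314L"},
    "20B": {"S:D614G", "ORF1b:P314L", "N:R203K", "N:G204R"},
    "20E": {"S:D614G", "ORF1b:P314L", "S:A222V", "N:A220V"},
}


def assign_clade(mutations, markers=CLADE_MARKERS):
    """Return the most specific clade whose markers are all present, else None."""
    matches = [clade for clade, required in markers.items() if required <= set(mutations)]
    if not matches:
        return None
    return max(matches, key=lambda clade: len(markers[clade]))


def per_clade_precision(true_labels, predicted_labels):
    """Precision per predicted clade: correct calls / all calls of that clade."""
    called, correct = Counter(), Counter()
    for truth, call in zip(true_labels, predicted_labels):
        if call is None:
            continue  # unassigned samples do not count as calls
        called[call] += 1
        if call == truth:
            correct[call] += 1
    return {clade: correct[clade] / called[clade] for clade in called}


sample = {"S:D614G", "ORF1b:P314L", "S:A222V", "N:A220V"}
print(assign_clade(sample))  # 20E
print(per_clade_precision(["20A", "20B", "20B"], ["20A", "20B", "20A"]))
# {'20A': 0.5, '20B': 1.0}
```

Requiring every marker to be present keeps the assignment conservative: a sample missing any defining mutation falls back to a parent clade or stays unassigned.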

The updated variant database and clade assessment module enable quantification and phylogenetic assessment of SARS-CoV-2 genomes. The authors write: “This makes IPD 2.0 a relevant tool for the analysis of diverse SARS-CoV-2 sequence data sets and facilitates genomic surveillance to identify variants involved in breakthrough infections.”

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
