BENGALURU: An analysis of the world's largest repository of genome sequences shows that India has a higher percentage of SARS-CoV2 genomes considered “high quality” than the world average. India also has a higher percentage of sequences with patient data.
Data from GISAID, a global scientific initiative that provides open access to genomic data for influenza viruses and the novel coronavirus, shows that more than 2.9 lakh SARS-CoV2 genomes have been sequenced worldwide, including 4,238 from India.
Of the total, 99.7% are of human origin (100% in India), and 74% of the human-origin sequences are rated ‘high quality’. For the Indian data, the figure is 80%. Additionally, 32% of sequences from India carry crucial patient data, compared to just 6% globally.
In fact, while only 1.5% of the total genomes in the global database are from India, almost 8% of the sequences with patient data are from India.
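The two shares quoted above can be roughly cross-checked from the figures in this report. The sketch below is only an approximation: it treats "more than 2.9 lakh" as exactly 290,000 genomes and applies the rounded 32% and 6% patient-data rates, so the variable names and the derived counts are illustrative, not GISAID's own numbers.

```python
# Rough cross-check using only the rounded figures quoted in this report.
total_global = 290_000   # assumption: "more than 2.9 lakh" taken as exactly 2.9 lakh
total_india = 4_238      # Indian genomes in the database

# India's share of all genomes in the database
india_share_of_genomes = total_india / total_global

# Estimated number of sequences that carry patient data
india_with_patient_data = 0.32 * total_india     # 32% of Indian sequences
global_with_patient_data = 0.06 * total_global   # 6% of all sequences

# India's share of all sequences that carry patient data
india_share_of_patient_data = india_with_patient_data / global_with_patient_data

print(f"Share of genomes from India: {india_share_of_genomes:.1%}")
print(f"Share of patient-data sequences from India: {india_share_of_patient_data:.1%}")
```

With these rounded inputs the two shares come out at about 1.5% and 7.8%, in line with the "1.5%" and "almost 8%" cited above.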
The analysis was conducted by Professor SS Vasan, who leads the Covid-19 research at CSIRO, Australia. He told TOI: “It is commendable that India is loading a higher percentage of high-quality coronavirus genomes. This is a great starting point from where India can lead by example.”
The data also shows that, globally, about one in 270 cases is sequenced, compared to about one in 2,400 cases in India. “It shouldn’t be a problem if India sequences fewer genomes if the quality is still high, there is enough annotation and there is no bias in deciding which samples to choose for sequencing,” Vasan said.
However, he said it would be prudent to sequence all imported cases.
Dr Giridhara Babu, a member of the ICMR research and surveillance task force, said that while big data is key to understanding the virus better, the findings gain importance only if a mutation actually changes the character of the virus.
“There are over 4,000 mutations around the world, but they become significant only if the mutation changes an amino acid in a way that causes the virus to behave differently, as we have found in the UK and South Africa,” Babu said.
Dr V Ravi, a member of DBT’s expert committee on the Covid vaccine, while stating that the desire for genetic sequences is endless, said that India has been doing well given the limitations.
“Sequencing a genome can cost Rs 10,000-12,000 if done in bulk and Rs 25,000 individually. In addition, it is a job of great skill. I think we have done well,” said Dr Ravi.
Each genome sequence has about 30,000 characters, or letters, and two general parameters decide whether a sequence counts as high quality. “One, at least 29,000 of the 30,000 letters must be sequenced, and two, less than 1% of the sequences are ambiguous,” Vasan said.
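The two parameters Vasan describes can be sketched as a simple check. This is only an illustration under stated assumptions, not GISAID's actual pipeline: the function name `is_high_quality` is hypothetical, the second criterion is read as "fewer than 1% of the letters are ambiguous", and any letter other than A, C, G or T is treated as ambiguous.

```python
MIN_SEQUENCED_LETTERS = 29_000   # "at least 29,000 of the 30,000 letters"
MAX_AMBIGUOUS_FRACTION = 0.01    # "less than 1%" ambiguous (assumed to mean letters)
UNAMBIGUOUS_LETTERS = set("ACGT")


def is_high_quality(sequence: str) -> bool:
    """Return True if a genome string meets both parameters described above.

    Hypothetical helper for illustration only.
    """
    seq = sequence.strip().upper()

    # Parameter one: enough of the genome must have been sequenced.
    if len(seq) < MIN_SEQUENCED_LETTERS:
        return False

    # Parameter two: the share of ambiguous letters (e.g. N) must stay below 1%.
    ambiguous = sum(1 for letter in seq if letter not in UNAMBIGUOUS_LETTERS)
    return ambiguous / len(seq) < MAX_AMBIGUOUS_FRACTION


# Synthetic examples (not real genomes)
complete_clean = "ACGT" * 7_400                # 29,600 letters, none ambiguous
too_short = "ACGT" * 7_000                     # 28,000 letters
too_ambiguous = "A" * 29_500 + "N" * 400       # about 1.3% ambiguous

print(is_high_quality(complete_clean))   # True
print(is_high_quality(too_short))        # False
print(is_high_quality(too_ambiguous))    # False
```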
He further said that sufficient annotation (anonymized patient metadata, such as information on gender, age, comorbidities, etc.) and unbiased data are also key to drawing good conclusions using such data.
Indian sequences from Andhra Pradesh, Assam, Bihar, Delhi, Gujarat, Haryana, Karnataka, Ladakh, Madhya Pradesh, Maharashtra, Odisha, Punjab, Rajasthan, Tamil Nadu, Telangana, UP, Uttarakhand, West Bengal, and J&K have been uploaded.