Computer model predicted dominant SARS-CoV-2 variants


On May 24, 2022, scientists at the Broad Institute of MIT and Harvard and the University of Massachusetts Medical School announced they had developed a machine-learning model that can analyze millions of SARS-CoV-2 genomes and predict which viral variants will likely dominate and cause surges in COVID-19 cases. The model, called PyR0 (pronounced “pie-are-nought”), could help researchers identify which parts of the viral genome will be less likely to mutate and hence be good targets for vaccines that will work against future variants. The findings appear today in Science.

The researchers trained the machine-learning model using 6 million SARS-CoV-2 genomes that were in the GISAID database in January 2022. They showed how their tool can also estimate the effect of genetic mutations on the virus’s fitness — its ability to multiply and spread through a population. When the team tested their model on viral genomic data from January 2022, it predicted the rise of the BA.2 variant, which became dominant in many countries in March 2022. PyR0 would have also identified the alpha variant (B.1.1.7) by late November 2020, a month before the World Health Organization listed it as a variant of concern.

PyR0 is based on a machine-learning framework called Pyro, which was originally developed by a team at Uber AI Labs. In 2020, three members of that team, including Fritz Obermeyer, the study’s first author, and Martin Jankowiak, the study’s second author, joined the Broad Institute and began applying the framework to biology.

Researchers around the world have been working to predict the fitness of different SARS-CoV-2 viral variants since early in the pandemic. But previous models could not compare all variants simultaneously, or took days to process only a few thousand genomes. 

By contrast, PyR0 can analyze millions of genomes — all of the publicly available SARS-CoV-2 data — in about an hour. It does this by grouping similar sequences together, and then defining “clusters” of genomes by the constellation of mutations they share. By focusing on mutations, which can appear in multiple variants, PyR0 has more statistical power than models that focus on viral variants. 
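The grouping step can be pictured with a small sketch. This is not the PyR0 code: it assumes, for simplicity, that genomes are grouped only when they carry exactly the same set of mutations, and the genome IDs, function names, and input format are hypothetical.

```python
from collections import defaultdict

def cluster_by_mutations(genomes):
    """Group genome IDs by the exact set of mutations they carry.

    Simplifying assumption: two genomes share a cluster only if their
    mutation sets are identical. A real pipeline would also merge
    near-identical sequences.
    """
    clusters = defaultdict(list)
    for genome_id, mutations in genomes.items():
        clusters[frozenset(mutations)].append(genome_id)
    return dict(clusters)

def mutation_matrix(clusters):
    """Return the sorted mutation names and one 0/1 row per cluster."""
    all_mutations = sorted(set().union(*clusters))
    rows = {
        key: [1 if m in key else 0 for m in all_mutations]
        for key in clusters
    }
    return all_mutations, rows

# Hypothetical toy input: genome ID -> mutations relative to a reference.
genomes = {
    "g1": {"S:N501Y", "S:D614G"},
    "g2": {"S:D614G", "S:N501Y"},
    "g3": {"S:D614G"},
    "g4": {"S:D614G", "S:E484K"},
}

clusters = cluster_by_mutations(genomes)
mutations, rows = mutation_matrix(clusters)
for key, members in clusters.items():
    print(sorted(key), members, rows[key])
```

Because S:D614G appears in every cluster in this toy input, any estimate of its effect would draw on all four genomes rather than on one variant alone, which is the intuition behind the gain in statistical power described above.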

Next, the model determines which mutations are becoming more common and estimates how quickly each mutation can cause the virus to spread. It also estimates how rapidly the number of cases of different variants will increase based on their genetic makeup. 
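A toy calculation may help show how per-mutation effects could translate into a forecast. The sketch below assumes a simple multinomial-logistic growth form in which a cluster's growth rate is the sum of the effects of the mutations it carries. That form, the function names, and all numbers are assumptions made for illustration. The published model is fitted to data and is considerably more elaborate.

```python
import math

def cluster_growth_rates(features, mutation_effects):
    """Growth rate of each cluster = sum of the effects of its mutations."""
    return [
        sum(x * effect for x, effect in zip(row, mutation_effects))
        for row in features
    ]

def forecast_shares(initial_shares, growth_rates, day):
    """Project each cluster's share of cases at a given day.

    Assumes the log of each share grows linearly in time, then
    renormalizes so the shares sum to one (a softmax).
    """
    logits = [math.log(p) + r * day for p, r in zip(initial_shares, growth_rates)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs: two clusters, two mutations.
# Cluster 0 carries mutation A only; cluster 1 carries A and B.
features = [[1, 0], [1, 1]]
mutation_effects = [0.0, 0.1]   # made-up per-day log-growth contributions
initial_shares = [0.99, 0.01]   # cluster 1 starts rare

rates = cluster_growth_rates(features, mutation_effects)
for day in (0, 30, 60):
    print(day, [round(s, 3) for s in forecast_shares(initial_shares, rates, day)])
```

In this toy setup, the cluster that carries the extra mutation starts at 1 percent of cases and overtakes the other within two months, which mirrors how a model of this kind could flag a rare but fast-growing lineage early.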

By identifying which mutations are important for the fitness of particular variants, the model also offers biological insight into how COVID-19 spreads and develops. For example, knowing the critical mutations can help scientists predict whether new variants will be more contagious or evade neutralizing antibodies, and can also help them decide which mutations to study in greater detail.

Source: Broad Institute