‘The game has changed.’ AI’s victory in resolving protein structures | Science


Protein structures predicted by artificial intelligence (blue) and determined experimentally (green) match almost perfectly.

DeepMind

By Robert F. Service

Artificial intelligence (AI) has solved one of biology's grand challenges: predicting how proteins curl up from a linear chain of amino acids into the 3D shapes that allow them to carry out life's tasks. Today, leading structural biologists and organizers of a biennial protein-folding competition announced the achievement by researchers at DeepMind, a U.K.-based AI company. The DeepMind method, they say, will have far-reaching effects, among them dramatically speeding the creation of new drugs.

“What the DeepMind team has managed to achieve is amazing and will change the future of structural biology and protein research,” says Janet Thornton, director emeritus of the European Bioinformatics Institute. “This is a 50-year-old problem,” adds John Moult, a structural biologist at the University of Maryland, Shady Grove, and co-founder of the competition, Critical Assessment of Protein Structure Prediction (CASP). “I never thought I’d see this in my lifetime.”

The human body uses tens of thousands of different proteins, each a chain of dozens to many hundreds of amino acids. The order of those amino acids dictates how the myriad pushes and pulls between them give rise to proteins' complex 3D shapes, which, in turn, determine how they function. Knowing those shapes helps researchers devise drugs that can lodge in proteins' pockets and crevices. And being able to synthesize proteins with a desired structure could speed the development of enzymes that make biofuels and degrade waste plastic.

For decades, researchers have deciphered the 3D structures of proteins using experimental techniques such as X-ray crystallography or cryo-electron microscopy (cryo-EM). But such methods can take months or years and do not always work. Structures have been solved for only about 170,000 of the more than 200 million proteins discovered across life forms.

In the 1960s, researchers realized that if they could work out all the individual interactions within a protein's sequence, they could predict its 3D shape. With hundreds of amino acids per protein and numerous ways that each pair of amino acids can interact, however, the number of possible structures per sequence was astronomical. Computational scientists took up the problem, but progress was slow.
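To get a feel for how quickly the possibilities pile up, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every amino acid in the chain can independently adopt one of three local shapes; the function name and the figure of three are assumptions made for this example, not numbers from the researchers quoted here.

```python
def count_conformations(num_residues: int, states_per_residue: int = 3) -> int:
    """Count chain conformations if each residue independently picks one local shape.

    Both the independence assumption and the default of three states per
    residue are illustrative simplifications.
    """
    if num_residues < 0 or states_per_residue < 1:
        raise ValueError("need num_residues >= 0 and states_per_residue >= 1")
    return states_per_residue ** num_residues


if __name__ == "__main__":
    for length in (10, 50, 100):
        total = count_conformations(length)
        # Report the order of magnitude rather than the full integer.
        print(f"{length} residues: roughly 10^{len(str(total)) - 1} conformations")
```

Even under these generous simplifications, a modest 100-residue chain has on the order of 10^47 conceivable shapes, far too many to test one by one.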

In 1994, Moult and colleagues launched CASP, which takes place every 2 years. Entrants receive amino acid sequences for about 100 proteins whose structures are not known. Some groups compute a structure for each sequence, while other groups determine it experimentally. The organizers then compare the computational predictions with the lab results and give the predictions a Global Distance Test (GDT) score. Scores above 90 on the zero to 100 scale are considered on par with experimental methods, Moult says.
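For readers curious how such a score comes together, the following is a simplified sketch of the widely used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of residues whose predicted position lies within that cutoff of the experimental one. It assumes the two structures have already been superimposed and that the per-residue distances are given as input; the official CASP evaluation searches for the best superposition at each cutoff, which this sketch does not attempt. The function name and the sample distances are hypothetical.

```python
from typing import Sequence

# Distance cutoffs, in angstroms, used by the GDT_TS variant of the score.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = CUTOFFS_ANGSTROM) -> float:
    """Return a simplified GDT_TS score on a 0-100 scale.

    `distances` holds, for each residue, the distance in angstroms between
    its predicted and experimental positions after superposition (assumed
    to have been done beforehand).
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues falling within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Hypothetical four-residue example: one residue is badly misplaced.
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

Under this definition, a prediction scores 100 only if every residue lands within 1 angstrom of its experimental position, which helps explain why scores above 90 are treated as comparable to lab results.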

Even in 1994, predicted structures for small, simple proteins could match experimental results. But for larger, challenging proteins, the computations' GDT scores were about 20, “a complete disaster,” says Andrei Lupas, a CASP judge and evolutionary biologist at the Max Planck Institute for Developmental Biology. By 2016, competing groups had reached scores of about 40 for the hardest proteins, largely by drawing insights from known structures of proteins closely related to the CASP targets.

Fold

In the biennial Critical Assessment of Protein Structure Prediction (CASP), groups predict the 3D structures of proteins. This year, AlphaFold outperformed all other groups and matched experimental findings on accuracy measures.

Graph: Global Distance Test score (0 to 100) plotted against the difficulty of predicting each protein's structure (easy to difficult) for CASP 1 (1994), CASP 5 (2002), CASP 12 (2016), CASP 13 (2018), and CASP 14 (2020), with AlphaFold (2020) shown against the other competitors.

C. Bickle/Science

When DeepMind first participated in 2018, its algorithm, called AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which software is trained on vast data troves (in this case, the sequences and structures of known proteins) and learns to spot patterns. DeepMind won handily, beating the competition by an average of 15% on each structure and reaching GDT scores of about 60 for the toughest targets.

But the predictions were still too rough to be useful, says John Jumper, who heads AlphaFold's development at DeepMind. “We knew how far we were from biological relevance.” To do better, Jumper and his colleagues combined deep learning with an “attention algorithm” that mimics the way a person might assemble a jigsaw puzzle: first connecting pieces into small clumps (in this case, clusters of amino acids) and then searching for ways to join the clumps into a larger whole. Working on a modest computer network of 128 processors, they trained the algorithm on all 170,000 or so known protein structures.

And it worked. Across the target proteins in this year's CASP, AlphaFold achieved a median GDT score of 92.4. For the most challenging proteins, AlphaFold scored a median of 87, 25 points above the next best predictions. It even excelled at solving the structures of proteins that sit in cell membranes, which are central to many human diseases but notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, called the result a “wonderful advance on the problem of protein folding.”

All groups in this year’s competition improved, Moult says. But with AlphaFold, Lupas says, “the game has changed.” The organizers even worried that DeepMind might have cheated in some way. So Lupas set a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his research team had tried every trick in the book to obtain an X-ray crystal structure of the protein. “We couldn’t solve it.”

But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their X-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They could not possibly have cheated on this. I don’t know how they do it.”

As a condition of entering CASP, DeepMind, like all groups, agreed to reveal sufficient details about its method for other groups to re-create it. That will be a boon for experimentalists, who will be able to use accurate structure predictions to make sense of opaque X-ray and cryo-EM data. It could also enable drug designers to quickly work out the structure of every protein in new and dangerous pathogens such as SARS-CoV-2, Moult says.

Still, AlphaFold doesn’t do everything well yet. In the contest, it faltered noticeably on one protein, an amalgam of 52 small repeating segments that distort one another’s positions as they assemble. Jumper says the team now wants to train AlphaFold to solve such structures, as well as those of protein complexes that work together to carry out key functions in the cell.

Although one major challenge has fallen, others will no doubt emerge. “This is not the end of anything,” Thornton says. “It’s the beginning of many new things.”