Concepedia

Publication | Open Access

Evolutionary-scale prediction of atomic-level protein structure with a language model

Citations: 4.2K

References: 59

Year: 2023

TLDR

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. The study demonstrates direct inference of full atomic‑level protein structure from primary sequence using a large language model. The authors train a 15‑billion‑parameter protein language model that learns atomic‑resolution structure directly from sequence, enabling rapid prediction of millions of metagenomic proteins. The approach achieves an order‑of‑magnitude speedup in high‑resolution structure prediction and produces the ESM Metagenomic Atlas, comprising over 617 million predicted structures, of which more than 225 million are high‑confidence, revealing the breadth of natural protein diversity.

Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.
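
As a rough illustration of the scale quoted in the abstract, the share of Atlas predictions flagged as high confidence can be estimated from the two reported figures. Both figures are stated as lower bounds (>617 million and >225 million), so the result is only approximate. The function name and the use of the bounds as exact counts are assumptions made for this sketch and do not come from the paper.

```python
def high_confidence_share(total_predicted: float, high_confidence: float) -> float:
    """Return the fraction of predicted structures that are high confidence."""
    if total_predicted <= 0:
        raise ValueError("total_predicted must be positive")
    if not 0 <= high_confidence <= total_predicted:
        raise ValueError("high_confidence must lie between 0 and total_predicted")
    return high_confidence / total_predicted


# Assumption: the reported lower bounds are treated as exact counts.
TOTAL_PREDICTED = 617e6   # ">617 million" metagenomic sequences
HIGH_CONFIDENCE = 225e6   # ">225 million" high-confidence predictions

share = high_confidence_share(TOTAL_PREDICTED, HIGH_CONFIDENCE)
print(f"Approximate high-confidence share: {share:.1%}")
```

Under these assumptions, a little over one third of the predicted structures fall in the high-confidence group.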
