DeepMind today announced a collaboration with the European Molecular Biology Laboratory (EMBL), Europe’s leading life sciences laboratory, to give the scientific community free and open access to a database of the most complete and accurate predictions to date of the structures of the human proteome (the complete set of proteins encoded by the human genome).
The database will include around 20,000 proteins expressed by the human genome. Together, the database and the artificial intelligence system behind it provide structural biologists with powerful new tools for examining the three-dimensional structure of proteins. They also offer a trove of data that could pave the way for future breakthroughs and herald a new era of AI-enabled biology.
Today’s announcement covers the most complete picture to date of the proteins that make up the human proteome, along with the proteomes of 20 additional organisms that are important for biological research.
In December 2020, the organizers of the Critical Assessment of Protein Structure Prediction (CASP) benchmark recognized AlphaFold as a solution to the 50-year-old grand challenge of protein structure prediction, a staggering achievement in the field.
The AlphaFold Protein Structure Database builds on this innovation and on the discoveries of generations of scientists: from the pioneers of protein crystallography and structural analysis to the thousands of prediction specialists, biologists and structural biologists who have since spent years experimenting with proteins and openly sharing their results.
The database dramatically expands our accumulated knowledge of protein structures, more than doubling the number of high-accuracy human protein structures available to researchers. A better understanding of these basic building blocks of life, which underpin biological processes in all living things, will allow researchers in a wide variety of fields to accelerate their work.
Last week, the journal Nature published the methodology behind the latest, highly innovative version of AlphaFold, the sophisticated artificial intelligence system announced last December that powers these structure predictions, together with its open-source code. Today’s announcement coincides with a second Nature article that presents the most complete picture to date of the proteins that make up the human proteome, along with the proteomes of 20 additional organisms that are important for biological research.
“Our goal at DeepMind has always been to build artificial intelligence and use it as a tool to help accelerate the pace of scientific discovery, and thus improve our understanding of the world around us,” explains DeepMind founder and CEO Demis Hassabis.
“We have used AlphaFold to generate the most complete and accurate picture of the human proteome. We believe this is the most significant contribution artificial intelligence has made to advancing scientific knowledge to date, and it is a great example of the kinds of benefits that artificial intelligence can bring to society,” he continues.
Helping scientists to accelerate their discoveries
The ability to predict a protein’s shape computationally from its amino acid sequence, instead of determining it experimentally through painstaking, laborious and often expensive techniques, is already helping scientists achieve in months what previously took years of work.
“The AlphaFold database is a perfect example of the virtuous circle of open science,” explains EMBL Director General Edith Heard. “AlphaFold was trained on data from public resources built by the scientific community, so it makes sense for its predictions to be public too. Sharing AlphaFold predictions openly and for free will enable researchers around the world to gain new insights and drive new discoveries. I believe that AlphaFold is a true revolution for the life sciences, just as genomics was several decades ago, and I am very proud that EMBL has been able to help DeepMind enable open access to this extraordinary resource,” she adds.
AlphaFold is already being used by partners such as the Drugs for Neglected Diseases initiative (DNDi), which has advanced its research into life-saving cures for diseases that disproportionately affect the poorest parts of the world, and the Centre for Enzyme Innovation (CEI), which uses AlphaFold to help engineer faster enzymes for recycling some of the most polluting single-use plastics.
AlphaFold has also helped accelerate the work of scientists who determine protein structures experimentally. For example, a team at the University of Colorado Boulder is using AlphaFold’s predictions to study antibiotic resistance, while a group at the University of California, San Francisco has used them to study the biology of SARS-CoV-2.
The AlphaFold Protein Structure Database
The AlphaFold Protein Structure Database draws on many contributions from the international scientific community, on AlphaFold’s sophisticated algorithmic innovations, and on the decades of experience of EMBL’s European Bioinformatics Institute (EMBL-EBI) in sharing the world’s biological data. DeepMind and EMBL-EBI are providing free access to AlphaFold’s predictions so that anyone can use them to enable and accelerate research and to open new avenues of scientific discovery.
“This will be one of the most important datasets since the mapping of the human genome,” emphasizes Ewan Birney, Deputy Director General of EMBL and Director of EMBL-EBI. “Making AlphaFold’s predictions accessible to the international scientific community opens up many new avenues of research, from neglected diseases to new enzymes for biotechnology and much more. This is a great new scientific tool, complementing existing technologies and allowing us to push the limits of our understanding of the world.”
The more than 350,000 structures in the initial release include, in addition to the human proteome, the proteomes of 20 biologically significant organisms such as E. coli, the fruit fly, the mouse, the zebrafish, the malaria parasite and the tuberculosis bacterium. These organisms have been the subject of a great deal of important research, and having their structures available will allow researchers across very different fields, from neuroscience to medicine, to accelerate their work.
The database and system will be updated periodically as investment in future AlphaFold improvements continues. Over the coming months, coverage is planned to expand greatly to almost every sequenced protein known to science, more than 100 million structures covering most of the UniProt reference database.
Kathryn Tunyasuvunakool et al. “Highly accurate protein structure prediction for the human proteome.” Nature (2021).
Rights: Creative Commons.