Thanks to artificial intelligence, the most complete and accurate database to date of predicted structures for the human proteome is being made freely and openly available to the scientific community. The initiative is a collaboration between the company DeepMind and the European Molecular Biology Laboratory (EMBL), Europe's leading laboratory for the life sciences.
The publicly available catalog will include around 20,000 proteins expressed by the human genome. The database, and the artificial intelligence system that made it possible, give structural biologists powerful new tools for examining the three-dimensional structure of proteins. They also offer a trove of data that could pave the way for future breakthroughs and herald a new era for biology based on artificial intelligence.
In December 2020, the organizers of the Critical Assessment of Protein Structure Prediction (CASP) benchmark recognized AlphaFold as a solution to the grand challenge of predicting protein structure, a problem that had stood for more than 50 years. AlphaFold's results were an astonishing advance for the field.
The AlphaFold Protein Structure Database builds on this innovation and on the discoveries of generations of scientists, from the pioneers of crystallography and protein structure analysis to the thousands of prediction specialists and structural biologists who have since spent years experimenting with proteins and have openly shared their results. The database draws on and dramatically expands the accumulated knowledge about protein structures, more than doubling the number of human protein structures with highly accurate predictions available to researchers. Advancing the understanding of these basic components of life, which underpin biological processes in all living things, will allow researchers in a wide variety of fields to accelerate their work.
Protein structures representing data obtained via AlphaFold. (Image: AlphaFold / Karen Arnott / EMBL-EBI)
The methodology behind the latest version of AlphaFold, the artificial intelligence system announced last December that powers these structure predictions, was recently published in the academic journal Nature together with its open source code. The new announcement coincides with the publication in Nature of a second study that provides the most complete picture of the proteins that make up the human proteome, and with the release of predicted structures for the proteins of 20 additional organisms that are important for biological research.
“Our goal at DeepMind has always been to build artificial intelligence and use it as a tool to help accelerate the pace of scientific progress, thereby enhancing our understanding of the world around us,” says DeepMind Founder and CEO Demis Hassabis. “We have used AlphaFold to generate the most complete and accurate picture of the human proteome. We believe this is the most significant contribution artificial intelligence has made to the advancement of scientific knowledge to date, and is a great example of the kinds of benefits that artificial intelligence can contribute to society.”
AlphaFold is already helping scientists accelerate their discoveries. Being able to predict the shape of a protein computationally from its amino acid sequence, rather than determining it experimentally with painstaking, laborious, and often expensive techniques, lets researchers achieve in months what used to take years of work.
“The AlphaFold database is a perfect example of the virtuous circle of open science,” emphasizes EMBL Director General Edith Heard. “AlphaFold has been trained using public resource data created by the scientific community, so it makes sense for its predictions to be public. Sharing AlphaFold’s predictions openly and for free will allow researchers around the world to gain new insights and drive forward new discoveries. I believe that AlphaFold is a true revolution for the life sciences, as genomics was several decades ago, and I am very proud that the EMBL has been able to help DeepMind to enable open access to this extraordinary resource.”
AlphaFold is already being used by partners such as the Drugs for Neglected Diseases initiative (DNDi), which has advanced its research on life-saving cures for diseases that disproportionately affect the world’s poorest areas, and the Centre for Enzyme Innovation (CEI), which uses AlphaFold to help design faster enzymes for recycling some of the most polluting single-use plastics. AlphaFold has also helped accelerate the research of scientists working on the experimental determination of protein structures. For example, a team from the University of Colorado Boulder (United States) uses AlphaFold’s predictions to study antibiotic resistance, while a group from the University of California, San Francisco (United States) has used them to study the biology of SARS-CoV-2.
The AlphaFold Protein Structure Database builds on many contributions from the international scientific community, as well as on AlphaFold's refined algorithmic innovations and the decades of experience that the EMBL European Bioinformatics Institute (EMBL-EBI) has in sharing biological data worldwide. DeepMind and EMBL-EBI are giving free access to AlphaFold’s predictions so that anyone can use them to enable and accelerate research and to explore new avenues of scientific knowledge.
“This will be one of the most important data sets since the mapping of the Human Genome,” said EMBL Deputy Director General and EMBL-EBI Director Ewan Birney. “Making AlphaFold’s predictions accessible to the international scientific community opens up many new avenues of research, from neglected diseases to new enzymes for biotechnology and much more. This is a great new scientific tool, which complements existing technologies and will allow us to expand the limits of our understanding of the world.”
The more than 350,000 structures in the database's initial release cover, in addition to the human proteome, the proteins of 20 biologically significant organisms, such as the bacterium E. coli, the fruit fly, the mouse, the zebrafish, the malaria parasite and the tuberculosis bacterium. A great deal of important research has been carried out on these organisms, and having these structures available will allow many researchers from very different fields, from neuroscience to medicine, to accelerate their work.
The AlphaFold database and system will be updated periodically as investment continues in future AlphaFold enhancements. In the coming months, coverage is planned to expand greatly to nearly all sequenced proteins known to science: more than 100 million structures, covering most of UniProt, the reference protein database. (Source: EMBL)