(ORDO NEWS) — Google has collaborated with the European Bioinformatics Institute to develop a deep learning neural network that can predict the structure, function, and properties of proteins from their primary structure with high accuracy.
The new tool will greatly facilitate and accelerate the development of new drugs and the design of new enzymes for the industrial production of food, biofuels and chemicals.
Understanding how the amino acid sequence of a protein (its primary structure) relates to its function, and hence to the way it works, is a long-standing problem in molecular biology and an equally long-standing goal for specialists in industries ranging from pharmaceuticals to food and chemicals.
To produce chemicals of any kind, whether a drug, a food additive or an industrial reagent, it is natural to use the best catalysts available: enzymes, that is, proteins.
Moreover, each application needs its own protein with a specific function (transfer of an electron or of individual chemical groups, formation or breaking of chemical bonds, and so on). Researchers cannot yet design such enzymes from scratch, so they look to nature for solutions and often find them in microorganisms.
Despite six decades of progress, modern methods and algorithms cannot determine the functions of a third of the known microbial protein sequences, which limits how far these proteins can be put to practical use. At the same time, more than one hundred thousand new protein sequences are added to global databases every day.
However, for practical applications, these data are of little use if they are not accompanied by functional annotations (that is, a description of the functions of the protein and its biological role in the cell).
Growth of the TrEMBL database (one of the protein sequence databases) over time, and the corresponding decline in the proportion of proteins whose function has been accurately (manually) determined / © Google Research/ProteInfer
The function of a protein can be determined experimentally using a number of modern methods, including microarray analysis, RNA interference and two-hybrid analysis. But the pace at which the functions of newly discovered proteins are confirmed experimentally lags far behind the pace at which new sequences are discovered, and is unlikely ever to catch up.
The annotation of new protein sequences will therefore rely mainly on computational prediction, in which they are compared with the amino acid sequences of proteins whose functions have already been determined experimentally.
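The idea of annotating a new sequence by comparing it with sequences of known function can be illustrated with a deliberately simplified sketch. It is not the method used by the researchers: the sequences, the function labels, the `transfer_annotation` helper and the 0.6 similarity cutoff are all invented for illustration, and the standard-library `difflib.SequenceMatcher` stands in for the alignment tools a real pipeline would use.

```python
from difflib import SequenceMatcher


def transfer_annotation(query, annotated, min_similarity=0.6):
    """Return (label, similarity) of the most similar annotated sequence.

    `annotated` maps amino acid sequences to function labels. If no
    sequence reaches `min_similarity`, the label is None, i.e. the
    query stays unannotated. The cutoff is an arbitrary assumption.
    """
    best_label, best_score = None, 0.0
    for sequence, label in annotated.items():
        # ratio() is a crude string similarity in [0, 1]; it is only a
        # stand-in for a proper sequence alignment score.
        score = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score


# Hypothetical reference set: made-up sequences with made-up labels.
known = {
    "MKTAYIAKQR": "hypothetical kinase",
    "GSHMLEDPVA": "hypothetical hydrolase",
}

label, similarity = transfer_annotation("MKTAYIAKQK", known)
print(label, round(similarity, 2))
```

A query that resembles nothing in the reference set comes back with no label, which mirrors the situation described above for sequences whose function cannot be determined.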
When a new production process is needed (for a drug, for example), scientists will have the neural network's predictions to hand, but they will still have to pick out the most suitable candidate proteins manually and check their functions.
A new computational method for determining the functions and properties of proteins has been proposed by a team of specialists from Google Research (Cambridge, Massachusetts, USA) and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI).
They developed a deep learning neural network that predicts not only the function of a protein and its biological role in a cell, but also its structure and the functional effects of mutations (point changes in the amino acid sequence).
Performance of the ProteInfer neural network on all 7 major groups of enzymes, shown as precision-recall curves generated by varying the decision threshold at which a prediction is made / © Google Research/ProteInfer
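For readers unfamiliar with precision-recall curves, the sketch below shows how such a curve is traced by sweeping the decision threshold. The scores and labels are made-up toy values, not ProteInfer output, and the `precision_recall_points` helper is a hypothetical name introduced here.

```python
def precision_recall_points(scores, labels, thresholds):
    """Return one (threshold, precision, recall) tuple per threshold.

    A prediction counts as positive when its score is at or above the
    threshold. Precision is the share of positive predictions that are
    correct; recall is the share of true cases that were found.
    """
    points = []
    for threshold in thresholds:
        predicted = [score >= threshold for score in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
        # With no positive predictions, precision is set to 1.0 by convention.
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((threshold, precision, recall))
    return points


# Toy data: model confidence per protein and whether the predicted
# function is actually correct (values invented for illustration).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

for threshold, precision, recall in precision_recall_points(
    scores, labels, [0.1, 0.5, 0.9]
):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold makes the model more cautious: in this toy data, precision rises from 0.60 to 1.00 while recall falls from 1.00 to 0.33.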
Using their algorithm, trained on Pfam, a worldwide database of annotated protein domain families, the researchers added new annotations to the database describing the functions of proteins whose amino acid sequences had long been known.
As a result, the number of records in the database grew by almost 10%, including 360 new records about the functions of human proteins. According to the authors, this is the biggest Pfam update in the last 10 years.
The team's tool is intended to simplify and speed up so-called drug design: the targeted development of new drugs that takes into account the three-dimensional structure of the target molecules (often proteins) on which the drug will act.
In addition, knowing the structure of proteins and understanding how they work will make it easier to develop new biotechnological enzymes for the food, chemical and energy industries.