Deep machine learning uses the language of proteins to predict their properties

(ORDO NEWS) — Deep learning models have proven themselves well when working with texts and speech. However, they are also effective for solving problems in molecular biology and biomedicine, including the prediction of the functional properties of proteins based on their amino acid sequence.

Over the years, bioinformaticians, geneticists, neurophysiologists, and other specialists in the life sciences have continued to elucidate the biological functions of genes and their products, proteins.

- Advertisement · Scroll to continue -

To do this, they have to use large and sometimes complex data that simply cannot be handled without the help of machine learning and data analysis.

Recall that proteins are large biological molecules with a complex structure. They are long chains (polymers) consisting of many linked amino acid units (monomers).

Proteins can perform a wide variety of very specific functions, from forming the “cellular skeleton” to catalyzing chemical reactions, acting as “molecular machines” and regulating various biological processes.

- Advertisement · Scroll to continue -

This is possible due to their special three-dimensional structure, which, in turn, is determined precisely by the amino acid sequence of the protein.

At the same time, establishing a relationship between the amino acid sequence, the structure of a protein, and its functions is a difficult and far from solved problem.

Therefore, researchers from three different universities in Turkey published a paper in the journal Nature Machine Intelligence , in which they evaluated the possibility of using deep learning models, originally intended for linguistic analysis.

Deep learning is a type of machine learning based on neural networks. It is called deep because the structure of its networks consists of several input, output and hidden layers of neurons located between them. The authors of the new publication considered both the strengths of this approach and its disadvantages.

“The data obtained using molecular biology can be represented in the form of a language (in fact, the language of genes / proteins) in such a way that the sequence of a gene or protein turns out to be something like a sentence in a natural language that has a certain meaning,” said one of the authors, Tuncha Dogan (Tunca Dogan).

- Advertisement · Scroll to continue -

He believes that the meaning of such a “language of proteins” comes down to the special biological, physical and chemical properties of these biomolecules.

“In accordance with this, the goal of the work was to build machine learning models that use a vector representation in multidimensional space borrowed from language models (high dimensional numerical embeddings.) for proteins as input data and which accurately predict their functional properties”.

To successfully evaluate protein language models and their performance metrics, researchers first had to prepare large, robust datasets. Each of these sets has a certain “level of difficulty”.

With this method, the Turkish scientists were able to evaluate the suitability of different “language modeling” architectures (including BERT, T5, XLNet, and ELMO) for identifying hidden patterns in protein sequences.

The researchers believe that these seemingly invisible properties of the sequences provide valuable information about the functional characteristics of proteins.

“Perhaps the most remarkable result was that these deep learning models were able to successfully establish the functional properties of proteins based solely on the amino acid sequence, although this is quite a difficult task.

It is also in good agreement with the results of other recent structure prediction studies (eg Deepmind’s AlphaFold2 and Baker’s RoseTTAFold), which used the sequence as input,” added Dogan.

The new approach and techniques similar to it may have many practical applications, including the development of personalized therapies.

—

Online:

Our Standards, Terms of Use: Standard Terms And Conditions.

Deep machine learning uses the language of proteins to predict their properties

The Latest News

Netherlands will consider resuming support to Palestinian UNRWA agency

Taiwan rattled by quakes again, no immediate reports of damage

Biden drops menthol cigarette ban to court black voters – WSJ

‘Our Europe could die,’ Macron says. Who’s the killer?

Planned solar panel manufacturing plant to employ over 900 in eastern NC

Paramedic who injected Elijah McClain with ketamine before his death avoids prison

Recommended

Scientists discovered an asteroid that is capable of destroying the Earth

Scientists have issued a warning regarding the potential collapse of the Atlantic current

Scientists have explored whether vitamin B17 provides protection against cancer

Scientists discovered the true builders of the Egyptian pyramids

1 in 50 Brits claim UFO abductions with intergalactic encounters – poll

Ukrainian military captures footage of disc-shaped UFO in warzone

Top Editors

Information you can trust

GAMES

SPORTS

Trending Searches

Browse

The Voice

Your Page

Stay Informed

Follow Social

Connect

Privacy

About

Useful

Sponsored

Shop