Speaking the same language as machines: how computers “hear” music and what they use this skill for

(ORDO NEWS) - Experiments in generating music with artificial intelligence began in the 1950s.

Since then, neural networks have learned to “understand” and recognize songs, work out our tastes on streaming services, and even write music based on data about the motion of celestial bodies.

Yandex experts explain how artificial intelligence works with sound and which breakthrough products created by “cyber composers” we may be using in the future.

How does a computer “see” sound?

To recognize or even write a melody, a machine needs to be familiar with hundreds of musical examples. But how do you get sound into a neural network, and how can the network perceive it?

Recall your ninth-grade physics textbook: the source of a sound is always a vibrating body. It could be a drumhead, a taut string, or the cone of a loudspeaker.

The vibrations of these bodies travel through the air as waves, reach the eardrum and stimulate the endings of the auditory nerve, and we hear the sound.

For a machine, the process is different. To “hear” music, it must first convert the sound into a set of numbers that a computer can work with.

Conversion to digital code relies on two processes: sampling and quantization. The sound reaches the computer as a continuous wave that has been “translated” into an electrical signal.

To describe this wave in numbers, a special converter “cuts” it into very short segments: for CD-quality audio, 44,100 of them every second, each lasting about 0.02 milliseconds.

This is sampling, or discretization: a continuous wave is divided into separate parts, that is, made discrete. However, even within the smallest segment, the sound is still a continuous piece of the wave.

So the converter “rounds” the value measured at each sampling moment to the nearest of a fixed set of levels and represents it as a single point. This process is called quantization.
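To make the two steps concrete, here is a minimal Python sketch. It assumes a pure 440 Hz tone (the note A4) and CD-style settings of 44,100 samples per second and 16 bits per sample; in a real device an analog-to-digital converter does this in hardware, and the function name sample_and_quantize is purely illustrative.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits=16):
    """Sample a pure sine tone and round each sample to a signed integer code."""
    n_samples = int(sample_rate_hz * duration_s)
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                           # sampling: look at the wave only at discrete moments
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # continuous value between -1 and 1
        codes.append(round(amplitude * max_code))        # quantization: snap to the nearest integer level
    return codes

codes = sample_and_quantize(440, 44_100, 0.001)  # one millisecond of A4
print(len(codes))   # 44 samples
print(codes[:5])    # the first few integer codes
```

Even a single millisecond of sound becomes 44 numbers; a three-minute song becomes millions of them.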

As a result, the whole wave turns into a set of points, and all that remains is to assign each one a digital code. In other words, we get a large set of numbers describing every fragment of the song. These numbers can be written into one large table or, after calculating how strongly each frequency sounds in each short stretch of time, displayed visually as a spectrogram.

A spectrogram is a picture of sound in a coordinate system: frequency, that is, pitch, runs along the vertical axis, and time runs along the horizontal axis.

The warmer and brighter the color at a given point, the more intense the sound at that frequency and moment. A spectrogram is hard to describe in words; it is easier to simply look at one.
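A spectrogram is built by cutting the digitized signal into short, overlapping frames and measuring how much energy each frequency carries in every frame. The sketch below does this with a naive discrete Fourier transform using only Python's standard library; real tools use the much faster FFT and often a perceptual (mel) frequency scale. The function name, frame size and hop length are illustrative assumptions. Each inner list is one vertical slice of the picture, and larger magnitudes correspond to the warmer colors.

```python
import cmath
import math

def spectrogram(samples, frame_size, hop):
    """Return one list of frequency-bin magnitudes per frame (a short-time Fourier transform)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window fades the frame edges to reduce spectral leakage
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
                    for i, x in enumerate(frame)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to half the sample rate
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(windowed))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]  # 1000 Hz test tone
spec = spectrogram(tone, frame_size=256, hop=128)
bin_hz = sample_rate / 256  # width of one frequency bin
loudest = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), loudest * bin_hz)
```

With 8,000 samples per second and 256-sample frames, each bin is 31.25 Hz wide, so the brightest bin for the 1000 Hz tone is number 32.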


An even clearer explanation of this process is given in Yandex’s Digital Lesson “Digital Art: Music and IT”.

In this project, the company’s experts talk about digitizing music, the theory of sound, and recommendation systems in media services; after the theoretical part, participants can try continuing a composition on their own. The lessons and tasks are aimed at schoolchildren, but adults will find them interesting too.

How does AI recognize music and recommend songs?

The ability to represent music as digital code has found practical uses in everyday life. Spectrograms are what allow computers to analyze and recognize music, for example to identify a song that is playing with Shazam.

When the app “hears” a piece of music, it compares its spectrogram against many others in its database and looks for matches. It can recognize the track even through interference such as voices at the next table or traffic noise.
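Shazam's production system is not public, but its co-founder Avery Wang described a landmark-based approach in a 2003 paper. The toy sketch below follows that idea under simplifying assumptions: it starts from already-extracted spectrogram peaks, given as (time frame, frequency bin) pairs, pairs nearby peaks into hashable “landmarks”, and votes on the time offset at which a query lines up with a stored track. The track names and peak lists are made up for illustration.

```python
from collections import Counter, defaultdict

def landmarks(peaks, fan_out=3):
    """Pair each peak with the next few peaks; (f1, f2, dt) is a key that survives time shifts."""
    peaks = sorted(peaks)
    pairs = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            pairs.append(((f1, f2, t2 - t1), t1))  # key plus the anchor time where it occurs
    return pairs

def build_index(tracks):
    """Map every landmark key to the (track, time) positions where it appears."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for key, t in landmarks(peaks):
            index[key].append((name, t))
    return index

def identify(index, query_peaks):
    """Vote for (track, offset); a true match lines up many landmarks at the same offset."""
    votes = Counter()
    for key, t_query in landmarks(query_peaks):
        for name, t_track in index.get(key, []):
            votes[(name, t_track - t_query)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count

# Hypothetical peak lists: (time frame, frequency bin)
tracks = {
    "song_a": [(0, 10), (1, 25), (2, 14), (3, 30), (4, 12), (5, 22)],
    "song_b": [(0, 40), (1, 8), (2, 33), (3, 19), (4, 27), (5, 5)],
}
index = build_index(tracks)

# A clip of song_a starting at its frame 2, plus one noise peak at frequency bin 50
query = [(0, 14), (1, 30), (2, 12), (3, 22), (1, 50)]
print(identify(index, query))  # ('song_a', 2, 5)
```

The stray noise peak adds a few landmarks that match nothing, but it cannot outvote the five that line up with song_a at offset 2, which is why this kind of matching copes with a noisy room.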

AI also recommends songs on streaming services. Imagine that we have just listened to a favorite track. To suggest the next song, neural networks compare it with millions of others.

In particular, AI analyzes a large number of spectrograms, finds patterns in them, and determines with high accuracy whether two tracks are similar in genre, instrumentation, and even mood.
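One common way to express “how similar are two tracks” is to describe each track as a vector of numbers and compare the vectors with cosine similarity. In a real service a neural network learns these vectors from spectrograms; the four hand-picked features below (distorted guitar, swing rhythm, acoustic sound, energy) and the track names are hypothetical, and this is not a description of Yandex's actual model.

```python
import math

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way (very similar tracks); near 0 means little in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, k=2):
    """Return the names of the k catalog tracks closest to the query vector."""
    ranked = sorted(catalog, key=lambda name: cosine_similarity(query, catalog[name]), reverse=True)
    return ranked[:k]

# Hypothetical features: [distorted guitar, swing rhythm, acoustic sound, energy]
catalog = {
    "rock_1": [0.9, 0.1, 0.2, 0.9],
    "jazz_1": [0.1, 0.9, 0.7, 0.4],
    "rock_2": [0.8, 0.2, 0.1, 0.8],
}
just_played = [0.85, 0.15, 0.15, 0.85]
print(most_similar(just_played, catalog))  # the two rock tracks
```

Cosine similarity ignores overall scale, so a quiet and a loud version of the same style still come out as close neighbors.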

Something similar happens in the human brain: people can tell rock from jazz because they have heard a lot of different music and learned to recognize the features of each genre.

The difference is that AI does this in the language of numbers: it can analyze a track in far more detail than a person, across literally thousands of parameters, and remember not a hundred melodies but millions.

The same comparison mechanism comes into play when developers upload a batch of new songs to the service.

Before the official release, neural networks quickly analyze the tracks and determine their main characteristics, so that as soon as the tracks appear on the service they can be recommended to users who are likely to enjoy that kind of music.

In other words, by the time the songs are uploaded, the neural networks already know which tracks we listen to and what features the new songs have. By comparing the two, AI decides whether or not to recommend a new song to us.


Besides the tracks we usually listen to, AI also takes our behavior into account. For example, what we watch on Kinopoisk can help predict what we will want to listen to on Yandex Music. Love westerns? Here you go: Ennio Morricone’s score for The Good, the Bad and the Ugly.

Our attitude toward specific songs or artists can also be taken into account: how many times we liked tracks of a certain genre or by a certain artist, and which songs, by contrast, we rated negatively or skipped as soon as the neural network suggested them.

Our listening history on the service is compiled into statistics that help AI work out what to recommend to a particular person at a particular time.
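How a service turns likes, skips and listens into statistics is not described here, so the following is only a sketch under assumed rules: each action gets a hypothetical weight, and older events count for less through an exponential “half-life”, which is one simple way to favor what a person likes right now. The weights, the 30-day half-life and the artist names Band X and Band Y are all illustrative.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action is taken to signal taste
ACTION_WEIGHTS = {"like": 1.0, "listen": 0.2, "skip": -0.5, "dislike": -1.0}

def artist_affinity(events, half_life_days=30):
    """Sum weighted feedback per artist; an event loses half its weight every half_life_days."""
    scores = defaultdict(float)
    for artist, action, age_days in events:
        decay = 0.5 ** (age_days / half_life_days)
        scores[artist] += ACTION_WEIGHTS[action] * decay
    return dict(scores)

history = [  # (artist, action, days ago)
    ("Ennio Morricone", "like", 0),
    ("Ennio Morricone", "listen", 30),
    ("Band X", "skip", 0),
    ("Band X", "dislike", 60),
    ("Band Y", "listen", 0),
]
scores = artist_affinity(history)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['Ennio Morricone', 'Band Y', 'Band X']
```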

Another mechanism for generating recommendations is called collaborative filtering. The idea is that people with similar interests are recommended similar music: if two people have similar tastes and one of them likes a song, the system will suggest it to the other.
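A minimal user-based version of collaborative filtering fits in a few lines. This sketch assumes each listener is represented by the set of songs they liked and measures taste overlap with the Jaccard index (songs in common divided by all songs either person liked); production systems typically work with far larger data and learned models, and all names and the 0.3 threshold here are hypothetical.

```python
def jaccard(a, b):
    """Share of songs two listeners have in common: shared songs / all songs either liked."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes, min_similarity=0.3):
    """Score songs liked by listeners whose taste overlaps with the target's."""
    mine = likes[target]
    scores = {}
    for other, theirs in likes.items():
        if other == target:
            continue
        similarity = jaccard(mine, theirs)
        if similarity < min_similarity:
            continue  # tastes too different to be useful
        for song in theirs - mine:  # only songs the target has not liked yet
            scores[song] = scores.get(song, 0.0) + similarity
    # Highest score first; ties broken alphabetically so the output is stable
    return sorted(scores, key=lambda song: (-scores[song], song))

likes = {
    "anna": {"s1", "s2", "s3"},
    "boris": {"s1", "s2", "s3", "s4"},
    "gleb": {"s2", "s3", "s5"},
    "vera": {"s7", "s8"},
}
print(recommend("anna", likes))  # ['s4', 's5']
```

Vera shares no songs with anyone, so she gets no suggestions, and a song that no similar listener has liked never receives a score at all, which is exactly why little-known tracks are hard to recommend this way.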

“One of the hardest tasks right now is figuring out how to recommend little-known tracks,” says Daniil Burlakov, head of the recommendation products group at Yandex Media Services. “If we have a hugely popular song, listening statistics give us a good idea of which users will be interested in it.

“But when we are dealing with a track that only 100 people have listened to, it is much harder to guess who else might like it. It comes down to data volume: unlike humans, machines need a lot of information to learn.

“When data is scarce, there is only one way left: teach neural networks to make more efficient use of the data that is available. Programmers are now putting a lot of effort into this.”

Another open question is whether recommendations can be improved not only through analytics “inside” the services but also with data from the outside world. For example, could weather data for a city be used to offer users music of a particular genre and mood?

Or does the choice of music depend on the device the user is listening on at the moment? If they are using a speaker rather than headphones, does that mean they are with other people?

And if they are, should the neural networks recommend more understated, smoother and more universally appealing tracks?


What about composing original music?

The ability to “see” music as digital code and spectrograms helps artificial intelligence not only recommend similar tracks but also compose its own.

“The process of ‘creativity’ begins with learning: a neural network is shown a large number of pieces written by people, finds patterns in them, and then uses this data to create something similar,” says Anatoly Starostin, head of the technology development service at Yandex Media Services.

“For example, in 2019 a Yandex neural network analyzed 4 GB of classical music, from Bach to Schnittke, and wrote a piece that was later performed by the New Russia Orchestra.”
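The architecture of the 2019 network is not described in the article. As a far simpler stand-in that still shows the “learn patterns, then generate something similar” loop, here is a first-order Markov chain: it records which note follows which in a few hypothetical training melodies, then walks those transitions to produce a new melody. A neural network captures much longer-range structure, but the principle of generating from learned statistics is similar.

```python
import random
from collections import defaultdict

def learn_transitions(melodies):
    """Record which note follows which across the training melodies."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            transitions[current].append(nxt)  # repeats make common moves more likely
    return transitions

def generate(transitions, start, length, seed=0):
    """Walk the learned transitions to produce a new melody in a similar style."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:  # dead end: no note ever followed this one in training
            break
        melody.append(rng.choice(options))
    return melody

# Hypothetical training data: two short melodies written as note names
training = [
    ["C", "D", "E", "C", "E", "G"],
    ["E", "G", "A", "G", "E", "D", "C"],
]
transitions = learn_transitions(training)
print(generate(transitions, "C", 8))
```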

Last year, the team set itself an even more ambitious goal: to turn astronomical data about celestial objects into music. The result was the album “Music of the Stars”.

First, astrophysicists gave the developers data about celestial bodies: their brightness, the periodicity of certain phenomena, and their motion parameters. The data was compiled into tables and then converted into musical notation.

“Any tabular data, such as how a parameter changes over time, can be translated into a musical sequence, because a musical score is essentially a table too. For example, the astrophysicists gave us data on changes in solar activity and the number of sunspots since 1960.

“We assigned notes to these values and got a melody,” explains music producer Timur Khaziev. “After that, all we had to do was refine the compositions artistically to convey the character of each object.

“A black hole is something tragic and mystical. The Sun is something warm. That is how this album about space, timed to coincide with Cosmonautics Day, came together.”
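The team does not say exactly how values were turned into notes, so the sketch below assumes the simplest possible mapping: scale each value linearly between the minimum and maximum of the series and pick the matching step of a C major pentatonic scale, written as MIDI note numbers (60 is middle C). The input numbers are invented for illustration and are not real sunspot counts.

```python
# C major pentatonic scale across two octaves, as MIDI note numbers (60 = middle C)
SCALE = [60, 62, 64, 67, 69, 72, 74, 76, 79, 81]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def values_to_notes(values, scale=SCALE):
    """Linearly map each value onto a scale step: low values become low notes."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    return [scale[round((v - lo) / span * (len(scale) - 1))] for v in values]

def midi_to_name(note):
    """Convert a MIDI number to a name such as 'C4' (octave numbering with 60 = C4)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

values = [10, 40, 90, 150, 120, 60, 20]  # hypothetical yearly counts, not real data
notes = values_to_notes(values)
print([midi_to_name(n) for n in notes])  # ['C4', 'E4', 'C5', 'A5', 'E5', 'G4', 'D4']
```

A pentatonic scale is a convenient choice for this kind of sonification because any sequence of its notes sounds reasonably consonant; the “artistic refinement” the producer mentions would happen on top of a raw mapping like this.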


The company’s specialists are also teaching neural networks to write personalized tracks. In one project, programmers are building algorithms that generate music to help with specific everyday tasks.

For example, the music can help someone concentrate before starting work, get energized for a workout or, conversely, calm down. The artificial intelligence composes these pieces from a large library of sounds: recordings of individual instruments, effects, and even vocal parts.

“The music the neural network generates is endless; it never stops. So a person is not distracted by pauses between tracks or by shifts in dynamics or mood. It helps people get into the right frame of mind,” says Timur Khaziev.

“For example, music for jogging is generated at a tempo of 160 bpm, which matches the step rate of an average runner. The tempo stays constant, so the user does not have to adjust or lose their usual rhythm. All you have to do is choose a goal and the algorithm will generate the right track.”
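The tempo arithmetic is simple: 160 beats per minute means 60 / 160 = 0.375 seconds between beats, so a runner taking one step per beat takes 160 steps a minute. The sketch below, with hypothetical function names, turns that into an endless stream of beat timestamps, a toy model of music that “never stops” while holding a constant tempo; it is not how Yandex's generator works internally.

```python
import itertools

def beat_interval(bpm):
    """Seconds between beats: one minute divided by the number of beats in it."""
    return 60.0 / bpm

def endless_beats(bpm):
    """Yield beat timestamps forever at a fixed tempo (a stand-in for endless generated music)."""
    interval = beat_interval(bpm)
    for i in itertools.count():
        yield i * interval  # multiply instead of accumulating, so rounding error never drifts

print(beat_interval(160))                             # 0.375
print(list(itertools.islice(endless_beats(160), 5)))  # [0.0, 0.375, 0.75, 1.125, 1.5]
```

Because the generator never ends, the caller decides how much to take, which mirrors a stream that simply keeps playing until the listener stops it.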

What awaits us in the future?

In the future, experts say, neural networks could be used to create melodies for music therapy, since music has a strong effect on the human brain: it can help people relax, feel energized or, for example, take their mind off bad thoughts.

“If I allow myself to get creative and go even further, I see not just a smart music therapy app, but entire adaptive homes.

“It would be great to have systems that can read a person’s emotional state the moment they step into their apartment, then switch on the right lighting, change the color of the walls, and choose suitable background music.

“I believe music has a strong influence on human health. Once we understand exactly how to use it for good, track generation technology will become indispensable,” says Timur Khaziev.

In music, neural networks can already do a lot: recognize, recommend and generate compositions. Much of this is possible thanks to “translating” sound from human language into the language of neural networks.

The challenge now is that machines strictly follow the algorithm they have been given, whereas people sometimes break the rules. That is how people find, among a million tracks, something radically different from their usual taste that nonetheless touches them deeply, or write brilliant works while ignoring all the conventions.

But it is very hard to explain to a computer when, and which, instructions to ignore. Perhaps this is the direction in which AI music technology will develop next, which means an even more exciting future awaits us.

