Google researchers have created an AI that can generate musical pieces from text inputs – much like how ChatGPT can turn a text command into a story and DALL-E generates images from written prompts. The AI program can turn text input into seconds- or even minutes-long music, as well as transform hummed melodies into other instruments.
According to research published on GitHub, the AI model is called MusicLM, and the company has uploaded a string of samples it produced using the model. Alongside the samples, the company has released MusicCaps, a dataset composed of 5.5k music-text pairs with rich text descriptions provided by human experts.
“We introduce MusicLM, a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes,” the company said in the published research.
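To put the 24 kHz figure in perspective, a quick back-of-the-envelope calculation (a standalone sketch, not from the paper) shows how many raw audio samples the model must keep coherent over “several minutes”:

```python
# Raw sample counts for MusicLM's 24 kHz audio output.
SAMPLE_RATE = 24_000  # samples per second, per the paper


def num_samples(seconds: float) -> int:
    """Total audio samples for a mono clip of the given duration."""
    return int(SAMPLE_RATE * seconds)


print(num_samples(30))      # a 30-second clip -> 720000 samples
print(num_samples(5 * 60))  # a 5-minute clip  -> 7200000 samples
```

Even a short 30-second clip spans hundreds of thousands of samples, which is why the model works with compact semantic tokens rather than raw audio directly.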
Google’s AI creates 5-minute melodies
The examples include 30-second clips as well as long-form tracks of five minutes that sound like actual songs. They were created from paragraph-long descriptions, and the clearer the instructions, the better the music. The examples also cover genre, vibe, and even specific instruments.
“The audio is generated by providing a sequence of text prompts. These influence how the model continues the semantic tokens derived from the previous caption,” the researchers said.
Story mode
There is also a “story mode” demo where the model is given multiple text inputs, each with a time duration, specifying the type of music to be created in each segment.
Take this prompt, for example:
time to meditate (0:00-0:15)
time to wake up (0:15-0:30)
time to run (0:30-0:45)
time to give 100% (0:45-0:60)
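MusicLM has no public API, but the story-mode prompt above is essentially a sequence of timed captions. A minimal sketch of how such a prompt could be represented (all names here are hypothetical, for illustration only):

```python
# Hypothetical representation of a MusicLM "story mode" prompt as a
# sequence of (caption, start, end) segments. MusicLM exposes no public
# API; this structure is illustrative only.
from dataclasses import dataclass


@dataclass
class Segment:
    caption: str   # text prompt for this stretch of music
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# The story-mode prompt from the demo, as timed segments.
story_prompt = [
    Segment("time to meditate", 0, 15),
    Segment("time to wake up", 15, 30),
    Segment("time to run", 30, 45),
    Segment("time to give 100%", 45, 60),
]

total = sum(seg.duration_s for seg in story_prompt)
print(f"Total piece length: {total:.0f} s")  # Total piece length: 60 s
```

Each caption steers its own 15-second window, while the model carries the semantic tokens forward so the segments flow into one continuous piece.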
“Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Furthermore, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption,” the researchers noted.