Google Develops MusicLM, an AI That Can Generate Minutes of Music from Text Prompts
JAKARTA - Google researchers have created an AI that can generate several minutes of music from a text prompt. The AI can even transform a whistled or hummed melody into other instruments.
This is similar to the way systems like DALL-E generate images from a written prompt. The model is called MusicLM, and while you can't play around with it yourself, the company has uploaded plenty of samples produced with the model.
MusicLM is Google's generative music model, which uses deep learning to create new music. The process involves training the model on existing music data, such as scores, songs, and audio recordings, so that it learns the patterns and concepts of music.
After that, the model can create new music by combining patterns and concepts drawn from the training data. MusicLM accepts user input such as genre and key, and uses this information to generate new music to the user's specifications.
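To make that "text prompt in, audio out" contract concrete, here is a toy, stdlib-only Python sketch. It is emphatically not MusicLM's method (MusicLM is a large neural model trained on audio); the keyword tables, scales, and tempos below are invented purely for illustration.

```python
# Toy illustration only: maps prompt keywords to synthesis parameters and
# writes a short WAV file. A real system like MusicLM learns this mapping
# from data instead of using hand-written tables.
import math
import random
import struct
import wave

SAMPLE_RATE = 22050

# Hypothetical prompt-to-parameter tables (not from the MusicLM paper).
KEYS = {"c major": [261.63, 293.66, 329.63, 392.00, 440.00],   # C pentatonic
        "a minor": [220.00, 261.63, 293.66, 329.63, 392.00]}   # A pentatonic
TEMPOS = {"techno": 140, "ambient": 70}

def generate(prompt: str, seconds: float = 5.0, path: str = "out.wav") -> None:
    prompt = prompt.lower()
    scale = next((v for k, v in KEYS.items() if k in prompt), KEYS["c major"])
    bpm = next((v for k, v in TEMPOS.items() if k in prompt), 100)
    note_len = 60.0 / bpm                      # one note per beat
    rng = random.Random(prompt)                # same prompt -> same "piece"
    samples = []
    t = 0.0
    while t < seconds:
        freq = rng.choice(scale)
        n = int(note_len * SAMPLE_RATE)
        for i in range(n):
            env = 1.0 - i / n                  # simple decay envelope
            samples.append(env * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        t += note_len
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(b"".join(struct.pack("<h", int(32767 * s)) for s in samples))

generate("melodic techno in a minor")
```

The point is only the shape of the interface: a free-text prompt conditions the audio that comes out, and the same prompt reproduces the same piece.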
The examples are impressive. There are 30-second snippets that sound like actual songs, generated from paragraph-long descriptions that specify a genre, a vibe, and even particular instruments, as well as five-minute pieces produced from just a word or two like “melodic techno.”
Also featured on the demo site are examples of what the model produces when asked for 10-second clips of instruments such as cello or maracas, eight-second clips of particular genres, music suitable for a prison escape, and even what a novice piano player sounds like versus an advanced one. It also includes interpretations of phrases such as "futuristic club" and "accordion death metal".
MusicLM can even simulate human vocals, and while it seems to get the overall tone and timbre right, there is definitely an off quality to them; the best way to describe it is that the voices sound grainy or staticky.
AI-generated music has a long history stretching back decades. Systems have been credited with composing pop songs, imitating Bach more convincingly than a human could in the '90s, and accompanying live performances.
One recent approach uses Stable Diffusion's image-generation engine to turn text prompts into spectrograms, which are then converted into music. The paper says that MusicLM can outperform other systems in "quality and adherence to texts", and that it can take in audio and reproduce a melody.
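For intuition on that spectrogram route, the sketch below shows only the second half of the pipeline: turning a magnitude spectrogram into audio. The diffusion model that would generate the spectrogram is out of scope here, so a hand-made spectrogram stands in for it; the Griffin-Lim call is real librosa API, everything else is illustrative.

```python
# Sketch of the spectrogram-to-audio step used by Stable Diffusion-style
# music systems. We fabricate a simple magnitude spectrogram "image" and
# invert it to a waveform with Griffin-Lim.
import numpy as np
import librosa
import soundfile as sf

sr, n_fft, hop = 22050, 1024, 256
frames, bins = 400, n_fft // 2 + 1

# Stand-in for a model-generated spectrogram: a few horizontal lines,
# which Griffin-Lim will render as steady tones.
S = np.zeros((bins, frames), dtype=np.float32)
for freq_hz in (220.0, 330.0, 440.0):
    S[int(freq_hz * n_fft / sr), :] = 1.0

# Griffin-Lim iteratively estimates the phase the magnitude image lacks.
y = librosa.griffinlim(S, n_iter=32, hop_length=hop, n_fft=n_fft)
sf.write("from_spectrogram.wav", y, sr)
```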
That last bit is probably one of the coolest demos the researchers have put out. The demo site lets you play the input audio, in which someone hums or whistles a tune, and then hear how the model reproduces it as an electronic synth lead, a string quartet, a guitar solo, and so on.
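A rough DSP analogue of that hum-to-instrument trick can be sketched in a few lines: track the pitch of the hummed recording, then drive a simple synthesizer with the recovered contour. MusicLM does this with a learned model rather than signal processing, and the input file name below is hypothetical.

```python
# Toy version of the melody-conditioning demo: extract the pitch contour
# of a hummed recording with pYIN, then re-render it as a crude synth lead.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("humming.wav", sr=22050)   # hypothetical input file

# Frame-wise fundamental frequency of the hummed melody.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
hop = 512  # librosa.pyin's default hop_length

# Resynthesize: a square-ish wave that follows the recovered contour.
out = np.zeros(len(f0) * hop)
phase = 0.0
for i, (hz, v) in enumerate(zip(f0, voiced)):
    if not v or np.isnan(hz):
        continue  # skip unvoiced frames (breaths, silence)
    t = np.arange(hop)
    chunk = np.sign(np.sin(phase + 2 * np.pi * hz * t / sr))
    out[i * hop:(i + 1) * hop] = 0.3 * chunk
    phase += 2 * np.pi * hz * hop / sr          # keep phase continuous

sf.write("synth_lead.wav", out, sr)
```

Swapping the square wave for any other oscillator or sampler is what would change the "instrument" in this toy setup.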
As with other forays into this type of AI, Google is being significantly more cautious with MusicLM than some of its peers may be with similar technology. "We have no plans to release a model at this point," the paper said, as quoted by The Verge, citing the risk of "potential creative content abuse", i.e. plagiarism, as well as potential cultural appropriation or misrepresentation.
There's always a chance the technology could show up in one of Google's playful music experiments at some point, but for now, the only people who stand to benefit from this research are those building other music AI systems.
Google says it has publicly released a dataset of around 5,500 music-text pairs, which can help in training and evaluating other music AIs.
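As a sketch of how such a dataset might be consumed, the snippet below reads music-text pairs from a CSV file. The file name and column names are assumptions for illustration, not the dataset's published schema.

```python
# Minimal sketch of loading music-text pairs for evaluation. We assume a
# CSV with a clip identifier and a caption column; both names are made up.
import csv

with open("music_text_pairs.csv", newline="", encoding="utf-8") as f:
    pairs = [(row["clip_id"], row["caption"]) for row in csv.DictReader(f)]

print(f"{len(pairs)} music-text pairs")
for clip_id, caption in pairs[:3]:
    # A text-to-music model could be prompted with `caption` and its output
    # compared against the reference clip identified by `clip_id`.
    print(clip_id, "->", caption[:60])
```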