Google DeepMind Launches New AI Tool to Create Video Soundtracks

JAKARTA - Google DeepMind has introduced a new AI tool for generating video soundtracks. The tool does not rely on a text prompt alone to produce audio; it also takes the content of the video into account.

According to DeepMind, by combining these two inputs, users can create scenes with "a dramatic score, realistic sound effects, or dialogue that matches the characters and tone of a video." Several examples are available on DeepMind's website, and the audio results are fairly convincing.

For example, for a video of a car driving through a cyberpunk city, Google used the prompt "cars skidding, car engine throttling, angelic electronic music" to generate the audio. The sound of skidding tires is synchronized with the car's motion. Another example creates an underwater soundscape using the prompt "jellyfish pulsating under water, marine life, ocean."

Although users can include a text prompt, DeepMind says it is optional. Users also don't need to manually align the generated audio with the corresponding scenes. According to DeepMind, the tool can generate an "unlimited number" of soundtracks for a video, giving users an endless stream of audio options.

This could set it apart from other AI tools, such as ElevenLabs' sound effect generator, which produces audio from text prompts alone. The tool could also make it easier to pair audio with AI-generated video from tools such as DeepMind's Veo and OpenAI's Sora (the latter is expected to add audio in the future).

DeepMind says it trained its AI tool on video, audio, and annotations that contain "detailed descriptions of sound and transcripts of spoken dialogue." This allows the video-to-audio generator to match audio events with visual scenes.

The tool still has some limitations. For example, DeepMind is working to improve its ability to sync lip movements with dialogue, a weakness visible in a video of a claymation family. DeepMind also notes that the video-to-audio system depends on the quality of the input video, so blurry or distorted footage "can lead to a noticeable drop in audio quality."

DeepMind's tool is not yet generally available because it still has to undergo "rigorous safety assessments and testing." When it is released, its audio output will carry Google's SynthID watermark to signal that it was generated by AI.

The English, Chinese, Japanese, Arabic, and French versions are automatically generated by AI, so the translations may still contain inaccuracies. Please always refer to the Indonesian version as our main language. (System supported by DigitalSiber.id)

Tags: google, artificial intelligence, soundtrack