Google Deepmind has revealed another AI V2A to generate audio: Know more

The Google Research Center Google Deepmind has revealed another AI that can generate Audio for video using text prompts.

By Grabyoume

June 20, 2024, 4:41 pm

Highlight

Google Deepmind has revealed another AI that can generate Audio for video using text prompts. This new model is called the V2A (Video-to-Audio)
This model can synchronize and generate audio for video with video pixel and text prompts
Currently under more research before release to the public

The Google Research Center Google Deepmind has revealed another AI that can generate Audio for video using text prompts. This new model is called the V2A (Video-to-Audio). This model can give life to videos that didn’t have audio-generating soundtracks and dialogues for videos.

V2A (Video-to-Audio) Model

You have heard above the Text-to-Video Model like Sora of Open AI and VEO of Google that generate video using text prompts but this model generates video only without any sounds. This new technology will allow you to generate and synchronize audio to video using text and video pixels.

V2A can generate sound for video adding sound that matches the video visual like dramatic scores, realistic sound effects, or fitting dialogue that matches the character and mood of the video. This can be a great tool when paired with video-generated models like VEO.

With text prompts, AI gets insight into the mood and type of sounds suitable for the video provided, and you can repeat this until users get the desired sound that matches the video.

How does V2A (Video-to-Audio) work?

In a blog post, Google explains that it begins with the compression of video input. Then, the diffusion model refines the audio and generates sounds from random noise. This is guided by a visual input and text prompt given to generate synchronized, realistic audio that closely aligns with the video and the prompt text given. Lastly, the audio output is decoded, turned into an audio waveform, and combined with the video data.

V2A process diagram| photo: Google Deepmind

This model can even synchronize audio to video that aligns with the video by adjusting various elements such as sound effects, visuals, and timing to ensure they match perfectly. Without the need for humans to arrange audio to match the video visuals.

Also Read: Google Gemini App finally available on Andriod devices in India

V2A safety concerns

One of the concerns things is AI safety and Google is committed to making this model safe for people and communities. Google takes the concerns from various fields like leading creators and filmmakers, to get feedback during the research and development process.

To address the potential misuse of these models Google incorporated with SynthID toolkit into V2A research watermark all AI-generated content.

Conclusion

Google will use advanced technology to make this process of generating audio and synchronize it with the video to be accurate and match perfectly with the video mood and users’ preferences. However, this model is under research for further development and currently has limits that can be addressed by research and continued development before it is opened to the public.

Google Deepmind has revealed another AI V2A to generate audio: Know more

Highlight

V2A (Video-to-Audio) Model

How does V2A (Video-to-Audio) work?

V2A safety concerns

Conclusion

Related Articles

Samsung Plans to work on Rollable Smartphone: Report

Samsung One UI 7 beta Tested by Subsidiaries: Report

Citizens in Meghalaya Fall Victim to International Ponzi Scheme

iPad (2021) will be on sale for Rs 18,999 during Flipkart Big Billion Days Sale: Should you buy it?

WhatsApp for Android will let Users Choose Public Figures for Meta AI Voice Mode: Reported