Veo 3 can generate videos, and soundtracks to go along with them

Google’s latest video-generating AI model, Veo 3, can create audio to go along with the clips that it generates.

On Tuesday during the Google I/O 2025 developer conference, Google unveiled Veo 3, which the company claims can generate sound effects, background noises, and even dialogue to accompany the videos it creates. Veo 3 also improves upon its predecessor, Veo 2, in terms of the quality of footage it can generate, Google says.

Veo 3 is available beginning Tuesday in Google’s Gemini chatbot app for subscribers to Google’s $249.99-per-month AI Ultra plan, where it can be prompted with text or an image.

“For the first time, we’re emerging from the silent era of video generation,” Demis Hassabis, the CEO of Google DeepMind, Google’s AI R&D division, said during a press briefing. “[You can give Veo 3] a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”

The wide availability of tools to build video generators has led to such an explosion of providers that the space is becoming saturated. Startups including Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, as well as tech giants such as OpenAI and Alibaba, are releasing models at a fast clip. In many cases, little distinguishes one model from another.

Audio output stands to be a big differentiator for Veo 3, if Google can deliver on its promises. AI-powered sound-generating tools aren’t novel, nor are models that create video sound effects. But Veo 3 uniquely can understand the raw pixels from its videos and sync generated sounds with clips automatically, per Google.

Here’s a sample clip from the model:

Veo 3 was likely made possible by DeepMind’s earlier work in “video-to-audio” AI. Last June, DeepMind revealed that it was developing AI tech to generate soundtracks for videos by training a model on a combination of sounds and dialogue transcripts as well as video clips.

DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.

To mitigate the risk of deepfakes, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into frames Veo 3 generates.

While companies like Google pitch models like Veo 3 as powerful creative tools, many artists are understandably wary of them, since they threaten to upend entire industries. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimates that more than 100,000 U.S.-based film, television, and animation jobs will be disrupted by AI by 2026.

Google also today rolled out new capabilities for Veo 2, including a feature that lets users give the model images of characters, scenes, objects, and styles for better consistency. The latest Veo 2 can understand camera movements like rotations, dollies, and zooms, and it allows users to add or erase objects from videos or broaden the frames of clips to, for example, turn them from portrait into landscape.

Google says that all of these new Veo 2 capabilities will come to its Vertex AI API platform in the coming weeks.