Veo 3.1 Fast - Runbase

Pricing

Resolution	Price
720p	$0.33
1080p	$0.36
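
As a rough budgeting aid, the two prices above can be turned into a small estimator. This is only a sketch: the page does not state the billing unit, so the code assumes each price is charged once per generated video, and `estimate_cost` is a hypothetical helper name, not part of any Runbase API.

```python
from decimal import Decimal

# Prices copied from the pricing table above.
# ASSUMPTION: each price is charged once per generated video; the page
# does not state the billing unit, so adjust if billing works differently.
PRICE_PER_VIDEO = {
    "720p": Decimal("0.33"),
    "1080p": Decimal("0.36"),
}


def estimate_cost(resolution: str, num_videos: int) -> Decimal:
    """Return the estimated cost in USD for num_videos generations."""
    if resolution not in PRICE_PER_VIDEO:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if num_videos < 0:
        raise ValueError("num_videos must not be negative")
    return PRICE_PER_VIDEO[resolution] * num_videos


if __name__ == "__main__":
    print(estimate_cost("720p", 10))
    print(estimate_cost("1080p", 10))
```

Decimal is used instead of float so that repeated additions of small dollar amounts do not accumulate rounding error.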

Examples

Cinematic Battlefield

16:9

First-person view soaring low over a medieval battlefield at dawn, gliding past clashing knights in armor, fire-lit arrows whizzing overhead, splintered catapults burning near fallen soldiers, flying inches above torn flags and mud-soaked ground, raw, terrifying, epic

Overview

Veo 3.1 Fast is the speed- and cost-optimized variant of Google DeepMind's Veo 3.1 video model. Like the rest of the Veo 3 family, it generates audio natively — producing synchronized dialogue, sound effects, and ambient audio alongside the visuals in a single pass. The model generates video at up to 1080p in two aspect ratios (16:9 and 9:16) and accepts up to two frame images for image-to-video generation, enabling first-frame and last-frame control.

Use cases

Cinematic content where audio is integral, such as nature documentaries with ambient sound or product reveals with impact effects
Social reels and vertical video with native sound design
Image-to-video with start and end frames for controlled animation arcs
Dialogue scenes and character-driven narratives with synchronized speech

Inputs

All parameters are passed in the input object of the run request.

Parameter	Required	Description
prompt	Yes	Text description (1–5000 chars)
aspect_ratio	No	Default `16:9`. Options: `16:9`, `9:16`
resolution	No	Default `720p`. Options: `720p`, `1080p`
image_urls	No	Up to 2 frame images (max 10 MB each) for image-to-video
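
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_input` is a hypothetical helper, and the endpoint, authentication, and any fields of the run request other than `input` are not described on this page, so they are left out.

```python
from typing import Dict, Optional, Sequence

# Limits taken from the parameter table above.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_RESOLUTIONS = ("720p", "1080p")
MAX_PROMPT_CHARS = 5000
MAX_IMAGES = 2


def build_input(
    prompt: str,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    image_urls: Optional[Sequence[str]] = None,
) -> Dict[str, object]:
    """Validate the parameters and return the input object (hypothetical helper)."""
    if not 1 <= len(prompt) <= MAX_PROMPT_CHARS:
        raise ValueError(f"prompt must be 1-{MAX_PROMPT_CHARS} characters")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")

    payload: Dict[str, object] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
    if image_urls:
        if len(image_urls) > MAX_IMAGES:
            raise ValueError(f"at most {MAX_IMAGES} frame images are accepted")
        # The 10 MB per-image limit cannot be verified from a URL alone,
        # so it is not checked here.
        payload["image_urls"] = list(image_urls)
    return payload


if __name__ == "__main__":
    # Only the "input" key is documented; the rest of the request is omitted.
    request_body = {"input": build_input("Rain on a tin roof at night", resolution="1080p")}
    print(request_body)
```

Optional parameters are written out with their documented defaults so the payload is explicit about what will be generated.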

Prompt tips

Describe the soundscape in your scene

Veo 3.1 Fast generates audio natively. Prompts that imply sound, such as "rain on a tin roof," "crowd cheering in a stadium," or "whispered conversation," produce richer, more immersive output than purely visual descriptions.

Use two images for motion arcs

Provide a first-frame image and a second image as the end state in image_urls. The model interpolates between them, giving you control over both the starting composition and the final pose or framing.

Be specific about camera behavior

Phrases like "slow tracking shot," "static wide angle," or "handheld close-up" translate directly into camera movement. Vague prompts yield generic motion.
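
One way to apply the tips above consistently is to assemble each prompt from separate camera, scene, and sound descriptions. The sketch below uses a hypothetical helper, `compose_prompt`; the ordering and sentence joining are an illustrative convention, not something the model requires.

```python
MAX_PROMPT_CHARS = 5000  # upper bound from the Inputs table


def compose_prompt(scene: str, camera: str = "", sound: str = "") -> str:
    """Join camera, scene, and sound descriptions into one prompt string."""
    # Keep only non-empty parts and drop trailing periods so that joining
    # with ". " does not produce doubled punctuation.
    parts = [p.strip().rstrip(".") for p in (camera, scene, sound) if p.strip()]
    if not parts:
        raise ValueError("at least one description is required")
    prompt = ". ".join(parts) + "."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt


if __name__ == "__main__":
    print(
        compose_prompt(
            scene="A fishing village at dawn",
            camera="Slow tracking shot",
            sound="Rain on a tin roof, distant gulls",
        )
    )
```

Keeping the three parts separate makes it easy to vary the camera or soundscape while holding the scene constant.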

Limitations

Only 2 aspect ratios (16:9 and 9:16) — no square or ultrawide options
No duration parameter — the model determines clip length automatically
No resolution below 720p available
Generation time can be longer than that of competing models
No generate_audio toggle — the model does not expose an option to control audio output

FAQ

Can I disable audio generation?

No. Veo 3.1 Fast always generates audio natively and does not expose a generate_audio toggle to turn it off. (Some other models, such as ByteDance's Seedance, do provide a generate_audio switch.)

Can I control the video duration?

No. Veo 3.1 Fast does not expose a duration parameter. The model determines clip length based on the prompt content. Typical outputs range from a few seconds to around 8 seconds.

How do the two frame images work?

The first image sets the opening frame; the second sets the target end state. The model generates video that transitions between them. You can also provide just one image to anchor the starting frame only.
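
Because the position in image_urls decides each image's role, it can help to build the list through a small function that makes the order explicit. This is a sketch under assumptions: `frame_images` is a hypothetical helper and the example.com URLs are placeholders.

```python
from typing import List, Optional


def frame_images(first_frame: str, last_frame: Optional[str] = None) -> List[str]:
    """Return image_urls in the order the model expects.

    The first entry anchors the opening frame; the optional second entry
    sets the target end state.
    """
    urls = [first_frame]
    if last_frame is not None:
        urls.append(last_frame)
    return urls


if __name__ == "__main__":
    # One image: anchor the starting frame only.
    start_only = {
        "prompt": "The camera slowly pushes in on the lighthouse",
        "image_urls": frame_images("https://example.com/start.png"),
    }
    # Two images: the model transitions from the first to the second.
    start_and_end = {
        "prompt": "The tide rises until the rocks are submerged",
        "image_urls": frame_images(
            "https://example.com/start.png",
            "https://example.com/end.png",
        ),
    }
    print(start_only["image_urls"])
    print(start_and_end["image_urls"])
```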