Runbase

Google

Veo 3.1 Fast

ID: google/veo-3

Veo 3.1 Fast — Google DeepMind's fast, cost-efficient Veo 3.1 video model with native audio (dialogue and sound effects), image-to-video, and up to 1080p resolution.

Text to video · Image to video · Audio generation · 1080p
720p: $0.33
1080p: $0.36
cURL
curl https://api.runbase.net/v1/runs \
  -H "Authorization: Bearer $RUNBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/veo-3",
    "input": {
      "prompt": "A cinematic product photo of a ceramic lamp",
      "aspect_ratio": "16:9",
      "resolution": "720p"
    }
  }'
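
For readers calling the API from Python, the same request can be sketched with the standard library alone. The endpoint, headers, and model ID are copied from the cURL example; the function name build_run_request is illustrative, and the response format is not documented on this page, so the sketch only builds the request and leaves sending commented out.

```python
import json
import os
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.runbase.net/v1/runs"


def build_run_request(prompt, api_key, aspect_ratio="16:9", resolution="720p"):
    """Build (but do not send) the POST request shown in the cURL example."""
    body = {
        "model": "google/veo-3",
        "input": {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_run_request(
        "A slow tracking shot of a ceramic lamp switching on in a dark room",
        os.environ.get("RUNBASE_API_KEY", ""),
    )
    # Sending is left to the caller; the response shape is an unknown here.
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
    print(req.get_method(), req.full_url)
```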

Examples

Cinematic Battlefield

16:9

First-person view soaring low over a medieval battlefield at dawn, gliding past clashing knights in armor, fire-lit arrows whizzing overhead, splintered catapults burning near fallen soldiers, flying inches above torn flags and mud-soaked ground, raw, terrifying, epic

Overview

Veo 3.1 Fast is the speed- and cost-optimized variant of Google DeepMind's Veo 3.1 video model. Like the rest of the Veo 3 family, it generates audio natively — producing synchronized dialogue, sound effects, and ambient audio alongside the visuals in a single pass. The model generates video at up to 1080p in two aspect ratios (16:9 and 9:16) and accepts up to two frame images for image-to-video generation, enabling first-frame and last-frame control.

Use cases

  • Cinematic content where audio is integral: nature documentaries with ambient sound, product reveals with impact effects
  • Social reels and vertical video with native sound design
  • Image-to-video with start and end frames for controlled animation arcs
  • Dialogue scenes and character-driven narratives with synchronized speech

Inputs

All parameters are passed in the input object of the run request.

Parameter      Required   Description
prompt         Yes        Text description (1–5000 chars)
aspect_ratio   No         Default 16:9. Options: 16:9, 9:16
resolution     No         Default 720p. Options: 720p, 1080p
image_urls     No         Up to 2 frame images (max 10 MB each) for image-to-video
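
The constraints in the parameter table can be checked client-side before a run is submitted. The following is a minimal sketch, assuming you want to fail early rather than wait for the API to reject a request; validate_input is a hypothetical helper, not part of any Runbase SDK, and it does not check the 10 MB image limit because that would require fetching each file.

```python
# Allowed values and limits as listed in the parameter table.
ASPECT_RATIOS = ("16:9", "9:16")
RESOLUTIONS = ("720p", "1080p")
MAX_PROMPT_CHARS = 5000
MAX_FRAME_IMAGES = 2


def validate_input(prompt, aspect_ratio="16:9", resolution="720p", image_urls=None):
    """Return an input dict with the documented defaults, or raise ValueError."""
    if not 1 <= len(prompt) <= MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1 to 5000 characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    urls = list(image_urls or [])
    if len(urls) > MAX_FRAME_IMAGES:
        raise ValueError("image_urls accepts at most 2 frame images")

    params = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
    # Only include image_urls when frames were actually supplied.
    if urls:
        params["image_urls"] = urls
    return params
```

The returned dict is meant to be used as the input object of the run request; a square ratio such as 1:1 raises ValueError instead of reaching the API.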

Prompt tips

Describe the soundscape in your scene

Veo 3 generates audio natively. Prompts that imply sound — "rain on a tin roof," "crowd cheering in a stadium," "whispered conversation" — produce richer, more immersive output than purely visual descriptions.

Use two images for motion arcs

Upload a first-frame image and a second image as the end state. Veo 3.1 Fast interpolates between them, giving you control over both the starting composition and the final pose or framing.
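
In API terms, the two frames travel in the image_urls list. The sketch below assumes the list order matches the description on this page (first entry is the opening frame, second is the end state); with_frames is a hypothetical helper and the example.com URLs are placeholders.

```python
def with_frames(input_params, first_frame_url, last_frame_url=None):
    """Return a copy of input_params with image_urls in first, last order."""
    urls = [first_frame_url]
    # The end frame is optional: one image anchors only the starting frame.
    if last_frame_url is not None:
        urls.append(last_frame_url)
    return {**input_params, "image_urls": urls}


base = {"prompt": "A dancer rises from a crouch into a full leap"}
arc = with_frames(
    base,
    "https://example.com/crouch.png",  # placeholder opening frame
    "https://example.com/leap.png",    # placeholder end frame
)
```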

Be specific about camera behavior

Phrases like "slow tracking shot," "static wide angle," or "handheld close-up" translate directly into camera movement. Vague prompts yield generic motion.
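
One way to apply the tips above consistently is to assemble prompts from separate scene, camera, and sound fragments. This is only an illustrative sketch: compose_prompt is a hypothetical helper, and joining the fragments with commas is a stylistic assumption, not something the model requires.

```python
def compose_prompt(scene, camera=None, sound=None, max_chars=5000):
    """Join scene, camera, and sound fragments into one prompt string."""
    parts = [scene.strip()]
    if camera:
        parts.append(camera.strip())
    if sound:
        parts.append(sound.strip())
    prompt = ", ".join(p for p in parts if p)
    # The prompt parameter is documented as 1 to 5000 characters.
    if not 1 <= len(prompt) <= max_chars:
        raise ValueError("prompt must be 1 to 5000 characters")
    return prompt


prompt = compose_prompt(
    "A night market in the rain",
    camera="slow tracking shot",
    sound="rain on a tin roof",
)
# prompt == "A night market in the rain, slow tracking shot, rain on a tin roof"
```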

Limitations

  • Only 2 aspect ratios (16:9 and 9:16) — no square or ultrawide options
  • No duration parameter — the model determines clip length automatically
  • No resolution below 720p available
  • Generation time can be longer than competing models
  • No generate_audio toggle — the model does not expose an option to control audio output
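
Because duration and generate_audio are not exposed, it can help to flag them before sending a request, for example when reusing a payload written for another model. The sketch below is a hypothetical client-side check; how the API itself treats unknown keys (rejecting or ignoring them) is not stated on this page.

```python
# Parameters documented in the Inputs section.
DOCUMENTED_PARAMS = {"prompt", "aspect_ratio", "resolution", "image_urls"}


def unsupported_params(input_params):
    """Return the sorted names of keys this model does not document."""
    return sorted(set(input_params) - DOCUMENTED_PARAMS)


extra = unsupported_params(
    {"prompt": "A storm at sea", "duration": 8, "generate_audio": False}
)
# extra == ["duration", "generate_audio"]
```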

FAQ

Can I disable audio generation?

No. Veo 3.1 Fast always generates audio natively and does not expose a generate_audio toggle to turn it off. (Some other models, such as ByteDance's Seedance, do provide a generate_audio switch.)

Can I control the video duration?

No. Veo 3.1 Fast does not expose a duration parameter. The model determines clip length based on the prompt content. Typical outputs range from a few seconds to around 8 seconds.

How do the two frame images work?

The first image sets the opening frame; the second sets the target end state. The model generates video that transitions between them. You can also provide just one image to anchor the starting frame only.