
Dialogue Scene (16:9): A lawyer delivering a closing argument from the lectern of a wood-paneled courtroom, jury seated behind, synchronized speech
bytedance/seedance-1.5-pro
Seedance 1.5 Pro video generation API by ByteDance: cinema-quality video with synchronized audio, multilingual dialogue, and up to 1080p.
curl https://api.runbase.net/v1/runs \
  -H "Authorization: Bearer $RUNBASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bytedance/seedance-1.5-pro",
    "input": {
      "prompt": "A lawyer delivering a closing argument from the lectern of a wood-paneled courtroom, jury seated behind, synchronized speech",
      "aspect_ratio": "16:9",
      "resolution": "720p",
      "generate_audio": true
    }
  }'
Extreme close-up of an older person's face by a window, a single tear forming, soft natural light

The tear rolls slowly down the cheek, subtle facial movement, ambient room tone
Seedance 1.5 Pro is ByteDance's first video model with native audio generation, bridging the gap between the silent 1.0 series and the full-featured 2.0 release. It produces cinema-quality video at up to 1080p with synchronized dialogue, sound effects, and ambient audio in multiple languages. The model accepts up to two reference images for image-to-video, allowing first-frame and last-frame control. Durations are fixed at 4, 8, or 12 seconds.
Dialogue-driven scenes — interviews, monologues, explainer videos with synced speech. Multilingual ad creatives where voiceover needs to match the visuals. Image-to-video with two reference frames to control both the start and end of a clip. Short narrative content with ambient sound design.
All parameters are passed in the input object of the run request.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description (3–2500 chars) |
| aspect_ratio | No | Default 16:9. Options: 16:9, 9:16, 4:3, 3:4, 1:1, 21:9 |
| resolution | No | Default 720p. Options: 480p, 720p, 1080p |
| duration | No | Clip length in seconds. Default 4. Options: 4, 8, 12 |
| generate_audio | No | Generate synchronized audio. Default false |
| image_urls | No | Up to 2 reference images (max 10 MB each) for image-to-video |
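
The constraints in the table can be checked client-side before a run is submitted. The following is a minimal Python sketch, not an official client: the helper name `build_input` is hypothetical, and sending `duration` as an integer number of seconds is an assumption about the wire format.

```python
# Allowed values copied from the parameter table.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
RESOLUTIONS = {"480p", "720p", "1080p"}
DURATIONS = {4, 8, 12}  # assumption: integer seconds
MAX_IMAGES = 2


def build_input(prompt, aspect_ratio="16:9", resolution="720p",
                duration=4, generate_audio=False, image_urls=()):
    """Return an `input` dict, raising ValueError on out-of-range values."""
    if not 3 <= len(prompt) <= 2500:
        raise ValueError("prompt must be 3-2500 characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration not in DURATIONS:
        raise ValueError(f"unsupported duration: {duration!r}")
    if len(image_urls) > MAX_IMAGES:
        raise ValueError("at most 2 reference images are accepted")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "generate_audio": generate_audio,
    }
    # Only include image_urls for image-to-video requests.
    if image_urls:
        payload["image_urls"] = list(image_urls)
    return payload
```

A call such as `build_input("A ceramic lamp on a desk", resolution="1K")` raises `ValueError`, because `1K` is not one of the listed resolutions. The 10 MB per-image limit is not checked here, since that would require fetching each image.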
Rather than quoting speech ("He says: Hello"), describe the scenario: "A man greets someone warmly at a doorstep, casual tone." The model infers appropriate dialogue from context.
Upload a first-frame image and a second image as the target end state. The model interpolates the motion between them, giving you tighter control over the arc of the clip.
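
A first-frame and last-frame request might be assembled as in the sketch below. It reuses the endpoint and headers from the curl example; the function name, the `example.com` image URLs, and the ordering of `image_urls` (first frame, then end frame) are assumptions, not documented behavior.

```python
import json
import os
import urllib.request

RUNS_URL = "https://api.runbase.net/v1/runs"  # endpoint from the curl example


def first_last_frame_request(prompt, first_frame_url, last_frame_url):
    """Build (but do not send) a POST request with two reference images."""
    body = {
        "model": "bytedance/seedance-1.5-pro",
        "input": {
            "prompt": prompt,
            # Assumption: first frame first, target end state second.
            "image_urls": [first_frame_url, last_frame_url],
        },
    }
    return urllib.request.Request(
        RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('RUNBASE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = first_last_frame_request(
    "The tear rolls slowly down the cheek, subtle facial movement, ambient room tone",
    "https://example.com/first-frame.jpg",  # hypothetical image URL
    "https://example.com/last-frame.jpg",   # hypothetical image URL
)
# Passing req to urllib.request.urlopen would submit the run; that call is
# left out so the sketch has no network side effects.
```

The response format is not described here, so the sketch stops at building the request.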
The model generates multilingual dialogue and voiceover. Language is inferred from the prompt context — write your scene description in the target language or specify the language explicitly.
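
The prompting advice above can be folded into a small helper. This is a sketch under stated assumptions: `dialogue_input` is a hypothetical name, and the "Spoken language: ..." suffix is just one possible way to state the language explicitly, not wording the model is known to require. Because `generate_audio` defaults to false, the helper turns it on.

```python
def dialogue_input(scene, language=None):
    """Compose an `input` fragment for a spoken scene.

    `scene` describes the situation rather than quoting lines; `language`
    optionally names the spoken language explicitly.
    """
    prompt = scene.strip()
    if language:
        # Assumption: a plain-language hint appended to the scene description.
        prompt = f"{prompt} Spoken language: {language}."
    # Audio is off by default, so dialogue scenes need it enabled.
    return {"prompt": prompt, "generate_audio": True}


fragment = dialogue_input(
    "A man greets someone warmly at a doorstep, casual tone.",
    language="Spanish",
)
```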
Seedance 2.0 supports arbitrary durations from 4 to 15 seconds, a 20000-character prompt limit, and generally higher visual fidelity. 1.5 Pro is limited to fixed 4/8/12s durations and 2500 characters. On Runbase, 1.5 Pro accepts up to two reference images (first and last frame), while 2.0 takes a single first-frame image.
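
The limits above can be turned into a quick compatibility check. The sketch below is illustrative only: `compatible_models` is a hypothetical helper, the returned strings are display names rather than API model identifiers, and only the limits stated in this comparison are encoded.

```python
def compatible_models(prompt, duration, image_count=0):
    """Return the Seedance versions whose stated limits fit this request."""
    models = []
    # 1.5 Pro: fixed 4/8/12 s, 2500-character prompts, up to two images.
    if len(prompt) <= 2500 and duration in (4, 8, 12) and image_count <= 2:
        models.append("Seedance 1.5 Pro")
    # 2.0: any duration from 4 to 15 s, 20000-character prompts, one image.
    if len(prompt) <= 20000 and 4 <= duration <= 15 and image_count <= 1:
        models.append("Seedance 2.0")
    return models


long_prompt = "x" * 3000
print(compatible_models(long_prompt, 10, 1))       # ['Seedance 2.0']
print(compatible_models("A short scene", 8, 2))    # ['Seedance 1.5 Pro']
```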
Silent output is supported. The generate_audio parameter defaults to false, so leaving it unset produces silent video, the same as the 1.0 models.