Gemini Omni Flash Video Generation
- Google’s official Gemini Omni Flash all-in-one multimodal video generation model
- Supports Text-to-Video, Image-to-Video, and Video-to-Video (editing), with mixed text + image + video input
- Outputs 720p / 24fps, 3-10 seconds, with audio; supports conversational multi-turn editing
- Asynchronous task API. Submit a task first, then query the result by task ID
POST
Authentication
All requests require Bearer Token authentication.
Get an API key: visit the API Key management page to get your API key.
Add the following header when making requests:
Authorization: Bearer YOUR_API_KEY
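As a minimal sketch of how a client might attach the Bearer token in Python: the helper name build_auth_headers is hypothetical, and the Content-Type value assumes the request body is sent as JSON, which this page does not state explicitly.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token.

    Assumption: the request body is JSON, hence the Content-Type header.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Standard Bearer scheme: "Authorization: Bearer <token>"
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.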
Request Parameters
- Model name: must be gemini-omni-flash-preview.
- prompt: text instruction. For Text-to-Video, it is a scene description; for Image-to-Video or Video-to-Video, it is an action, style, or editing instruction. Provide at least one of prompt and the reference materials (image_urls / video_urls).
- image_urls: reference images, up to 16. Each item is an http(s):// URL. Supports JPEG / PNG. For multiple subjects (e.g. “cat + ball of yarn”), you can pass multiple images and describe how they interact in the prompt.
- video_urls: reference video or video to be edited, at most 1 (multiple video references are not supported). Can be an http(s):// direct link or data:video/....
- Aspect ratio: controls the output frame orientation. Supported values only: 16:9 (landscape, default) and 9:16 (portrait).
- Resolution: currently only 720p is supported.
- Previous task ID: fill in the task_id of the previous generation task.
Response
- Response status code. Successful requests return 200.
- Returned task array.
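The constraints listed under Request Parameters can be checked on the client before a task is submitted. The sketch below is illustrative only: the field names model, aspect_ratio, resolution, and previous_task_id are assumptions (this page names only prompt, image_urls, and video_urls), and build_payload is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, List, Optional

MODEL_NAME = "gemini-omni-flash-preview"
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
MAX_IMAGES = 16  # "Reference images, up to 16"
MAX_VIDEOS = 1   # "at most 1" reference video


def build_payload(
    prompt: Optional[str] = None,
    image_urls: Optional[List[str]] = None,
    video_urls: Optional[List[str]] = None,
    aspect_ratio: str = "16:9",
    previous_task_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented limits and return a request body as a dict."""
    images = list(image_urls or [])
    videos = list(video_urls or [])

    # At least one of prompt / image_urls / video_urls is required.
    if not prompt and not images and not videos:
        raise ValueError("provide a prompt, image_urls, or video_urls")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} reference images are allowed")
    if len(videos) > MAX_VIDEOS:
        raise ValueError("only one reference video is supported")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")

    # Field names below other than prompt/image_urls/video_urls are assumed.
    payload: Dict[str, Any] = {
        "model": MODEL_NAME,
        "aspect_ratio": aspect_ratio,
        "resolution": "720p",  # the only resolution currently supported
    }
    if prompt:
        payload["prompt"] = prompt
    if images:
        payload["image_urls"] = images
    if videos:
        payload["video_urls"] = videos
    if previous_task_id:
        payload["previous_task_id"] = previous_task_id
    return payload
```

For example, build_payload(prompt="A cat chasing a ball of yarn", aspect_ratio="9:16") returns a portrait Text-to-Video request body, while calling it with no prompt and no reference materials raises ValueError.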
Query Task Result
Video generation is asynchronous. After submission, the API returns a task_id. Use the Get task status endpoint to query progress and results.
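The submit-then-query flow can be sketched as a simple polling loop. Because this page does not give the status endpoint's URL or response fields, the sketch takes them as parameters: fetch_status and is_finished are hypothetical callables you would supply, and the interval and timeout defaults are arbitrary choices.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],
    is_finished: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task until is_finished(result) is true or the timeout expires.

    fetch_status(task_id) should call the Get task status endpoint and
    return the decoded response; its shape is not specified here.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(task_id)
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)  # wait before asking again
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays. The is_finished predicate should treat both success and failure states as finished so that a failed task does not poll until the timeout.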