The video landscape changed hard in the first quarter of 2026. Kling released version 3.0. Veo 3.1 added 4K and scene extension. Runway shipped Gen-4.5. ByteDance released Seedance 2.0 — and Seedance 2.0 has since pulled clear of the pack. It now holds the #1 Elo rating on the Artificial Analysis Video Arena across both text-to-video (1,269) and image-to-video (1,351), ahead of Kling 3.0, Veo 3, and Runway Gen-4.5. And OpenAI announced that Sora is shutting down.

Seven models matter right now. One of them — Seedance 2.0 — is the current all-round leader. The others are specialists with real strengths you still want when the brief demands them. This is the short version of which to pick, when, and why.

The quick version

If you want the answer before the explanation:

| Strength | Model | Why |
| --- | --- | --- |
| Current all-round leader | Seedance 2.0 | #1 on the Artificial Analysis Video Arena. Native audio+video in a single pass, phoneme-level lipsync, up to nine image references, multi-shot in one generation. Western access via Flora. |
| Closest replacement for Sora's multi-shot look | Kling 3.0 | Native 4K at 60fps, 15s clips, native audio. The cleanest one-to-one move from Sora. |
| Dialogue specialist and free tier | Veo 3.1 | Strong cinematic aesthetics, scene consistency, and a generous free tier (10 clips/month). Google's Gemini app and Flow. |
| Narrative, directed, film-look | Runway Gen-4.5 | Mature web interface. Best control tools for directed, multi-step edits. |
| Character close-ups and micro-expressions | MiniMax Hailuo 2.3 | Six-second cap by design. Short, dense, beautifully animated. |
| Longest single takes | Luma Ray3.14 | Up to 18 seconds per clip. Three times cheaper than Ray3. |
| Leaving the field | Sora 2 | OpenAI closes the app on April 26, 2026. Migration guide here. |

Now the detail.


Seedance 2.0 (ByteDance) — the current leader

ByteDance released Seedance 2.0 in February 2026. The rollout was complicated. The model itself is not.

It now holds the top Elo rating on the Artificial Analysis Video Arena — 1,269 for text-to-video, 1,351 for image-to-video. Ahead of Kling 3.0, Veo 3, Runway Gen-4.5. When people speak of a “current king” in video generation, this is the model they mean.

What Seedance does best: All the things at once, and none of them as bolt-ons. Audio and video generated together in a single pass, not a silent clip scored afterwards. Phoneme-level lipsync. Multi-shot sequences inside one generation with natural cuts. Up to nine reference images, three video clips, and three audio files in a single prompt — the most multimodal input of any widely available video model. Physical realism and body mechanics that hold up through cloth, water, weight, momentum. Multilingual dialogue including English, Mandarin, Japanese, Korean, Spanish, Portuguese, and Indonesian.
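Those input caps — nine reference images, three video clips, three audio files per prompt — are worth checking before you submit a heavy multimodal brief. A minimal pre-flight sketch, assuming a hypothetical payload shape (the field names here are illustrative, not an actual Seedance or fal.ai schema):

```python
# Sketch: pre-flight check for a Seedance 2.0 multimodal prompt payload,
# using the input caps cited above (9 images, 3 video clips, 3 audio files).
# The payload shape and field names are illustrative assumptions.

CAPS = {"images": 9, "videos": 3, "audio": 3}

def validate_payload(payload: dict) -> list[str]:
    """Return a list of cap violations; an empty list means the payload fits."""
    errors = []
    for kind, cap in CAPS.items():
        count = len(payload.get(kind, []))
        if count > cap:
            errors.append(f"{kind}: {count} supplied, cap is {cap}")
    return errors

refs = {
    "images": ["hero.png"] * 10,   # one over the nine-image cap
    "videos": ["motion_ref.mp4"],
    "audio": [],
}
print(validate_payload(refs))  # ['images: 10 supplied, cap is 9']
```

A check like this is cheap insurance when a brief bundles mood boards, motion references, and a temp score into one prompt.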

It produces clips up to fifteen seconds at 1080p. That is shorter than Luma Ray3.14's eighteen-second cap, but fifteen seconds of multi-shot work usually goes further than a single long take.

Where it falls short: Availability. ByteDance paused the global launch of Seedance 2.0 in mid-March 2026. CapCut integration rolled out only to Brazil, Indonesia, Malaysia, Mexico, the Philippines, Thailand, and Vietnam. The Dreamina app is restricted to similar markets. Direct API access is limited to fal.ai.

For UK, European, and US creators, the practical route is through Flora Fauna, which carries both Seedance 2.0 and Seedance 2.0 Fast. Flora is currently the cleanest way to use this model if you are outside the live regions. Content moderation is also strict — no real recognisable faces, no trained franchise characters, limited face-led advertising work. Worth knowing before you commit a brief to it.

Access: Flora Fauna (recommended for Western creators), fal.ai API, Runway’s Unlimited and Enterprise plans outside the US, Dreamina and CapCut in select markets.

Deep dive: Directing Seedance 2.0 — the multimodal prompt guide, with the @-tag grammar, nine directing techniques, the keyword palette, and sixteen ready-to-use templates.


Kling 3.0 (Kuaishou)

Kuaishou released Kling 3.0 on February 5, 2026. Reports from post-Sora migration coverage suggest it is one of the models working creators are reaching for most — and in our testing it is the cleanest one-to-one replacement for Sora’s multi-shot work.

What Kling does best: Multi-shot composed sequences with integrated audio. Fifteen seconds per generation, native 4K at 60 frames a second, and audio — dialogue, lipsync, sound design — generated alongside the video rather than added after the fact. Kling 3.0 is also unusually good at persisting a world across shots, which is the thing that made Sora feel more like film than flipbook.

Where it falls short: Fifteen seconds is not sixty. For longer sequences you still need to plan a shot list and assemble in a timeline. The audio is generated, which means it is approximate — you will still want to do a post-production pass. And Kling’s pricing is credit-based rather than a clean monthly-tier list, which takes a little time to read if you are coming in cold.

Pricing: Credit-based via klingai.com. No single flat per-clip price — consumption depends on resolution, duration, and variant.

Access: klingai.com (web), Flora Fauna (which carries Kling 3.0 Pro and Standard alongside earlier Kling versions), and through API partners.


Veo 3.1 (Google)

Google released Veo 3.1 in October 2025 and the cost-reduced Veo 3.1 Lite on the Gemini API in March 2026. The version you use in the Gemini app or in Flow is the full Veo 3.1.

What Veo does best: Cinematic aesthetics, prompt understanding, and scene consistency. Audio is generated at the same time as the video — synchronised dialogue, sound effects, and ambient soundscapes. For a while Veo was the only model doing this cleanly; Seedance 2.0 now matches it on unified audio+video and arguably leads on phoneme-level lipsync precision. What Veo still holds is the broadcast-ready cinematic look and the most forgiving free tier in the field — ten generations a month on any personal Google account. Full Veo workflow guide here.

Where it falls short: Clips are four, six, or eight seconds per generation. You can extend a scene iteratively to build longer sequences, but you cannot ask for a single thirty-second shot. Character consistency across a full scene is good but not perfect — expect to regenerate a few times to land the exact look you want.

Resolution: Up to 4K, added in January 2026. Vertical 9:16 for mobile-first formats is also supported.

Pricing: Free tier via Google Vids (10 generations a month on any personal Google account). Google AI Pro at $19.99 a month. Google AI Ultra at $249.99 a month. API pricing from $0.05 per second on Veo 3.1 Lite up to $0.40 per second on Veo 3.1 Standard.
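The per-second API rates make clip budgeting simple arithmetic. A quick sketch using the rates quoted above ($0.05/s for Veo 3.1 Lite, $0.40/s for Veo 3.1 Standard) — the variant names are shorthand for this article, and rates change, so check Google's current pricing before budgeting a project:

```python
# Sketch: API cost estimate for Veo 3.1, using the per-second rates quoted
# in this article. Variant keys are this article's shorthand, not official IDs.

RATE_PER_SECOND = {"veo-3.1-lite": 0.05, "veo-3.1-standard": 0.40}

def clip_cost(variant: str, seconds: int) -> float:
    """Dollar cost of one generation at the quoted per-second rate."""
    return round(RATE_PER_SECOND[variant] * seconds, 2)

# An eight-second clip, Veo's longest single generation:
print(clip_cost("veo-3.1-lite", 8))      # 0.4
print(clip_cost("veo-3.1-standard", 8))  # 3.2
```

At those rates an eight-second Standard clip costs eight times a Lite one, which is why Lite exists for iteration passes.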

Access: The Gemini app, Flow (flow.google), Google Vids, the Gemini API, and Flora Fauna (which carries Veo 3.1, Fast, Lite, and Frames variants).


Runway Gen-4.5

Runway released Gen-4.5 on December 1, 2025. It is the model most closely associated with directed, narrative work — the kind of film you sit down to plan.

What Runway does best: Control. Gen-4.5 has the most mature set of what Runway calls “control modes” — image-to-video (animate a still), keyframes (pin specific moments the generation must hit), video-to-video (feed it an existing clip and restyle it), and a growing set of directed-generation tools. If your project is a short film or a music video or anything where the visual language is the whole point, Runway is the model that most looks like it was directed.

Where it falls short: Native audio was added to Gen-4.5 in December 2025 — dialogue, sound effects, and ambient — but Veo 3.1 and Kling 3.0 still lead on lip-sync precision. Clips are shorter than Kling — typically in the range of a few seconds to around ten per generation. Pricing is per-credit on top of the monthly plan, which can add up fast on production work.

Pricing: Monthly plans start around $15, with higher tiers increasing credit allocations. Annual billing reduces the monthly rate.

Access: runwayml.com (web), API, and Flora Fauna. Runway Aleph — Runway’s newer editing-focused tool — is also carried on Flora as a separate model.


MiniMax Hailuo 2.3

Hailuo is the outlier on this list. Six-second clips, by design. The MiniMax team has been explicit that the cap is intentional.

What Hailuo does best: Short, dense, emotionally precise moments. Character close-ups, micro-expressions, small gestures. If you need the exact beat of a laugh, a glance, a breath — the kind of shot that is six seconds because six seconds is all it needs to be — Hailuo 2.3 is often the best choice. Its Media Agent feature lets you hand it an image, a clip, or a sound to shape what it makes.

Where it falls short: Six seconds. That is not a weakness so much as a philosophical choice. You cannot use Hailuo for a thirty-second establishing shot. It is built for the single beat, not the sequence.

Pricing: Matched to the previous Hailuo 02 generation. Credit-based via hailuoai.video.

Access: hailuoai.video, Flora Fauna (carries Hailuo 2.3 Pro), and partner APIs.


Luma Ray3.14

Luma released Ray3.14 on January 26, 2026. It is the upgrade to Luma’s Ray 3 line, and notable for being significantly cheaper — Luma’s own announcement cites three times lower cost than Ray 3.

What Luma does best: Longer generations and flexible editing. Up to eighteen seconds per clip, with image-to-video using first-and-last-frame references (you give it the opening and closing images, and it generates the motion between). Native 1080p. If you want a single long take rather than a multi-shot assembly, Luma is worth trying.

Where it falls short: Audio is not confirmed as a native feature on Ray3.14 — you will add sound in post. And Ray3.14 is not on Flora Fauna as of April 2026. Flora currently carries Luma Ray 2 and Ray 2 Flash only — so if you use Flora as your main workspace, you will not have the newest Luma inside it.

Pricing: Credit-based via lumalabs.ai with a free tier for experimentation.

Access: lumalabs.ai (web), Dream Machine app, partner APIs.


Sora 2 — the one leaving

OpenAI announced on March 24, 2026 that the Sora app will shut down on April 26, 2026, with the API following on September 24, 2026. There is no successor product. Sora will continue as internal research on world models, but the thing you use to make videos is going away.

If you have Sora work or were planning to use it this month, read the migration guide for export instructions and the cleanest moves to Kling, Veo, or Runway.


How to choose

If you want one answer: Seedance 2.0 — if you can get at it. It is the current leader by a clear margin on public benchmarks, it handles audio and multi-shot in a single generation, and for most briefs it will produce a better first-pass result than anything else in this list. The catch is availability; Flora is the cleanest route for Western creators, and the content moderation is strict.

If Seedance 2.0 does not fit the brief — access is blocked, the subject falls foul of moderation, or you need something free to try — the specialists are still real options. The questions below help you pick one.

Audio is the first specialist question. Seedance 2.0 and Veo 3.1 both generate audio and video together in a single pass. Seedance 2.0 leads on phoneme-level lipsync precision; Veo 3.1 leads on broadcast-ready sound design and has the most generous free tier. Kling 3.0 is a strong third with native audio in its generations.

Length is the second. If you need a single long take, Luma Ray3.14 (up to 18s) or Kling 3.0 (15s) give you the most headroom per generation. Seedance 2.0 is 15s but can split that into multiple shots in-generation. Veo’s 8s cap is fine for scene-work with extension, less fine if you want one continuous shot. Hailuo 2.3 is deliberately short.
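The length question reduces to a filter over per-generation caps. A small sketch using the durations cited in this article — treat the numbers as a snapshot, since caps change with every model release:

```python
# Sketch: which models can deliver a continuous take of N seconds in one
# generation, using the per-clip caps cited in this article (seconds).
# These numbers are a snapshot, not a spec.

SINGLE_TAKE_CAP = {
    "Luma Ray3.14": 18,
    "Kling 3.0": 15,
    "Seedance 2.0": 15,
    "Veo 3.1": 8,
    "MiniMax Hailuo 2.3": 6,
}

def candidates_for_take(seconds: int) -> list[str]:
    """Models whose single-generation cap covers one continuous take."""
    return sorted(m for m, cap in SINGLE_TAKE_CAP.items() if cap >= seconds)

print(candidates_for_take(12))  # ['Kling 3.0', 'Luma Ray3.14', 'Seedance 2.0']
print(candidates_for_take(20))  # [] — no single take; plan a shot list instead
```

An empty result is itself an answer: past eighteen seconds, every current model pushes you toward a shot list and a timeline.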

Motion and physics are the third. A person running, a body falling, cloth catching the wind — the things early video models got wrong most often. Seedance 2.0 leads every current public motion benchmark. Kling 3.0 is the strong generalist runner-up. Veo holds its own. Luma is worth trying for flowing, atmospheric motion.

Control is the fourth. If your workflow is image-to-video, keyframes, or video-to-video — Runway Gen-4.5 has the most mature tools. If you want multi-shot consistency without hand-holding — Seedance 2.0 or Kling 3.0.

Consistency across shots is the fifth. Generating one beautiful clip is easy. Generating six that feel like they belong to the same scene is the hardest thing any of these models try to do. Seedance 2.0 and Kling 3.0 are the current leaders. Veo 3.1 is improving. No model solves this perfectly — for a long-form project you will still select, regenerate, and sometimes composite across models.

Access is the sixth. If you want a single subscription that covers every model on this list, Flora Fauna is the cleanest option. It carries Seedance 2.0, Kling 3.0, Veo 3.1, Runway Gen-4.5, Hailuo 2.3, Luma Ray 2, and the legacy Sora 2 Pro entry (which will stop working after April 26) — all from one node-based canvas. For Seedance 2.0 specifically, it is the only practical route for Western creators right now.

The practical approach: Most working video creators do not use a single model. They use Seedance 2.0 as the spine and reach for a specialist when the brief demands it — Veo for a dialogue scene the Seedance moderation blocks, Hailuo for the six-second reaction close-up, Kling 3.0 for the wide establishing shot where 60fps 4K earns its keep. There is a clear king now, but the field around it still has genuine use.

What the benchmarks say

The most-cited public video benchmarks right now are Artificial Analysis’s Video Arena and Tsinghua’s VBench. They measure different things and do not always agree — except on this: Seedance 2.0 is the current #1. On the Artificial Analysis Video Arena it holds 1,269 Elo for text-to-video and 1,351 Elo for image-to-video, ahead of Kling 3.0, Veo 3, and Runway Gen-4.5. VBench broadly agrees. The lead is consistent enough across different evaluation methods that it is fair to call — Seedance 2.0 is the model to beat.

As with any benchmark, treat the ranking as a starting point, not a verdict. The right model for your brief is the one that produces the shot you can actually use, which sometimes means the second-best model on paper but the best one for your specific scene.


What about open source?

The open-source video ecosystem is real and growing. Alibaba’s Wan 2.2 is the most interesting project for most creators — released July 2025 under a clean Apache 2.0 licence, which means you can use it commercially without a lawyer. Weights are on Hugging Face. A single RTX 4090 with 24GB of VRAM is enough for the smaller TI2V-5B model.

Tencent’s HunyuanVideo 1.5 (released November 2025) is technically impressive, but the licence is restrictive: it excludes the UK, EU, and South Korea entirely. If you are reading this from any of those places, Hunyuan is not a legal option for commercial work, even though the weights are public.

The trade-off with open source is the same as it always is: more control, more setup, and a quality gap against the leading proprietary models that is real but narrowing. If you are running your own pipeline on serious hardware and you are outside the excluded territories, Wan is the one to start with. If you are not, the seven models above will cover most briefs.


Art & Algorithms publishes guides, tutorials, and prompt packs at the intersection of art and code. Subscribe for the full archive.