HappyHorse 1.0: The Mystery Model That Just Topped Every AI Video Leaderboard
No press release. No company name. No announcement. On April 7, 2026, a model called "HappyHorse-1.0" quietly appeared at the top of the Artificial Analysis video leaderboard — and nobody could fully explain where it came from.
Published: April 9, 2026 · Category: AI Video · Read time: 6 min
The Numbers First
Before the story, the scoreboard:
| Metric | Score |
|---|---|
| T2V Elo (no audio) | 1333 — #1 globally |
| I2V Elo (no audio) | 1392 — #1 globally |
| Parameters | 15B |
| Architecture | Unified 40-layer Transformer |
| Generation speed | ~38s for 1080p on H100 |
| Lip-sync languages | 7 |
| Open source | Announced (weights coming soon) |
What Is HappyHorse 1.0?
HappyHorse 1.0 is an AI video generation model that handles both text-to-video (T2V) and image-to-video (I2V) — and uniquely, it generates video and audio together in a single inference pass. Not video first, then audio dubbed on top. One unified Transformer doing both at once.
It entered the Artificial Analysis Video Arena anonymously. The platform runs blind user votes: two videos appear side by side, neither is labeled, and users pick the better one. Those votes accumulate into an Elo rating — the same math used in chess rankings. HappyHorse 1.0 climbed to #1 in both T2V and I2V (no audio) through nothing but genuine user preference.
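For readers new to arena scoring, here's how a single blind vote moves the numbers. This is a minimal sketch of the textbook Elo update, not Artificial Analysis's actual implementation; the starting ratings and the K-factor of 32 are illustrative assumptions.

```python
# Minimal sketch of the Elo update behind arena-style leaderboards.
# Starting ratings and K-factor are illustrative, not the platform's config.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return new (rating_a, rating_b) after one blind vote."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# An anonymous newcomer upsets a higher-rated incumbent: big rating gain.
newcomer, incumbent = 1000.0, 1300.0
newcomer, incumbent = update(newcomer, incumbent, a_won=True)
print(round(newcomer), round(incumbent))  # 1027 1273
```

Upsets against higher-rated opponents move a rating sharply, which is how an unknown model can climb quickly once it starts beating the incumbents.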
Artificial Analysis itself used the word "pseudonymous" when announcing the model. No team name. No organization. Just a model that kept winning.
Who Built It?
The mystery didn't last long. The official site at happyhorse-ai.com has since confirmed: HappyHorse 1.0 was built by the Future Life Lab team inside Alibaba's Taotian Group, led by Zhang Di — former VP of Kuaishou and head of Kling AI technology, who joined Alibaba at the end of 2025 to lead multimodal AI innovation.
This is the same playbook that produced Pony Alpha earlier this year — a mystery model that appeared on OpenRouter, sparked weeks of speculation, and turned out to be Z.ai's GLM-5 running a stealth stress test before launch. Anonymous arena entry, real quality validation, then identity reveal.
The difference here: HappyHorse 1.0 didn't just match the incumbents. It beat them.
Architecture: Why It's Actually Different
Most AI video models either skip audio entirely (Kling, Runway) or generate it as a separate stage after the video is done (Veo 3). HappyHorse 1.0 takes a different approach: text, image, video frames, and audio tokens all share the same 40-layer Transformer sequence.
Technical highlights:
- Unified 40-layer self-attention Transformer (~15B parameters)
- DMD-2 distillation — only 8 denoising steps required, dramatically faster than standard diffusion (see the sampler sketch after this list)
- Native joint audio-video generation — output is synchronized by construction, not by post-processing
- 7-language lip-sync — English, Mandarin, Cantonese, Japanese, Korean, German, French
- Native 1080p output with integrated super-resolution module
- MagiCompiler-accelerated inference — ~2s for 256p, ~38s for 1080p on H100
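The DMD-2 bullet is where the speed claim comes from: distillation collapses the usual dozens of denoising iterations into a handful. Here's a generic few-step sampler loop as a rough illustration; the denoiser, sigma schedule, and update rule are placeholder assumptions, not HappyHorse's released code.

```python
# Generic few-step denoising loop of the kind distillation (e.g. DMD-2)
# makes possible. Everything here is a placeholder: standard diffusion
# needs dozens of these steps, a distilled model only a handful.
import torch

def sample(denoiser, shape, steps: int = 8):
    """Run `steps` denoising steps from pure noise to a clean latent."""
    x = torch.randn(shape)                        # start from Gaussian noise
    sigmas = torch.linspace(1.0, 0.0, steps + 1)  # noise levels, high to low
    for i in range(steps):
        pred = denoiser(x, sigmas[i])             # model's clean-latent guess
        # Re-noise the prediction down to the next (lower) noise level.
        x = pred + sigmas[i + 1] * (x - pred) / max(sigmas[i].item(), 1e-6)
    return x

# Stand-in denoiser (the real one is a ~15B Transformer):
latent = sample(lambda x, sigma: torch.zeros_like(x), (1, 16, 64, 64))
```

Cutting the step count is the main lever that turns minutes of sampling into generation times measured in seconds.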
The joint generation architecture is the meaningful differentiator. When audio is baked into the same forward pass as video, you get synchronized dialogue, ambient sound, and Foley effects without a separate pipeline step.
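To make "synchronized by construction" concrete, here is a toy sketch of video and audio tokens flowing through one shared self-attention stack. Every dimension, name, and head in it is a stand-in of my choosing; HappyHorse's internals beyond the headline specs haven't been published.

```python
# Toy illustration of joint audio-video generation in one Transformer pass.
# All sizes are tiny stand-ins (the real model: 40 layers, ~15B parameters).
import torch
import torch.nn as nn

D = 512  # hypothetical hidden size

class JointAVBackbone(nn.Module):
    def __init__(self, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.video_head = nn.Linear(D, D)  # projects to video latents
        self.audio_head = nn.Linear(D, D)  # projects to audio latents

    def forward(self, text, video, audio):
        # All modalities share ONE sequence, so audio tokens attend directly
        # to the video frames they must stay in sync with (and vice versa).
        h = self.blocks(torch.cat([text, video, audio], dim=1))
        t, v = text.shape[1], video.shape[1]
        return self.video_head(h[:, t:t + v]), self.audio_head(h[:, t + v:])

model = JointAVBackbone()
text = torch.randn(1, 16, D)    # prompt embedding
video = torch.randn(1, 256, D)  # noisy video latents
audio = torch.randn(1, 64, D)   # noisy audio latents
v_out, a_out = model(text, video, audio)  # one pass, both modalities
```

Because the audio tokens attend to the video tokens inside the same forward pass, lip movement and speech are generated against each other rather than aligned in a later stage, which is exactly what a two-stage pipeline can't guarantee.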
HappyHorse 1.0 vs. Seedance 2.0
Seedance 2.0 has been the benchmark since its launch. Here's where things stand now:
| Category | HappyHorse 1.0 | Seedance 2.0 |
|---|---|---|
| T2V Elo (no audio) | 1333 ↑ +60 | ~1273 |
| I2V Elo (no audio) | 1392 ↑ +37 | ~1355 |
| T2V Elo (with audio) | ~1289 | ~1303 ↑ +14 |
| I2V Elo (with audio) | Essentially tied | Essentially tied |
| Open weights | ✅ Announced | ❌ Closed API |
| Self-hostable | ✅ Yes | ❌ No |
| Joint audio-video | ✅ Single pass | Separate stage |
The takeaway is clear: on pure visual quality, HappyHorse 1.0 leads decisively. On audio-integrated evaluation, Seedance 2.0 holds a narrow edge in T2V and is essentially tied in I2V. Given that HappyHorse is about to release open weights the community can fine-tune, that audio gap is likely temporary.
What Open Source Actually Means Here
Sora, Veo, Kling, and Seedance are all closed API services. You pay per minute, you cannot inspect the model, you cannot self-host, and you cannot fine-tune for a specific style or character.
HappyHorse 1.0 is being released with the full stack: base model weights, distilled model, super-resolution module, and inference code — under a license that permits commercial use and fine-tuning.
For creators and developers, this means:
- Download once, run forever on your own infrastructure
- Fine-tune on specific styles, characters, or visual aesthetics
- Integrate into your own product without dependency on a third-party API
- Inspect the architecture for safety evaluation or research
⚠️ Status check: As of April 9, 2026, both the GitHub and HuggingFace links on the official site still return "coming soon." The weights are announced but not yet downloadable. Watch GitHub or happyhorse-ai.com for the actual release.
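When the weights do land, the "download once, run forever" half of the pitch should look roughly like this. This is purely hypothetical: the repo id below is an invented placeholder, and the actual distribution path and loading API may be entirely different.

```python
# Hypothetical sketch of self-hosting a released checkpoint.
# The repo id is a placeholder: as of April 9, 2026 there is nothing to
# download, and the real release may use a different distribution path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="happyhorse/HappyHorse-1.0",  # placeholder, not a real repo
    local_dir="./happyhorse-1.0",
)
# From here, the released inference code would load the checkpoint and run
# generation on your own GPU, with no per-minute API billing.
print(f"weights cached at {local_dir}")
```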
What This Means for the AI Video Landscape
HappyHorse 1.0 landing at #1 via blind user voting — not marketing — is a data point worth taking seriously. A few things it signals:
The closed-source quality moat is narrowing. When the top-ranked model on the leading evaluation platform is about to be open-sourced, the argument for paying per-minute API fees gets harder to make.
Joint audio-video generation is becoming table stakes. Models that still produce silent video and require separate audio pipelines are now one generation behind the frontier.
Multilingual lip-sync matters globally. Native support for Mandarin, Cantonese, Japanese, and Korean isn't a footnote — it's a direct signal about who this model was built for and who will benefit most from the open release.
The "stealth drop" strategy works. Both HappyHorse and the earlier Pony Alpha/GLM-5 case demonstrate that anonymous arena testing before official launch is an increasingly effective way to generate credible, independently-verified quality claims.
How to Try It Now
You don't need a GPU or an account to test HappyHorse 1.0 today:
- Artificial Analysis Video Arena — blind comparison against other top models, no signup required
- happyhorse-ai.com — official demo, registration gives early access to the generation tool
- Dzine.ai — HappyHorse 1.0 is already integrated; enter a prompt and generate directly
The Bottom Line
HappyHorse 1.0 earned its #1 ranking through blind user votes, not a PR campaign. It brings a genuinely different architecture — joint audio-video generation in a single Transformer pass — and it's backed by a team with a real track record, led by the former head of Kling AI technology. The open-source release, when it lands, will be one of the most significant drops in the AI video space this year.
The weights aren't out yet. When they are, things get interesting fast.
Tags: #AIVideo #HappyHorse #OpenSource #TextToVideo #Seedance #ArtificialAnalysis #Alibaba #2026