Gemini Omni: What the Pre-I/O Leak Actually Tells Us
· Chris Sherman · May 2, a UI string. May 11, the first generated clips. May 19–20, the announcement. Six days before Google's keynote, here's what's known about Gemini Omni — and what isn't.
A Leak in Two Acts
For a model that hasn't been announced, Gemini Omni has had an unusually well-documented buildup. The trail starts on May 2, 2026, when an X user spotted a UI string buried inside Gemini's video generation tab that read "Start with an idea or try a template. Powered by Omni." TestingCatalog wrote it up the same day. The string sat there for nine days while everyone speculated.
Then on May 11, 2026, the other shoe dropped. Generated clips — clearly produced by something different from the publicly available Veo 3.1 — leaked from at least one Gemini Pro user account. Two of them got the most attention: a spaghetti scene at a seaside restaurant, and a professor working through trigonometric proofs on a chalkboard. Both got picked up by 9to5Google, Android Authority, Chrome Unboxed, and a dozen other outlets within 24 hours.
The next major event is Google I/O 2026 on May 19–20 — as of this writing, six days away. Google has confirmed that Gemini and AI updates are on the agenda. It has not confirmed Omni by name.
This article is the snapshot from May 13 — the middle of the gap. What's real, what's speculation, what the clips imply, and what to actually watch for when the keynote starts. We'll update after I/O.
The Timeline at a Glance
| Date | Event | Source confidence |
|---|---|---|
| May 2, 2026 | UI string "Powered by Omni" discovered inside Gemini's video tab | High — UI screenshot circulated |
| May 2–10, 2026 | Speculation phase. No concrete output, but multiple outlets confirm the string is real | Verified |
| May 11, 2026 | Generated clips leak from a Gemini Pro account — most notably the spaghetti scene and the chalkboard professor | High — multiple outlets independently reported same clips |
| May 11–12, 2026 | Expanded UI string surfaces: "Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat, try templates, and more" | Verified |
| May 19–20, 2026 | Google I/O 2026 keynote — likely official announcement | Scheduled (not yet occurred) |
Two things stand out. First, the leak was in-product, not a marketing slip — Google appears to have started rolling Omni out to a small subset of Gemini Pro users before the announcement, and the rollout was visible enough to be screenshotted. That's a more credible signal than a press leak. Second, the expanded UI string ("remix your videos, edit directly in chat, try templates") tells you Google is framing this as a workflow product, not just a model — "edit directly in chat" and "remix" is consumer-product language, not benchmark language.
What the Two Clips Actually Show
The two leaked clips are the most concrete information available right now. Both were short — under 10 seconds — and were generated from text prompts inside what users described as the Gemini Pro web interface.
Clip 1: The Spaghetti Scene
A diner at a seaside restaurant eating spaghetti, sunset lighting, Mediterranean ambient noise. The notable thing isn't the visual fidelity — that's competitive with what Veo 3.1 already does. The notable thing is that the spaghetti behaves like spaghetti. It twists on the fork, falls back with weight, and the fork-to-mouth motion respects continuity. Physics-heavy food scenes have historically been a weak spot for video models — utensils and food deform unnaturally, strands break, gravity stops working partway through. The leaked clip handles this cleanly, which suggests the underlying model has a noticeably better physics prior than the public Veo 3.1.
Clip 2: The Chalkboard Professor
A professor working through trigonometric proofs on a chalkboard. Camera holds on the board as the professor writes. The interesting thing here is the text and formula rendering. AI video models are notoriously bad at coherent text — letters drift between frames, equations become gibberish midway through, and anything that looks like math typically falls apart. The leaked chalkboard clip shows recognizable mathematical notation rendered consistently across frames, with the professor's hand correctly tracking the strokes. This isn't a minor improvement; it's a category that has been broken for two years.
What These Two Clips Together Imply
If the leaked clips are representative — and we should treat that "if" seriously, since Google would naturally seed clips that show their best output — then Omni is targeting two of the hardest known weaknesses in AI video: complex physics and on-screen text rendering. These are the same two issues that the Sora 2 wind-down and the HappyHorse 1.0 launch both highlighted as the next frontier. (For the canonical narrative on those, see our mid-2026 recap.)
The choice of demo content matters. A spaghetti scene and a math lecture aren't aesthetic flexes — they're capability flexes targeted at exactly the things the competition can't reliably do. That tells you what Google is positioning Omni against.
Three Competing Theories on What Omni Actually Is
This is where pre-I/O speculation lives. There are three plausible interpretations of what Omni represents, and they have very different implications for the rest of the market.
Theory 1: A Consumer Rebrand of Veo 3.1
The simplest interpretation: Omni is just a new public name for the existing Veo pipeline inside the consumer Gemini app. The underlying generation stack doesn't change. Google retires the "Veo" brand from the consumer surface, keeps it for the Vertex AI enterprise API, and gives the Gemini chat experience a single unified product name.
Evidence for: Google has a history of renaming things. Bard → Gemini was the most visible example. Consumer branding around "Veo 3.1" has always been awkward — version numbers don't sell to non-technical users. The UI strings ("remix your videos, edit directly in chat") emphasize workflow, not model novelty.
Evidence against: The leaked clips show capability that exceeds public Veo 3.1, particularly in physics and text rendering. A pure rebrand wouldn't produce visibly different output. Unless Google is shipping a quiet Veo 3.2 under the Omni brand, this theory doesn't explain the clips.
Theory 2: A Separate Gemini-Trained Video Model
The middle interpretation: Omni is a new video model trained inside the Gemini line — separate from the DeepMind Veo pipeline — and sits alongside Veo in Google's roadmap rather than replacing it. Consumer Gemini uses Omni; enterprise customers on Vertex AI continue to use Veo. Both evolve in parallel.
Evidence for: Google has historically maintained parallel model lines (Gemini for consumer, separate research lines for enterprise). The capability jump in the leaked clips is consistent with a model that's been trained on a different data mix and architecture than Veo 3.1.
Evidence against: Running two top-tier video model lines is expensive. The Sora 2 wind-down, which we covered in our post-mortem, showed that even OpenAI couldn't sustain a single consumer video model at scale; running two would be a strange strategic choice for Google.
Theory 3: A Unified Omni-Model (Image + Video + Audio in One Forward Pass)
The most ambitious interpretation: Omni is the first member of a new Gemini-trained model family that handles image generation, video generation, and synchronized audio in a single forward pass. This is the architecture that HappyHorse 1.0 pioneered when it took the Arena #1 in April with a 15B-parameter unified audio-video model. Under this theory, Omni replaces both the current Veo pipeline (video) and the Nano Banana Pro stack (image) with a single multimodal generator.
Evidence for: The product name itself — "Omni" — strongly implies multimodal scope. The UI framing ("our new video model, remix your videos, edit directly in chat") suggests a single product surface covering multiple modalities. The competitive pressure from HappyHorse to ship a unified architecture is acute; Google has been losing the Arena top spot since April. (See our HappyHorse 1.0 review for the architecture details.)
Evidence against: Unified omni-models are technically difficult, and Google has been more conservative than ByteDance or Alibaba about shipping novel architectures to consumers. Replacing two production pipelines simultaneously is a high-risk move for a public keynote.
Where the Money Is
Industry observers split roughly 30/30/40 on the three theories. The most likely reading, based on the UI framing and the capability jump in the clips, is some hybrid of Theory 2 and Theory 3: a new Gemini-trained model that handles at least video and audio in a unified way, with Veo remaining alive on Vertex AI for enterprise customers who need stability. We'll know in six days.
Why This Matters Beyond Google
Omni isn't interesting because Google is releasing a new video model. New video models ship every month now. Omni is interesting because of what it would mean if Theory 3 is right.
The AI video industry spent the first four months of 2026 watching the unified omni-model thesis play out. Sora 2 collapsed in 84 days running a separate-pipelines architecture. HappyHorse 1.0 took the Arena #1 in 48 hours running a 15B-parameter unified architecture. Seedance 2.0 ships audio and video together via a dual-branch transformer. The technical center of gravity has been shifting toward unified models for an entire quarter, and the only major Western lab that hadn't responded was Google.
If Omni is a true unified model — Theory 3 — then Google is matching the architecture trend that the Chinese leaders established. That has three downstream effects:
- The Veo brand consolidates or retires. Running a separate-pipeline Veo alongside a unified Omni doesn't make sense for more than 12 months. Enterprise customers on Vertex AI would expect a migration path.
- The Western/Chinese architecture gap closes. The "Chinese models have a structural lead because they pioneered unified architectures" framing weakens once Google ships its own.
- Model-layer differentiation continues to compress. If four of the top six models all use unified audio-video architectures, the model layer commoditizes further and the agent layer becomes the only meaningful differentiation point. This is the central thesis of our mid-2026 recap, and Omni would extend it.
If Omni is just a rebrand (Theory 1), most of this doesn't apply. But the leaked clips make Theory 1 the least likely of the three.
What to Watch For at I/O — A Six-Item Checklist
When the keynote starts on May 19, here's what tells you which theory was right. None of these alone is definitive, but together they form a clear picture.
Signal 1: Does Google still say "Veo" on the keynote stage?
If Veo is conspicuously absent from the consumer-facing Gemini segment, that's evidence Veo is being retired as a consumer brand. If Veo is still mentioned alongside Omni, the two are coexisting (Theory 2). If both are mentioned but Veo is only positioned for enterprise, the migration is starting.
Signal 2: Does Omni generate audio in the same call as video?
A single API call that returns synchronized video + audio is the technical signature of a unified omni-model (Theory 3). Two separate API calls — video first, then a second call for audio synthesis — is the older architecture pattern. The keynote demo will probably show this clearly.
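To make the architectural difference concrete, here's a purely hypothetical client-side sketch of Signal 2. No Omni API exists publicly; every endpoint and function name below is invented for illustration. The point is the round-trip count: a unified omni-model returns synchronized audio and video from one call, while the older pattern needs a second call conditioned on the finished video.

```python
# Hypothetical sketch: unified (one call) vs. pipelined (two calls).
# None of these endpoints are real; they only illustrate the pattern.

calls = []

def api(endpoint: str, payload: str) -> str:
    calls.append(endpoint)  # count round-trips to the imagined backend
    return f"{endpoint}({payload})"

def unified(prompt: str) -> tuple:
    """Theory 3 signature: one call, synchronized video + audio back."""
    result = api("generate_av", prompt)
    return result, result  # both modalities from the same forward pass

def pipelined(prompt: str) -> tuple:
    """Older pattern: generate video, then dub audio in a second call."""
    video = api("generate_video", prompt)
    audio = api("generate_audio", video)  # conditioned on finished video
    return video, audio

unified("spaghetti scene"); n_unified = len(calls); calls.clear()
pipelined("spaghetti scene"); n_pipelined = len(calls)
print(n_unified, n_pipelined)  # 1 2
```

If the keynote demo shows sound arriving in the same generation pass as the frames, that's the one-call signature; a visible "add audio" step afterward is the two-call pattern.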
Signal 3: Does Omni also handle image generation?
If Omni is being positioned as the new video model only, that's a narrower scope. If Omni absorbs image generation as well — replacing Nano Banana Pro inside Gemini's chat surface — that's evidence of the broader unified-modality thesis. Watch whether any image generation demos in the keynote credit "Omni" or stay branded as Nano Banana / Imagen.
Signal 4: Is there an API on day one?
Veo 3.1 shipped in Vertex AI on the day it was announced. If Omni ships with public API access and pricing on May 19–20, it's positioned for production use immediately. If it ships consumer-only with API access "later this year," Google is taking the Sora 2 retail-first approach — which we've already seen doesn't work economically at scale.
Signal 5: What's the pricing structure?
The current public top-tier API pricing benchmark is roughly $0.05/second (HappyHorse 1.0) to $0.50/second (Veo 3.1). If Omni's API pricing lands closer to HappyHorse, Google is competing on cost; if it lands closer to Veo 3.1, Google is competing on quality. The choice will tell you which market Google is prioritizing.
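The gap between those two price points is large enough to matter at production volume. A quick back-of-envelope calculation, using only the per-second prices cited above (Omni's pricing is unknown, so it's deliberately absent):

```python
# Cost comparison using the per-second API prices cited above:
# $0.05/s (HappyHorse 1.0) vs. $0.50/s (Veo 3.1). Omni's pricing
# is not yet known, so it has no entry here.

PRICES_PER_SECOND = {
    "happyhorse-1.0": 0.05,
    "veo-3.1": 0.50,
}

def clip_cost(model: str, seconds: float) -> float:
    """Cost in USD to generate `seconds` of video with `model`."""
    return PRICES_PER_SECOND[model] * seconds

# A single 60-second video: $3 vs. $30 — a 10x spread.
print(clip_cost("happyhorse-1.0", 60))  # 3.0
print(clip_cost("veo-3.1", 60))         # 30.0
```

Where Omni's number lands between those two endpoints is arguably the single clearest signal of which market Google wants.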
Signal 6: How does Project Astra fit in?
Google has been demoing Project Astra — its real-time multimodal assistant — at every I/O since 2024. If Astra suddenly becomes a product on May 19–20 and uses Omni under the hood, that's the broader "omni" thesis: not just a video model but a real-time multimodal AI surface across the entire Gemini experience.
What This Means for Your Workflow
Three practical things to think about while we wait for the keynote.
If you're a creator using Gemini directly
Don't change anything yet. Omni in the consumer Gemini app, if it ships next week, will simply replace or upgrade the existing video generation experience. The "remix your videos, edit directly in chat" framing suggests the same chat-driven workflow you already know, with a smarter model underneath. Wait for the announcement, try the new capabilities, and update your prompts based on what actually changes.
If you're building on Vertex AI
Watch Signal 1 (Veo brand) and Signal 4 (API availability) carefully. If Veo is being retired as a consumer brand but stays on Vertex AI for enterprise, your existing integration is safe. If Omni replaces Veo entirely on Vertex AI, you'll have an API migration ahead. Either way, build your integration through an agent or orchestration layer so the model swap is a configuration change, not a code change.
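The "configuration change, not a code change" point can be sketched in a few lines. This is a minimal illustration of the indirection, not a real Vertex AI client — the model identifiers are real product names, but the dispatch is a stub:

```python
# Minimal sketch of config-driven model selection. In practice
# MODEL_CONFIG would live in a config file or environment variable,
# and generate_video would dispatch to a real Vertex AI client.

MODEL_CONFIG = {
    "video_generator": "veo-3.1",  # flip to "omni" post-I/O if needed
}

def generate_video(prompt: str, config: dict = MODEL_CONFIG) -> str:
    model = config["video_generator"]
    # Stub: a real integration would call the configured model here.
    return f"[{model}] {prompt}"

print(generate_video("sunset over a harbor"))
# Swapping models touches only the config, not the call sites:
print(generate_video("sunset over a harbor", {"video_generator": "omni"}))
```

Every call site in your codebase goes through `generate_video`; if Omni replaces Veo on Vertex AI, the migration is one config edit plus a regression pass, not a rewrite.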
If you're running a multi-model agent stack
This is the situation we've been advocating in our recent pieces. (See the six shifts and the long-form bottlenecks writeups.) A multi-model agent treats Omni as another generator to route to — alongside Veo, Seedance, HappyHorse, Kling, Luma, and Runway. The agent layer is where the productive question lives: which shot in this 60-second video routes to which model. Omni's announcement adds another option to the routing table; it doesn't change the architecture you're running.
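The routing-table idea above can be sketched concretely. The rules and model assignments here are illustrative only — they encode the capability story from the leaked clips (text and physics to Omni), not a tested recommendation:

```python
# Illustrative per-shot routing table for a multi-model agent stack.
# Predicates inspect shot metadata; the model assignments are examples,
# not benchmarked recommendations.

ROUTING_TABLE = [
    # (predicate on shot metadata, model to route to)
    (lambda shot: shot.get("has_text"),    "omni"),        # on-screen text
    (lambda shot: shot.get("physics"),     "omni"),        # complex physics
    (lambda shot: shot.get("needs_audio"), "happyhorse"),  # synced audio
]
DEFAULT_MODEL = "veo-3.1"

def route(shot: dict) -> str:
    """Return the model a given shot should be sent to."""
    for predicate, model in ROUTING_TABLE:
        if predicate(shot):
            return model
    return DEFAULT_MODEL

shots = [
    {"id": 1, "has_text": True},  # chalkboard-style shot
    {"id": 2, "physics": True},   # spaghetti-style shot
    {"id": 3},                    # generic b-roll
]
print([route(s) for s in shots])  # ['omni', 'omni', 'veo-3.1']
```

When Omni ships, it becomes one more row in `ROUTING_TABLE`; the agent architecture around it doesn't change.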
This is exactly why we've kept Genra's stack model-agnostic: the model layer keeps churning, the agent layer is what compounds.
The Bottom Line, Six Days Before I/O
What we know: there's a real model called Omni inside Gemini's video tab, it produces output that's visibly better than public Veo 3.1 on physics and text, and Google is framing it as a chat-based workflow product. What we don't know: whether it's a rebrand, a parallel new model, or a unified omni-modality system.
The single most useful prediction is the third one. If Theory 3 is right, the Western/Chinese architecture gap closes on May 19, and the industry returns to a multipolar race where all major labs are running unified audio-video architectures. If Theory 3 is wrong, Google is still trailing the architectural frontier set by HappyHorse — and the competitive picture stays as it was after the April HappyHorse launch.
Either way, the practical takeaway is the same: the model layer keeps moving, the agent layer is where you should be building. Omni doesn't change that. It either reinforces it (by adding another commodity model to the routing table) or doesn't move the needle (if it's a rebrand). The teams that have already moved their differentiation to agent infrastructure will absorb whatever Google announces on the 19th as a configuration update. The teams still betting on a single hero model will spend the rest of Q2 retrofitting.
We'll update this piece after the keynote with what's actually announced.
FAQ
What is Gemini Omni?
Gemini Omni is an unannounced AI video generation model that surfaced via two leaks inside Google's Gemini interface — a UI string spotted on May 2, 2026, and generated video clips that leaked from a Gemini Pro account on May 11. Google has not officially confirmed Omni as of May 13. The most likely announcement window is Google I/O 2026 on May 19–20.
Is Gemini Omni replacing Veo?
Unconfirmed. Three theories are in play: Omni is a consumer rebrand of Veo 3.1, Omni is a separate new Gemini-trained model that coexists with Veo, or Omni is a unified omni-modality model replacing both Veo and Google's image generation stack. The leaked clips suggest capability beyond current public Veo 3.1, which makes the pure-rebrand theory least likely.
What did the leaked clips show?
Two clips got the most attention: a spaghetti scene at a seaside restaurant (notable for handling physics-heavy food motion that current models typically break), and a professor working through trigonometric proofs on a chalkboard (notable for rendering coherent mathematical notation across frames, which AI video models have historically failed at). Both capability areas — complex physics and on-screen text — have been industry-recognized weak points for video models.
When will we know what Omni actually is?
Google I/O 2026 on May 19–20. The keynote will most likely confirm or deny the Omni branding, clarify whether it replaces Veo, and reveal whether it handles audio and image generation in addition to video. Watch six specific signals on the keynote stage: whether Veo is still mentioned, whether audio is generated in the same call as video, whether image generation is included, whether an API ships on day one, what the pricing is, and how Project Astra fits in.
What should I do as a creator before the announcement?
Don't change anything yet. If you're using consumer Gemini, wait for the launch and try the new capabilities. If you're on Vertex AI, watch for an API migration path. If you're running a multi-model agent stack, treat Omni as another generator to route to — it doesn't change the architecture you're running.
How does Omni compare to HappyHorse 1.0?
HappyHorse 1.0 took the Artificial Analysis Video Arena #1 in 48 hours when it launched on April 7, 2026, with a 15B-parameter unified audio-video architecture. If Omni is a unified omni-model (Theory 3), it represents Google's first response to that architectural direction. If Omni is a separate-pipeline model (Theory 2) or a rebrand (Theory 1), Google would still be trailing the unified architecture frontier set by HappyHorse.
About the Author
Chris Sherman covers AI video technology and creative production workflows. Follow @GenraAI for live coverage during the Google I/O 2026 keynote on May 19–20.