Google I/O 2026 Recap: No Veo 4 — But Gemini Omni and Spark Just Made the Agent Layer Official
Chris Sherman
For two months, the entire AI video industry talked about Veo 4. It did not ship. What Google announced at I/O 2026 was bigger and stranger: a unified multimodal model called Gemini Omni, a 24/7 cloud-resident agent called Spark, a $100 AI Ultra tier that resets the consumer pricing floor, and a clear signal that Google now sees the agent layer as the next platform fight. Here's the full readout.
Sundar Pichai walked on stage at Shoreline Amphitheatre yesterday and gave the AI video industry something it did not expect. There was no Veo 4. There was no "Veo" branded headline at all. In its place was something more strategically interesting: Gemini Omni, a multimodal model that natively handles text, image, audio, and video generation in a single system; Gemini Spark, a personal AI agent that lives on a cloud VM and acts on your behalf 24 hours a day; and a price restructuring that puts a $100 AI Ultra plan at the center of Google's consumer AI bet.
The keynote rewrote the script for the next 12 months of AI video. Below is everything Google announced, what it actually means, and where the AI video industry now stands on the morning after.
Gemini Omni: The Headline No One Predicted
The most consequential announcement was Gemini Omni, a new model series Google describes as the company's first true unified multimodal generation system. Where Google's previous lineup split capabilities across Veo (video), Imagen (image), and other systems that had to be chained together, Omni handles text, image, audio, and video generation natively in one model.
The first public model in the Omni framework is Omni Flash. It accepts combined text, image, and audio inputs and outputs short cinematic video with synchronized sound. Google demoed users uploading a still image, speaking instructions out loud, and getting back an animated scene with native audio that responds to the spoken direction. The editing is conversational — refine a clip by saying what to change, instead of writing a new prompt and regenerating from scratch.
Three things make Omni strategically different from the Veo lineage:
- One model, not a stack. Veo 3 already had native audio, but the broader Google creative stack still relied on chaining separate models for image generation, audio production, and editing. Omni collapses that chain. The strategic implication is that Google believes the next leap in quality comes from joint training across modalities, not from scaling video-only models further.
- World-grounded generation. Demis Hassabis framed Omni as building on Google DeepMind's world-models work. The pitch is that Omni generates video with stronger spatial, temporal, and physical coherence because the underlying model has a richer internal world representation. Whether the output bears this out in practice is something we'll be benchmarking over the next quarter.
- Editing as a first-class capability. Omni is being positioned not just as a generator but as an editor. Conversational refinement, scene swaps, and remix-style operations are part of the product surface, not an external layer. This is a meaningful shift in product philosophy that competitors will have to respond to.
What Omni currently does not do: long-form. Omni Flash is short-form, and Google was explicit that longer and more advanced production workflows are planned but not yet shipping. Anyone hoping for one-shot 60-second narrative generation is still waiting.
Gemini Spark: A 24/7 Personal Agent in the Cloud
If Omni was the headline most pundits got wrong, Spark was the announcement most underestimated.
Gemini Spark is a personal AI agent that lives on a dedicated Google cloud VM, runs continuously, and acts on your behalf across Google products and an expanding list of third-party services through Model Context Protocol (MCP). The product description, in Google's own framing: an agent that can "book restaurants, put in an Instacart order, and draft your inbox replies while you sleep."
The strategic significance is hard to overstate. For two years, Google's consumer AI story was Gemini as a chatbot. Spark is Google explicitly saying that the chatbot was the wrong frame — the right frame is an autonomous agent that operates across applications and time. The agent reads your inbox, takes actions in your tools, plans across services, and reports back. The user describes outcomes; Spark handles execution.
This is the same thesis the AI video industry has been arguing about for the last year, applied to general productivity. The agent layer is no longer a startup positioning bet. It is now Google's positioning bet.
Pricing matters here. Spark is gated behind the new $100/month AI Ultra tier and rolls out in beta to U.S. subscribers next week. The pricing alone signals that Google believes a meaningful population of users will pay five times the price of the $20 Pro tier for an agent that genuinely does things.
Gemini 3.5: The Foundation Update
Underneath the Omni and Spark announcements sits a foundation model refresh. Gemini 3.5 Flash launched yesterday across the Gemini app, Search, Antigravity, and the Gemini API. Google's claim: it surpasses Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks while running at roughly 4x the output token speed of comparable frontier models.
Gemini 3.5 Pro is announced but not yet generally available. It's in testing and ships next month.
The pattern across Flash, Pro, Omni, and Spark is consistent: every product Google announced at I/O is built on the agentic capabilities track. Faster instruction-following, longer effective context, better tool use, and more reliable multi-step execution. The model layer is being shaped to serve the agent layer above it.
Antigravity 2.0: The Developer Story
Antigravity is Google's agent development platform. Yesterday it received a 2.0 upgrade focused on orchestration — letting developers compose, schedule, and supervise multiple agents that interact with each other and with external tools.
The relevance for AI video is indirect but real. As more AI video tools move from single-model wrappers to actual orchestrated pipelines, the underlying infrastructure for running, monitoring, and debugging those orchestrations becomes a foundational dependency. Antigravity 2.0 is Google trying to own that infrastructure layer the same way it owns the model layer beneath it.
Whether independent agent builders will rely on Google's infrastructure or build their own is one of the more interesting open questions emerging from this keynote. The answer determines how much of the agent economy Google captures versus how much remains genuinely open.
The $100 AI Ultra Tier: A Price-Floor Reset
Google AI Ultra now starts at $100 per month, with a higher tier priced at $200. The previous Ultra plan was $250. The new entry tier includes Gemini Spark beta access, 5x the Gemini app usage limit of the $20 Pro tier, 20TB of cloud storage, and YouTube Premium.
The strategic read is straightforward: Google is pricing premium consumer AI aggressively to capture the early adopters who will define what an agent product feels like. At $100/month, Spark is now in direct competition with the high end of ChatGPT Pro and Claude consumer tiers. The agent feature is the differentiator — and it's a feature competitors will need to ship versions of within the next 12 months or cede the productivity-agent category.
For creators and operators, the relevant question is whether $100/month for a personal agent meaningfully accelerates the work. The honest early answer: it depends entirely on whether Spark's beta lives up to the demo. Demos are demos. We will know in 90 days.
Android XR and Project Aura: The Hardware Surface
Google also unveiled new "intelligent eyewear" devices, including Project Aura, the XR-class smart glasses developed in partnership with Xreal. At least three smart glasses partnerships are launching this year, positioning Google between Meta's audio-first Ray-Bans and full XR headsets.
The AI angle: these are Gemini-powered. Live visual context, voice interaction, and agentic action — all wearable. For AI video, the implications are downstream but real. A wearable camera with Gemini context becomes a permanent input device for video creation, both for reference capture and for live editing on the move. We're 18 months from this mattering for production workflows. We're zero months from it mattering for consumer demos.
Android 17: The OS as Intelligence Layer
Sameer Samat's Android update positioned the OS itself as transforming "from an operating system to an intelligence system." The framing — Gemini understands context across apps, anticipates needs, and takes actions on the user's behalf — is the same agent-layer thesis applied to the mobile platform.
The concrete features matter less than the framing. Google is committing to a future where the OS layer and the agent layer collapse into one stack, all running on Gemini foundation models. For developers, this means agent-aware app design is no longer an optional pattern; it's the baseline assumption Google is building the platform around.
What Didn't Ship: The Veo 4 Absence
The most anticipated announcement that didn't happen: Veo 4. There was no Veo 4 reveal, no Veo 4 timeline, and no explicit confirmation that Veo is being deprecated in favor of the Omni line.
The most likely read: Google is consolidating its generative video efforts under Omni rather than continuing parallel Veo development. Omni Flash is positioned as the new starting point. Veo 3.1 remains the production-grade option for use cases Omni Flash doesn't yet cover — particularly longer single-shot generation, 4K output, and ID-embedding character consistency, none of which Omni Flash currently supports.
For the broader AI video industry, this is a meaningful pivot. Eighteen months of "what will Veo do next" conversation has been replaced with "what is Omni." Operators with Veo-specific automation will need to evaluate whether to wait for Omni to mature on long-form, or to keep production on Veo 3.1 for the foreseeable future. Probably both, in parallel, on different content types.
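As a rough illustration of what "both, in parallel, on different content types" could look like, here is a minimal routing sketch. It is not any vendor's or platform's actual implementation. The model labels are placeholder strings rather than real API identifiers, and the short-form cutoff is an assumed number, since the keynote coverage above gives no duration limit for Omni Flash. The only facts taken from the announcement are the three gaps: longer single-shot clips, 4K output, and ID-embedding character consistency.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; these are not real API model identifiers.
OMNI_FLASH = "omni-flash"
VEO_3_1 = "veo-3.1"

# Assumed cutoff for "short-form". No official figure is given in this recap.
ASSUMED_SHORT_FORM_MAX_SECONDS = 10.0


@dataclass(frozen=True)
class ShotRequest:
    """What a single shot needs from the video model."""
    duration_seconds: float
    needs_4k: bool = False
    needs_id_consistency: bool = False


def choose_model(shot: ShotRequest) -> str:
    """Route to Veo 3.1 whenever the shot needs something Omni Flash lacks."""
    if shot.needs_4k or shot.needs_id_consistency:
        return VEO_3_1
    if shot.duration_seconds > ASSUMED_SHORT_FORM_MAX_SECONDS:
        return VEO_3_1
    # Short clips with no special requirements go to the unified model.
    return OMNI_FLASH


shots = [
    ShotRequest(6.0),
    ShotRequest(6.0, needs_4k=True),
    ShotRequest(45.0),
]
plan = [choose_model(shot) for shot in shots]
```

Keeping the rule in one small function means that if Omni gains long-form or 4K support, only the conditions change, not the callers.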
What This Means for AI Video Operators
Stepping back from the individual announcements, three things changed yesterday that will shape AI video for the next year.
First, the model strategy got messier in a useful way. Omni is a unified multimodal bet, but Omni Flash is short-form only. Veo 3.1 still does the heavier lifting for longer clips and higher resolutions. Real production pipelines will use both, route between them, and switch dynamically as Omni matures. The agent layer is where that routing logic lives.
Second, agent-layer thinking is now consensus. Spark is Google saying out loud that the chatbot framing was a transition step and the destination is an autonomous agent. Every consumer and enterprise AI product team that has been debating whether to build "an assistant" or "an agent" has been handed a settled answer. The agent layer is where competition moves.
Third, conversational editing changes creator workflows. Omni's emphasis on in-chat editing — refine a clip by describing what to change — collapses what used to be a generate-then-edit two-step. For AI video creators, this is a meaningful UX simplification that competitors will be expected to match. Genra's pipeline already supports conversational iteration; expect every serious AI video platform to ship a version of this within six months.
What Genra Is Doing Next
A few honest notes on where Genra goes from here.
Omni Flash will be integrated as it becomes available through the Gemini API. The agent layer Genra has been building was designed to be model-agnostic precisely so additions like Omni become backend changes, not workflow changes. Users will see better short-form output as the routing logic starts choosing Omni Flash for the shots it does best. Long-form, 4K, and high-consistency use cases continue to run on Veo and Seedance.
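To make "backend changes, not workflow changes" concrete, the sketch below shows one common way to structure a model-agnostic layer: a registry that maps a name to a callable backend. This is a generic pattern under stated assumptions, not Genra's actual code. `BackendRegistry`, `make_stub`, and the backend names are all hypothetical, and the stubs return strings instead of calling any real video API.

```python
from typing import Callable, Dict

# A backend is any callable that takes a prompt and returns a result.
# Real backends would call a video API; these stubs only return strings.
Backend = Callable[[str], str]


class BackendRegistry:
    """Maps backend names to callables so callers never import a model directly."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def generate(self, name: str, prompt: str) -> str:
        if name not in self._backends:
            raise KeyError(f"no backend registered under {name!r}")
        return self._backends[name](prompt)


def make_stub(label: str) -> Backend:
    """Build a placeholder backend that tags the prompt with its label."""
    def stub(prompt: str) -> str:
        return f"[{label}] {prompt}"
    return stub


registry = BackendRegistry()
registry.register("veo-3.1", make_stub("veo-3.1"))
registry.register("seedance", make_stub("seedance"))
# Adding a new model is one more registration; calling code stays the same.
registry.register("omni-flash", make_stub("omni-flash"))

result = registry.generate("omni-flash", "sunrise over a harbor")
```

The workflow code only ever calls `registry.generate(...)`, so a new model reaches users through a registration plus a routing decision rather than a change to how they work.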
Spark's framing as a 24/7 cloud-resident agent is the closest validation we could have asked for of the agent-layer thesis. Genra is a domain-specific agent for video production. Spark is a general-purpose agent for personal productivity. The two coexist comfortably — the same way a CRM agent and a coding agent coexist with a general productivity assistant.
The bigger competitive frame: with Google now committed to the agent layer at the platform level, the question for every AI video startup is no longer "are agents the future" — that's settled. The question is which domain-specific agents become the trusted choice in their category. For AI video, that's the question Genra is built to answer.
Key Takeaways
- Google I/O 2026 did not ship Veo 4. The headline video announcement was Gemini Omni, a unified multimodal model handling text, image, audio, and video generation in a single system, with Omni Flash as the first public model.
- Gemini Spark, a 24/7 cloud-resident personal agent that acts across Google products and MCP-connected third-party services, is the most strategically significant announcement. It commits Google to the agent layer as the next platform fight.
- Gemini 3.5 Flash launched yesterday; Gemini 3.5 Pro is in testing for next month. Every foundation update was framed around agentic capabilities, not just intelligence.
- AI Ultra was repriced to $100/month entry ($200 top tier), down from the previous $250 Ultra. Spark beta access is gated behind the $100 tier and rolls out to U.S. subscribers next week.
- Antigravity 2.0 expands Google's agent development platform with orchestration tooling — the infrastructure play for agent builders.
- Android XR and Project Aura smart glasses, plus Android 17's "intelligence system" framing, extend the agent thesis into hardware and OS layers.
- Omni Flash is short-form only. Veo 3.1 remains the production tool for longer, higher-resolution, ID-consistent video. Real pipelines will route between both.
- Conversational editing as a first-class capability in Omni is a workflow shift competitors will need to match within six months.
- Genra will integrate Omni Flash as soon as API access is available; users will see the quality lift on routed short-form shots without changing how they work. Long-form, 4K, and consistency-critical work continues on Veo and Seedance.
Frequently Asked Questions
Did Google announce Veo 4 at I/O 2026?
No. There was no Veo 4 announcement. Google introduced the Gemini Omni model series instead, with Omni Flash as the first publicly available model. The most likely interpretation is that Google is consolidating generative video work under the Omni framework rather than continuing parallel Veo generations.
What is Gemini Omni?
Gemini Omni is Google's new unified multimodal model series, capable of generating text, image, audio, and video natively from combined inputs. Omni Flash is the first public model, focused on short-form video with synchronized native audio and conversational editing.
What is Gemini Spark?
Gemini Spark is a 24/7 cloud-resident personal AI agent that runs on a dedicated Google VM, integrates with Google products and 30+ third-party services via MCP, and takes actions on the user's behalf — booking, ordering, drafting, and managing tasks. It rolls out in beta to U.S. AI Ultra subscribers next week.
How much does Google AI Ultra cost in 2026?
The new AI Ultra entry tier is $100 per month, down from $250. A higher tier is priced at $200. The $100 plan includes Gemini Spark beta access, 5x the Gemini app usage limit of the Pro tier, 20TB of cloud storage, and YouTube Premium.
What is Gemini 3.5 Flash?
Gemini 3.5 Flash is Google's latest fast-tier foundation model, launched May 19, 2026. Google claims it surpasses Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks while running at roughly 4x the output token speed of comparable frontier models. It is available in the Gemini app, Search, Antigravity, and the Gemini API.
What is Antigravity 2.0?
Antigravity is Google's agent development platform. The 2.0 release adds orchestration tooling so developers can compose, schedule, and supervise multiple interacting agents. It targets the infrastructure layer beneath agent products.
What did Google announce about smart glasses at I/O 2026?
Google unveiled new Android XR-class "intelligent eyewear" devices, including Project Aura developed with Xreal. At least three smart glasses partnerships are launching in fall 2026, positioning Google between audio-first glasses and full XR headsets. All Gemini-powered.
Will Genra integrate Gemini Omni?
Yes. Genra is built so integrating a new model is a backend change rather than a workflow change. Omni Flash will be added to the agent's routing logic as soon as it becomes available through the Gemini API. Users will see quality improvements on short-form output without changing how they work.
Is Veo 3.1 still available after I/O 2026?
Yes. Veo 3.1 remains available through Google AI Studio and Vertex AI. It continues to be the production-grade option for longer clips, 4K output, and use cases that need character consistency via ID-embedding — capabilities Omni Flash does not yet support.
What does I/O 2026 mean for AI video creators?
Three shifts. First, the model strategy now spans Omni for short-form unified multimodal and Veo 3.1 for long-form and high-res — real pipelines will route between both. Second, agent-layer thinking is now consensus at the platform level, not just a startup positioning bet. Third, conversational editing is becoming a baseline capability that all AI video tools will need to match.
About the Author
Chris Sherman covers AI video technology, agent architectures, and the business of creative production. Follow @GenraAI for continuing coverage of the post-I/O AI video landscape and the MiniMax hearing (May 29).