Google I/O 2026 Leaks: Gemini 3.5, Omni, and What to Expect

“Google is a leaky bucket.” Just as their Pixel phones always leak, there have been plenty of leaks ahead of Google I/O about what will be announced. Some of them were intentional by the team, others less so. The pattern covers everything from UI strings that appear in production and then vanish to Play Store listings that give away the roadmap. The Omni leak fits this exactly. A string about templates powered by Omni showed up in the live Gemini video generation tab and then disappeared. That is the oops signal we have seen before.

What is already clear is that we are going to get Gemini 3.5. It is going to be pretty incredible. In fact, Gemini 3.5 Flash is already available on VoxelBench, where you can see some of its 3D capabilities, and it is far beyond existing models. Only the likes of GPT-5.5 Pro can compete with it, and that model can spend hours thinking and working to produce a result just to compete with “Google’s Flash model.” The Gemini 3.5 Pro model will be even more impressive.

Since Gemini 3, and arguably even 2.5, Gemini models have been ahead on one-shotting and 3D multimodal tasks. This matches the spatial intelligence gains we have tracked in tests like Blueprint-Bench 2, where models started showing real consistency on novel grid puzzles and visual structures. The leaks and breakdowns back the idea that Google is pushing toward a unified multimodal system rather than separate tools. Yet that strength does not necessarily translate to being able to do real work. People’s real work does not consist of making 3D voxel scenes in as much detail as possible. It often consists of messy agentic work where you have to click around a computer or the browser, execute lots of code, and stay on track for hours at a time.

That is the kind of competition they are dealing with. GPT-5.5 has been known to work for multiple days on a single goal without getting off track, so that is what they have to match if they want to win over pro users. The research on these Google I/O 2026 leaks shows them moving toward unified agent-like media workflows inside one Gemini surface. The data flywheel advantage still sits with the labs whose users already run lots of long agentic sessions. That gap matters.

We are also getting a new AI Studio mobile app available for pre-registration. This will enable easy mobile vibe coding. They have already changed the logo for AI Studio. The app fits the pattern of making Gemini accessible for quick iterative work on the go. It should lower the barrier for users who want to test ideas without sitting at a desk. I see this as a practical step that could shift how people approach lightweight coding sessions throughout the day. Pair it with the live voice model and you start to see how Google wants Gemini available the moment an idea hits you wherever you are.

A new live voice model is on the way. It should be powered by Gemini 3.5 Flash Lite. They were originally going to call it 3.2, but it looks like they are going with 3.5 instead. Gemini now has multiple tiers: Pro, Flash, and Flash Lite. They have not done an Ultra model since the 1.0 flop. The Flash Lite variant makes sense for low-latency live conversations where speed matters more than maximum depth. This tiering gives Google clear lanes for different use cases instead of forcing every task onto the heaviest model. For quick back-and-forth on your phone, this could feel like a real improvement over previous versions.

A new Gemini Omni model is also expected. This can do video generation and conversational video editing like a Nano version of video. We do not know if they will call it Veo 4 Omni or Gemini Omni. The examples are impressive. They show eight seconds of HD video with very good likenesses and consistent voices. The naturalness does not quite match Seedance 2.0 in my view, but the controllability will likely be significantly better and more mainstream. The leak that surfaced in the Gemini video generation tab with text about templates powered by Omni fits the leaky bucket story perfectly. It appeared in a live interface and then disappeared, which points to active preparation. Early tester notes highlight stronger voice quality, better cinematic transitions, more consistent camera angles, and natural scene composition. One tester called it one of the best video models seen so far. That level of control could make video editing accessible for users who need quick iterations rather than perfect artistic output. If it unifies text, image, video, audio, reasoning, memory, and agent-style workflows the way the breakdowns suggest, then Omni could reduce a lot of the fragmentation people complain about when jumping between tools.

They are also updating the Gemini web app UI so that you can select Thinking Effort, and they are updating their desktop app for Mac. The Mac app will include live screen sharing, cursor-based contextual prompts, and better live modes in general. At the Android show they previously announced Google Books, their new laptops with Gemini built in and different cursor controls. This Mac app might just be a way to give some of that cool functionality to Mac users as a desktop experience. The addition of cursor context and screen sharing addresses a clear pain point for users who want the model to understand exactly what they are looking at on their screen. It moves Gemini closer to useful desktop agent behavior even if the underlying model still needs to prove itself on long tasks. For anyone doing real work on a Mac, this could be one of the more immediately useful pieces because it lets the model see your actual screen and cursor position instead of guessing from a description.

What I expect from this release is that Google will have these big model releases, everyone will run a bunch of one-shots that go viral on Twitter, and it is going to be crazy. Then about a week later people will realize that its agentic capabilities are not up to par. We cannot know that for sure. Maybe they really cooked with this and it will compete. The issue is that OpenAI and Anthropic have huge user bases where they get tons of agentic data constantly. Almost nobody is using Google in the same way so it is hard for them to compete in this area. The multimodal wins in one-shot settings and the 3D strengths we see on VoxelBench are real but they sit apart from the messy multi-step browser work, code execution, and hour-long focus that define most professional output. Google I/O 2026 rumors keep pointing to these advances yet the data flywheel that powers sustained agent performance still favors the labs with larger active agent usage.

The Omni model could bridge some of that gap by unifying text, image, video, audio, reasoning, and memory inside one surface. If the conversational video editing delivers on controllability, it might become a daily tool for creators who need fast edits rather than Hollywood polish. The new mobile app for AI Studio could make vibe coding feel more fluid when ideas strike away from the desk. The live voice model on Flash Lite should feel snappy for casual use. The Mac desktop upgrades with live screen sharing and cursor prompts give Mac users some of the contextual power announced for the Google Books laptops. Those pieces show Google trying to expand the places where Gemini fits into real workflows. The logo changes are minor in comparison. What matters more is whether the models can hold focus across the kinds of agentic sessions that power actual paid work.

One-shot demos will create the initial wave of excitement and the 3D multimodal examples will spread quickly because they look impressive in isolation. The harder test arrives when users try to chain those capabilities into browser navigation, repeated code runs, and multi-hour projects without constant course corrections. The research on these leaks shows Google aiming for less fragmentation across tools. That direction matches the shift we have seen in other releases where unified interfaces beat piecemeal ones. Still the proof sits in how well Gemini 3.5 Pro handles the long-horizon tasks that GPT-5.5 already manages for days at a time. If the agentic data advantage stays with OpenAI and Anthropic then Google will keep winning on flashy one-offs while trailing on the work that pays the bills. We will see how it plays out once the models drop. For now the leaks give us a clear preview of the features and the open questions that will define whether this round changes model choices for power users or simply adds another strong option for specific multimodal jobs.

People often get excited about the visual and one-shot wins because those create the shareable moments. I get it. A detailed 3D voxel scene or an HD video with consistent voice and face is impressive on first look. The practical question I keep returning to is whether those strengths survive when the task gets messy and long. Most professional work is not a single clean prompt. It is iteration, debugging, switching between browser tabs, running code, checking outputs, and staying oriented for hours. Models that drift lose value fast in that environment.

GPT-5.5 has shown it can hold a single goal across days. That is a high bar. Google has the multimodal lead in many one-shot tests. Turning that into reliable agentic performance is the next hill. The new tools around the model like the mobile app, Mac desktop features, and thinking effort selector are useful steps toward better usability. They address real friction points. Yet they cannot fully compensate if the base model still needs frequent nudges to stay on track.

The tiering with Flash Lite for voice and speed, Flash for throughput, and Pro for heavy reasoning gives them a sensible structure. It avoids the old mistake of one model trying to be everything. The pre-registration for the AI Studio mobile app suggests they want to capture those quick mobile sessions where vibe coding happens in spare moments. That could grow their usage and eventually their agentic data if it sticks.

The Omni piece is the one that has me most curious. If it really delivers conversational video editing with good controllability it might become the mainstream video tool that previous versions never quite reached. The tester feedback on cinematic quality, likeness, and transitions supports that potential. Whether they brand it as Gemini Omni or Veo 4 Omni does not change the capability question. What matters is whether people actually keep it in their workflow after the first week of testing.

The leaks have given us a good look at the roadmap before the official stage. Intentional or not they set expectations. My read is that the one-shot and multimodal side will deliver the viral moments and justify a lot of the hype. The agentic side is where the uncertainty remains. If Google has closed that gap with 3.5 then this could shift some pro user choices. If not we will see the familiar pattern. Big splash, lots of impressive clips, then the realization that for sustained real work the data advantage still tilts the field. Either outcome is useful information. The Mac app with cursor-based prompts and screen sharing could be the sleeper hit for desktop users who have been asking for better context. It gives them some of the laptop functionality on their existing machines. Combined with the live voice and mobile app it paints a picture of Google trying to meet users where they work instead of requiring them to come to the web app. Those experience improvements matter. They make the model more accessible for daily tasks. The core model capability questions still sit at the center. One week after launch we should have clearer answers on whether the agentic performance matches the multimodal promise.

Adam Holter

Founder of Ironwood AI. Writing about AI models, agents, and what's actually happening in the space.