[Image: "Gemini 3 Mobile" printed in clean black sans-serif on a pure white background]

Is Gemini 3 Secretly Live? Canvas Mode Discrepancies Fuel Speculation

Multiple users have reported a clear difference in outputs between Gemini's Canvas mode on the web and the mobile app. For SVGs and other visuals, the mobile Canvas output often looks different and sometimes noticeably better. That discrepancy is the single fact driving the current rumor: either Google is quietly routing mobile Canvas to a newer model, possibly Gemini 3, or it has pushed a major checkpoint of Gemini 2.5 Pro into the mobile experience.

Canvas mode has become a testing ground for multimodal work. It can generate and edit code, create interactive visualizations, and produce vector graphics like SVGs. Recent updates have brought Canvas to Android and iOS with a redesigned Create menu and expanded app generation capabilities. Because Canvas is a feature where users create novel content from prompts, it provides clean and repeatable signals for model evaluation.

What the community is seeing is a reproducible difference. The same prompt fed to Canvas on desktop and on mobile yields different SVGs. In many cases the mobile SVG looks more refined. That pattern suggests two possibilities. One, Google is A/B testing a newer model in mobile Canvas to collect real-world signals before a wider rollout. Two, the mobile experience is attached to a new checkpoint of Gemini 2.5 Pro that happens to improve certain visual tasks. Both scenarios are plausible and both explain why outputs differ.
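If you want to turn that impression into something repeatable, diff the two outputs structurally instead of eyeballing screenshots. Below is a minimal sketch, assuming you have saved the web and mobile Canvas results as web.svg and mobile.svg (hypothetical filenames, any two SVG files work); it counts element types in each file and prints where they diverge.

```python
# Compare two SVG outputs structurally so "looks different" becomes a
# repeatable claim. web.svg / mobile.svg are hypothetical filenames.
import xml.etree.ElementTree as ET
from collections import Counter

def svg_stats(path: str) -> Counter:
    """Count SVG element types in a file, ignoring XML namespaces."""
    root = ET.parse(path).getroot()
    return Counter(el.tag.split("}")[-1] for el in root.iter())

web = svg_stats("web.svg")
mobile = svg_stats("mobile.svg")

# Print every element type whose count differs between the two outputs.
for tag in sorted(set(web) | set(mobile)):
    if web[tag] != mobile[tag]:
        print(f"{tag}: web={web[tag]} mobile={mobile[tag]}")
```

Element counts won't capture everything (path data, colors, gradients), but a structural diff that stays stable across repeated runs of the same prompt is much stronger evidence than side-by-side screenshots.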

There are important details worth noting. Mobile Canvas is a more prompt-driven workflow and lacks some desktop features like split-pane editing, so mobile prompts and UI patterns produce a different interaction profile on their own. The mobile surface is also a relatively constrained environment, which makes it a convenient place to roll out experimental checkpoints. Reports of inconsistencies matter too: mobile Canvas sometimes outperforms the web, and sometimes it underperforms on complex prompts such as 'cyberpunk robot.' Those mixed results point to an experimental build, not a finished release.

Why would Google do this? Running a newer model behind a named feature in mobile is an efficient way to gather live usage data. Companies commonly route a subset of traffic to experimental models to observe cost, latency, and failure modes at scale while keeping the larger product stable. That fits the pattern of past pre-release A/B tests and checkpoint drops we’ve seen in the industry and in prior Gemini testing.
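For context, the standard mechanism is deterministic hash bucketing: a fixed fraction of users is assigned to the experimental model, and everyone else stays on the default. The sketch below is illustrative only; EXPERIMENT_SHARE and route_model are hypothetical names, not anything from Google's actual routing stack.

```python
# Minimal sketch of hash-based traffic bucketing for model experiments.
# All names here are illustrative assumptions, not a real API.
import hashlib

EXPERIMENT_SHARE = 0.05  # 5% of users see the experimental checkpoint

def route_model(user_id: str) -> str:
    """Deterministically assign a user to a model bucket."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "experimental-checkpoint" if bucket < EXPERIMENT_SHARE else "stable-model"

print(route_model("user-1234"))
```

Because the assignment is derived from the user ID, the same user keeps landing on the same model across sessions, which is exactly what would make a platform-level discrepancy like this one look consistent to individual testers.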

[Figure: speculative chart showing higher reported visual quality on mobile Canvas than on web Canvas. Caption: "Exact data measured by my perceived vibes."]

The model-version question remains open. There is no official confirmation that Gemini 3 is live. Public release notes emphasize iterative improvements to Gemini 2.5 Pro and experimental model checkpoints. Practically speaking, a checkpoint that behaves substantially better on key tasks functions like a new model even if it is not branded as one. The community will keep testing and sharing side-by-side examples until Google clarifies what is behind the change.

What should you take away from this? First, differences in product output are meaningful signals. When a named feature delivers systematically different results across platforms it usually means the underlying inference stack, routing rules, or model checkpoint differ. Second, inconsistent performance is evidence of an experimental deployment rather than a polished release. Third, this approach gives Google live feedback on where the model helps and where it still fails.

If you want deeper context on how companies run pre-release testing and what that looks like for model rollouts, review the prior examples of pre-release A/B testing for Gemini checkpoints. For a broader view of how new flagship models compete in this space, the analysis of the model race offers useful perspective.

My read is straightforward. This is a controlled public test, and the community is doing unpaid QA. The output differences are worth watching because they show where progress is happening. The name Google ultimately gives the build matters less than the fact that a visibly different capability is being exercised on real users. Expect Google to keep iterating and to stay careful about labeling until it has larger-scale data and the noisier failure modes resolved.