2025-11-18

As expected for a new frontier AI model, Google posted high scores for Gemini 3 Pro in various benchmarks. In fact, Gemini 3 Pro comes out on top in most tests, with only a few exceptions. For example, it ties Claude Sonnet 4.5 in AIME 2025 (math) when code execution is allowed, though it outperforms Anthropic's model in the same test without code execution. Claude Sonnet 4.5 also beats Gemini 3 Pro in SWE-Bench Verified (agentic coding), but only by a small margin. ChatGPT also scores higher than Google's AI model in that benchmark.

Impressively, Gemini 3 Pro crushes its rivals, including its own predecessor, in some of these benchmarks. It scores significantly higher than competitors in Humanity's Last Exam (academic reasoning), ARC-AGI-2 (visual reasoning puzzles), MathArena Apex (challenging math contest problems), ScreenSpot-Pro (screen understanding), CharXiv Reasoning (information synthesis from complex charts), OmniDocBench 1.5 (OCR), LiveCodeBench Pro (competitive coding problems), Vending-Bench 2 (long-horizon agentic tasks), SimpleQA Verified (parametric knowledge), and MRCR v2 (long context performance).

That said, the leaked Gemini 3 Pro benchmarks can't be considered official until Google launches the new model and releases the finalized model card, which may include updated benchmark scores.