Google has just released Gemini 3.1 Pro, marking a significant leap forward in AI reasoning capabilities. The model scored an impressive 77.1% on the ARC-AGI-2 reasoning benchmark, a massive jump from the 31.1% achieved by its predecessor, Gemini 3 Pro.

## Key Highlights

### Benchmark Performance

Gemini 3.1 Pro not only dominates the ARC-AGI-2 reasoning benchmark but also takes the top spot across multiple other benchmarks:

- Science
- Competitive coding
- MCP (Model Context Protocol) use
- Agentic search

This positions 3.1 Pro as a serious contender against Anthropic’s Claude Opus 4.6 (68.8%) and OpenAI’s GPT-5.2 (52.9%) on ARC-AGI-2.

### Pricing and Accessibility

Despite the substantial performance improvements, Google has kept the API pricing identical to Gemini 3 Pro. The model maintains a 1M-token context window while offering more competitive pricing than frontier models from Anthropic and OpenAI.

### Integration with Deep Think

The 3.1 Pro model serves as the core intelligence behind Google’s recent Deep Think update, now available across:

- the Gemini app
- NotebookLM
- Developer tools

## Why It Matters

After letting Anthropic and OpenAI dominate AI headlines in early 2026, Google has made a commanding comeback. The combination of Deep Think and Gemini 3.1 Pro signals Google’s return to the “world’s top model” conversation.

Industry analysts expect OpenAI to respond with a counter-announcement sooner rather than later, potentially with GPT-5.3 or a major reasoning-focused update.

---

This development represents a pivotal moment in the AI race, with reasoning capabilities becoming increasingly crucial for advanced AI applications.