Anand Chowdhary

Inference thinking versus real engineering

Gemini 3 Deep Think V2 hitting 84.6% on ARC‑AGI‑2 is a flex on what happens when you spend 10x to 100x more compute at inference on structured search over hypotheses and constraints. They're throwing a lot of thinking time at the problem.

Can this kind of heavy orchestration at inference actually power real engineering workflows, where:

- The specs are incomplete
- The tools are messy
- The constraints change mid‑flight
- And latency and cost actually matter?

Or are we just building increasingly baroque systems that ace synthetic IQ tests while still hallucinating over your codebase and Linear backlog?
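To make the "structured search over hypotheses" idea concrete, here's a minimal toy sketch of inference-time search: sample candidate hypotheses, score each against the examples (the constraints), and spend more budget to raise the odds of finding one that fits. All names here (`solve_by_search`, the toy candidates) are my own illustration, not Gemini's actual mechanism.

```python
import random

def solve_by_search(examples, candidates, budget):
    """Sample candidate hypotheses and keep the one most consistent
    with the examples. More budget (compute) -> better odds of a fit.
    This is the 10x-100x inference-spend idea in miniature."""
    best, best_score = None, -1
    for _ in range(budget):
        hyp = random.choice(candidates)
        score = sum(1 for x, y in examples if hyp(x) == y)
        if score > best_score:
            best, best_score = hyp, score
        if best_score == len(examples):
            break  # all constraints satisfied, stop spending
    return best

# Hypothetical toy task: infer the rule mapping 2 -> 4, 3 -> 9.
random.seed(0)
examples = [(2, 4), (3, 9)]
candidates = [lambda x: x + 2, lambda x: 2 * x, lambda x: x * x]
rule = solve_by_search(examples, candidates, budget=100)
```

The catch the post is pointing at: this only works because the examples are complete and the scoring function is crisp. With incomplete specs and shifting constraints, there's no clean `score` to search against.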