In one year, multi-step “reasoning” models went from a rounding error to more than half of all tokens. So the average production workload is no longer “chat” with a side of reasoning. It is reasoning, with some chat wrapped around it so humans don’t run away.

If you are a founder, this quietly flips the frontier. The hard problems now look like:

- Making long chains of thought reliable instead of occasionally deranged
- Controlling behavior over 50–500 step traces, not just a prompt plus two turns
- Getting chain-of-thought cost under control so your margin is not “we pray Anthropic drops prices”

Latency and fluency still matter, but they are starting to look like solved or solvable problems. The new bottleneck is “can I trust this model to think for 30 seconds without wandering off a cliff, and can I afford it if it works?” OpenRouter’s 100T-token study is a good place to read more about this shift.
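To see why reasoning tokens dominate the margin math, a back-of-envelope sketch helps. All prices and token counts below are illustrative assumptions, not real provider pricing; the key point is that reasoning tokens bill at the (higher) output rate and typically dwarf the visible answer.

```python
# Back-of-envelope cost model for reasoning-heavy requests.
# Prices and token counts are hypothetical, chosen only to show
# the order-of-magnitude gap between "chat" and "reasoning" work.

def request_cost(prompt_tokens: int,
                 reasoning_tokens: int,
                 answer_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Dollar cost of one request; reasoning tokens bill as output."""
    input_cost = prompt_tokens * input_price_per_mtok / 1_000_000
    output_cost = (reasoning_tokens + answer_tokens) * output_price_per_mtok / 1_000_000
    return input_cost + output_cost

# A plain chat request vs. a reasoning-heavy one, at assumed prices
# of $3 per 1M input tokens and $15 per 1M output tokens.
chat = request_cost(800, 0, 300, 3.0, 15.0)
reasoning = request_cost(800, 6_000, 300, 3.0, 15.0)

print(f"chat:      ${chat:.4f} per request")
print(f"reasoning: ${reasoning:.4f} per request ({reasoning / chat:.1f}x)")
```

With these made-up numbers, a hidden 6,000-token trace makes the same user-visible answer roughly 14x more expensive, which is why cost control over the trace itself, not the prompt, is where the pressure lands.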