Some more thoughts on the attempt to hide model choice behind a learned router with GPT‑5 in ChatGPT, then the quick rollback when power users wanted control. 🧵

Technically, GPT‑5 is a unified system: a fast default model, a deeper “Thinking” model, and a real‑time router trained on live signals like switch events, user prefs, and measured correctness. I like the ambition. I don’t love blind routing.

The API ships gpt‑5, gpt‑5‑mini, and gpt‑5‑nano. You also get reasoning_effort=minimal, verbosity controls, and custom tools. ChatGPT’s non‑reasoning variant is tuned separately. On OpenAI’s SWE‑bench Verified, GPT‑5 hits 74.9% vs o3 at 69.1%.

For latency, there’s an explicit service_tier=priority to tame tail times. Early practitioner notes point to about 750 ms P50 TTFT with minimal reasoning. Nice if you can get it, but it isn’t an SLA.

What went sideways: the auto switcher had a sev at launch that degraded decisions, so GPT‑5 felt way dumber. OpenAI is tuning the decision boundary, showing which model is active, restoring manual 4o selection, doubling Plus limits, and adding an easier “Thinking” trigger.

This is the tradeoff. Learned routing smooths the median experience. Expert users want determinism, provenance, and repeatability. I’m firmly in the second camp on weekdays.

Zooming out, the plan shifts from “pick a model” to “compose a system,” then sells QoS with priority tiers to control latency variance. Sensible framing for real workloads. Microsoft rolled GPT‑5 into Copilot on day one. OpenAI says API traffic roughly doubled post‑launch. Some recaps cite peaks near 2B tokens per minute. Plausible at scale, still not in official docs.

If routing stays opaque, founders will ask for attribution, override hooks, and stable contracts for latency and reasoning budgets. I’ve shipped routers like this at smaller scale. Opaque routing burns trust.

Open questions I’m tracking:
- What features feed the router and what loss does it optimize?
- How is regret measured online vs offline?
- How robust is routing under multimodal load and adversarial prompts?
- Where do we draw the line between auto and manual control: per‑turn override, per‑thread policy, or project‑level SLAs?

My take: ship the router, keep the knob, publish the contract. Power users will meet you halfway.
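To make “keep the knob, publish the contract” concrete, here’s a minimal sketch. Everything in it is hypothetical: the model names (`gpt-5-main`, `gpt-5-thinking`), the `Contract` fields, and the keyword heuristic, which just stands in for the learned signals (switch events, prefs, measured correctness) a real router would consume. The point is the shape: a manual override that always wins, plus a returned record of which model ran and why.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Contract:
    """What the caller gets back: model attribution, provenance, budgets."""
    model: str
    reason: str              # why the router chose this model
    latency_budget_ms: int
    reasoning_effort: str

def route(prompt: str,
          override: Optional[str] = None,
          latency_budget_ms: int = 2000) -> Contract:
    """Toy router. A per-turn override short-circuits the learned policy;
    otherwise a crude difficulty heuristic picks the deeper model only
    when the latency budget allows it."""
    if override is not None:
        return Contract(override, "manual override",
                        latency_budget_ms, "minimal")
    looks_hard = len(prompt) > 400 or any(
        kw in prompt.lower() for kw in ("prove", "step by step", "debug"))
    if looks_hard and latency_budget_ms >= 2000:
        return Contract("gpt-5-thinking", "heuristic: looks hard",
                        latency_budget_ms, "high")
    return Contract("gpt-5-main", "heuristic: default fast path",
                    latency_budget_ms, "minimal")
```

Usage: `route("prove this lemma step by step")` picks the thinking model, `route(..., override="gpt-4o")` forces a model regardless, and a tight `latency_budget_ms` keeps you on the fast path. The contract object is the part I’d want published and stable.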