Best bit from simonws Bay Area AI Security talk / 2025 / Notes / Anand Chowdhary

Best bit from @simonw’s Bay Area AI Security talk: don’t think “prompt injection”; think permission composition. Private data + untrusted content + egress equals a data‑theft machine. That combo is the lethal trifecta.

I keep seeing the same exfil chain in the wild. Markdown tricks slip past naive filters: inline links get blocked, reference-style links slide through. Then a broad CSP allowlist feels safe until an open redirect inside an allowlisted domain ferries data out, like the story Simon shared.

Filters help with hygiene. They aren’t isolation. You need an egress proxy with strict redirect rules and query stripping, plus a tiny allowlist you actually audit. Not everything here can be automated; someone needs to keep an eye on it.

Detectors that catch “~99%” feel comforting. In adversarial settings, that last 1% is the incident.

The fixes are architectural. Constrain data and control flow so untrusted tokens can’t trigger consequential actions. Think Action-Selector, Plan-Then-Execute, Dual LLM, Code-Then-Execute, and aggressive context minimization. I like CaMeL’s code-then-execute with capabilities and dataflow tracking. They even report provable guarantees on a chunk of benchmark tasks. The trade-off is real: some utility reduction and policy authoring. Worth it.

MCP is powerful and a little scary. Mix a few servers and you can assemble the trifecta by accident. The spec asks hosts to get explicit consent and to treat tool descriptors as untrusted. In practice, “Always Allow” erodes that promise fast. We’ve already seen the pattern: attacker-controlled issues, access to private repos, and exfil via creating a PR. Without per-source taint boundaries, capability brokerage, and audited egress, agents stay porous.

What I want next is boring on purpose: a reference agent stack that proves that once untrusted tokens enter the context, no new tool choices can be made. Static dataflow proofs. Per-session capability budgets. Default‑deny egress with tight redirects.
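To make the egress piece concrete, here is a minimal sketch of the check such a proxy might run before any outbound request: default deny, exact-host allowlist, query and fragment stripped, and every redirect hop re-validated so an open redirect on an allowed host can't hand the request to another domain. The names (`ALLOWED_HOSTS`, `MAX_REDIRECTS`, `sanitize_egress_url`, `follow_redirect`) and the host `api.example.com` are assumptions for illustration, not from Simon's talk or any real proxy.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical policy values: exact hosts only, no wildcards, few hops.
ALLOWED_HOSTS = frozenset({"api.example.com"})
MAX_REDIRECTS = 2


class EgressDenied(Exception):
    """Raised when an outbound URL violates the egress policy."""


def sanitize_egress_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Return the URL with query and fragment stripped, or raise EgressDenied."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError as exc:
        raise EgressDenied(f"unparseable URL: {exc}") from exc
    if parts.scheme != "https":
        raise EgressDenied(f"scheme not allowed: {parts.scheme!r}")
    # Reject userinfo tricks such as https://api.example.com@evil.test/
    if parts.username is not None or parts.password is not None:
        raise EgressDenied("credentials in URL not allowed")
    host = parts.hostname or ""
    if host not in allowed_hosts:
        raise EgressDenied(f"host not on allowlist: {host!r}")
    if port not in (None, 443):
        raise EgressDenied(f"port not allowed: {port}")
    # Query stripping: drop the query string and fragment entirely.
    return urlunsplit(("https", host, parts.path, "", ""))


def follow_redirect(current_url, location, hops_so_far,
                    allowed_hosts=ALLOWED_HOSTS):
    """Validate one redirect hop; the target must pass the same policy."""
    if hops_so_far >= MAX_REDIRECTS:
        raise EgressDenied("too many redirects")
    # Resolve relative Location headers against the current URL first.
    return sanitize_egress_url(urljoin(current_url, location), allowed_hosts)
```

A caller would run `sanitize_egress_url` on every outbound URL and `follow_redirect` on every `Location` header instead of letting the HTTP client follow redirects on its own. Path segments can still carry data, so this doesn't replace the audited allowlist; a stricter version might pin allowed path prefixes too.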
If you’re shipping agents, aim for “can’t happen,” not “unlikely.” Simon’s talk is a solid starting point: https://simonwillison.net/2025/Aug/9/bay-area-ai/