Cyberdesk launch signals new era
This week Cyberdesk (YC S25) launched on HN, and it shows how far we’ve come from LLMs blindly clicking UIs: trajectory caching and deterministic replay, with a tight break-glass path to computer use. That feels right. The threshold policy is basically the product. 🖱️👇
Under the hood, here’s how I read it.
An open driver runs on your Windows box and exposes a local HTTP API for screenshots, keyboard, and mouse. It keeps a reverse tunnel to their cloud. You teach tasks in plain English. They store screenshot plus action pairs, then replay them.
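Teaching in plain English and replaying stored screenshot-plus-action pairs suggests a trajectory format like the one below. This is a minimal sketch under my own assumptions; every field and name here is invented, not Cyberdesk’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # Hypothetical replay record: a reference frame plus the action taken on it.
    screenshot_hash: str  # fingerprint of the frame captured at teach time
    action: str           # e.g. "click", "type"
    x: int = 0            # cached click coordinates
    y: int = 0
    text: str = ""        # payload for "type" actions

@dataclass
class Trajectory:
    task: str                                   # the plain-English task description
    steps: list[Step] = field(default_factory=list)

# Teaching stores the pairs; replay walks them in order.
demo = Trajectory(task="Export last month's invoices")
demo.steps.append(Step(screenshot_hash="a1b2", action="click", x=312, y=88))
demo.steps.append(Step(screenshot_hash="c3d4", action="type", text="invoices"))
print(len(demo.steps))  # → 2
```

The key property is that replay needs no model call at all as long as each live frame still matches its cached `screenshot_hash` closely enough.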
When screens don’t match, it calls Claude Computer Use to recover and compute coordinates. The driver is Python with FastAPI, pyautogui, and mss. They keep the viewport low resolution, which improves click accuracy and cuts model cost.
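One consequence of a low-res viewport: clicks the model predicts on the small frame have to be scaled back to native screen pixels before pyautogui fires them. A toy version of that mapping (the resolutions are made up for illustration):

```python
def to_native(x: int, y: int,
              viewport: tuple[int, int],
              native: tuple[int, int]) -> tuple[int, int]:
    # Scale a click predicted on the downscaled viewport back to native pixels.
    sx = native[0] / viewport[0]
    sy = native[1] / viewport[1]
    return round(x * sx), round(y * sy)

# e.g. model clicks (640, 360) on a 1280x720 viewport; the screen is 2560x1440
print(to_native(640, 360, (1280, 720), (2560, 1440)))  # → (1280, 720)
```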
My only caveat: how long do you stay deterministic before you bail out? Screen-diff rules, DPI and scaling, localization, and theme changes all nudge coordinates. Every miss pushes you into the model, and costs jump. This is the reliability story.
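The break-glass decision could be as simple as a pixel-diff threshold: replay while the live frame matches the cached one, escalate to the model when it doesn’t. A sketch on synthetic frames, where the 5% cutoff is my invention, not a known Cyberdesk default:

```python
def frame_diff(cached: list[int], live: list[int]) -> float:
    # Fraction of pixels that changed between the cached frame and the live one.
    assert len(cached) == len(live)
    return sum(1 for p, q in zip(cached, live) if p != q) / len(cached)

def next_action(cached: list[int], live: list[int], threshold: float = 0.05) -> str:
    # Below the threshold: deterministic replay. Above it: break glass to the model.
    return "replay" if frame_diff(cached, live) <= threshold else "call_model"

frame = [0] * 100
print(next_action(frame, frame))       # → replay
print(next_action(frame, [1] * 100))   # → call_model
```

Where you set that threshold decides both your reliability and your model bill, which is why the policy feels like the product.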
Context I’m seeing: incumbents like UiPath already blend vision with deterministic steps and enterprise plumbing. On the other side, projects like UI‑TARS and OmniParser show credible local routes.
A nice next step here is fusing vision with Windows UI Automation for semantic anchors, so buttons survive UI drift. Compliance looks thoughtful, but zero data retention is still a contract you trust, not a guarantee you can verify.
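Fusing the two could look like: resolve the target by its UI Automation name first, and fall back to cached pixels only when the semantic lookup fails. A toy resolver against a faked UIA snapshot; the dict-based tree is invented for illustration (on Windows you would query the real UIA tree, e.g. via pywinauto’s `uia` backend):

```python
def resolve_target(uia_tree: dict[str, tuple[int, int]],
                   anchor: str,
                   cached_xy: tuple[int, int]) -> tuple[str, tuple[int, int]]:
    # Prefer the semantic anchor (control name -> live center point);
    # it survives DPI, theme, and layout drift that breaks raw coordinates.
    if anchor in uia_tree:
        return "semantic", uia_tree[anchor]
    # Otherwise fall back to the coordinates captured at teach time.
    return "cached", cached_xy

tree = {"Save": (410, 52)}  # pretend UIA snapshot: control name -> center
print(resolve_target(tree, "Save", (400, 50)))    # → ('semantic', (410, 52))
print(resolve_target(tree, "Export", (700, 90)))  # → ('cached', (700, 90))
```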