Computer Use Agent
Definition
An AI agent that interacts with a computer's graphical user interface (taking screenshots, moving the mouse, clicking buttons, and typing text) to accomplish tasks that would normally require a human operating the desktop. Unlike browser agents, which act on the DOM, computer use agents work at the pixel level, so they can in principle operate any application with a visual interface: desktop software, web apps, system settings, and custom interfaces. This makes them the most general form of digital automation.
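The observe-then-act cycle described above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: the names Action, Desktop, and run_agent are hypothetical, and the policy callable stands in for whatever model turns a screenshot into the next action.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Action:
    """One step chosen by the policy (hypothetical schema)."""
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class Desktop(Protocol):
    """Assumed interface to the machine being driven."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


def run_agent(
    desktop: Desktop,
    policy: Callable[[bytes], Action],
    max_steps: int = 20,
) -> int:
    """Screenshot, ask the policy for an action, execute it, repeat.

    Returns the number of actions executed before the policy said "done".
    """
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    # A hard step budget keeps a confused agent from looping forever.
    raise RuntimeError("step budget exhausted before the task finished")
```

The agent only ever sees pixels (the screenshot bytes) and only ever emits input events, which is why the same loop can drive any application with a visual interface.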
Builder Context
Computer use agents are the most general but least efficient form of automation, so use them only when no API or browser automation path is available. They are best suited to legacy desktop applications, applications without APIs, and tasks that require visual judgment (chart reading, UI testing, design review). The main challenges are that coordinates vary across screen resolutions, so a position chosen on a screenshot may need to be rescaled to the actual display; timing is unpredictable, so the agent must wait for UI transitions to finish before acting; and visual understanding can misinterpret complex layouts. Always run computer use agents in sandboxed environments to limit the blast radius of a mistaken action.
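Two of the challenges above, resolution-dependent coordinates and unpredictable timing, can be handled with small helpers. The sketch below is one possible approach under stated assumptions: scale_point and wait_until_stable are hypothetical names, the capture callable is assumed to return the current screen as bytes, and "two identical consecutive frames" is only a rough heuristic for a finished UI transition.

```python
import time
from typing import Callable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def scale_point(x: int, y: int, shot_size: Size, screen_size: Size) -> Tuple[int, int]:
    """Map a point picked on a screenshot to real screen coordinates.

    Uses integer math and clamps the result to the screen bounds.
    """
    sx = x * screen_size[0] // shot_size[0]
    sy = y * screen_size[1] // shot_size[1]
    sx = min(max(sx, 0), screen_size[0] - 1)
    sy = min(max(sy, 0), screen_size[1] - 1)
    return sx, sy


def wait_until_stable(
    capture: Callable[[], bytes],
    timeout: float = 5.0,
    interval: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Poll the screen until two consecutive captures match.

    Returns the settled frame, or raises TimeoutError if the screen
    keeps changing until the deadline.
    """
    deadline = clock() + timeout
    previous = capture()
    while clock() < deadline:
        sleep(interval)
        current = capture()
        if current == previous:
            return current
        previous = current
    raise TimeoutError("screen did not settle before the timeout")
```

For example, a click chosen at (640, 360) on a 1280x720 screenshot maps to (1280, 720) on a 2560x1440 display. Exact frame equality will never settle on screens with blinking cursors or animations, so a real agent would likely compare only a region or tolerate small differences.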