Agentifact assessment — independently scored, not sponsored. Last verified Mar 6, 2026.
OpenAI Operator (CUA)
OpenAI's Operator is an AI agent powered by the Computer-Using Agent (CUA) model, combining GPT-4o vision with reinforcement-learning reasoning to autonomously navigate browsers and complete web tasks like form filling, shopping, and research. It achieved 87% on WebVoyager and 58% on WebArena. For end users, Operator is available in ChatGPT Pro ($200/mo) and is now integrated as a core ChatGPT agent. The CUA model is also available as a research preview in the Responses API for developers in usage tiers 3–5 at $3/M input and $12/M output tokens.
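As a rough illustration of the API pricing quoted above, the sketch below estimates the cost of a single task from its token counts. The function name and the example token counts are hypothetical. Real usage depends on how many screenshots and reasoning steps a task needs.

```python
# Prices quoted in this assessment, in USD per million tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 12.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for one task (hypothetical helper)."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Assumed example: a task that consumes 400k input and 20k output tokens.
cost = estimate_cost_usd(400_000, 20_000)
print(f"${cost:.2f}")
```

Screenshots are sent as input on every step, so input tokens are likely to dominate the bill for long browsing sessions.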
Viable option — review the tradeoffs
You need to automate repetitive web tasks (form filling, data entry, multi-step workflows) without building custom integrations for each target website.
Performance is strong on straightforward web tasks (87% on the WebVoyager benchmark), but expect failures on adversarial or heavily obfuscated websites. CUA asks for user confirmation before submitting orders, sending emails, or taking other irreversible actions. Watch mode is required for sensitive sites (email, banking). Latency is higher than with direct API calls because of screenshot processing and reasoning loops.
You're building an agent that needs to handle web tasks but don't want to maintain separate integrations for hundreds of different websites.
CUA achieves 38.1% on OSWorld (full computer use) and 58.1% on WebArena (web-only). Performance degrades on sites with unusual layouts, dynamic content, or CAPTCHAs. Self-correction works well for navigation errors but not for logical misunderstandings of task intent. Expect 2–5 seconds of latency per action due to vision processing and reasoning.
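To turn the per-action latency above into a planning number, the sketch below multiplies an assumed action count by the 2–5 second range. The helper name and the 20-action example are hypothetical. The estimate ignores page load time and any pauses for user confirmation.

```python
# Per-action latency range quoted in this assessment, in seconds.
MIN_SECONDS_PER_ACTION = 2.0
MAX_SECONDS_PER_ACTION = 5.0


def estimate_task_seconds(num_actions: int) -> tuple[float, float]:
    """Return (best_case, worst_case) model latency for a task in seconds."""
    if num_actions < 0:
        raise ValueError("num_actions must be non-negative")
    return (
        num_actions * MIN_SECONDS_PER_ACTION,
        num_actions * MAX_SECONDS_PER_ACTION,
    )


# Assumed example: a form-filling flow that takes about 20 actions.
best, worst = estimate_task_seconds(20)
print(f"{best:.0f}-{worst:.0f} seconds")
```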
Adversarial website attacks and prompt injection vulnerabilities
CUA can be tricked by malicious website content (prompt injections, phishing, jailbreaks). OpenAI's defenses caught all but one case in internal red-teaming. A monitoring model can pause execution on suspicious content, but this is reactive, not foolproof. Attackers can use CUA itself to automate identity attacks at scale.
Refuses high-risk tasks and sensitive workflows
CUA declines banking transactions, sensitive decision-making, and tasks on blocklisted sites (gambling, adult content, drug/gun retailers). Watch mode is mandatory for email and other sensitive platforms, requiring active user supervision. This severely limits autonomous operation in financial or compliance-heavy domains.
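One way to plan around these restrictions is to screen target URLs before handing a task to the agent. The sketch below is an application-side pre-check, not OpenAI's implementation. The actual blocklist and watch-mode site lists are not given in this assessment, so the domains and category names here are placeholders.

```python
from urllib.parse import urlparse

# Placeholder domain lists; substitute what you observe in practice.
BLOCKED_DOMAINS = {"casino.example"}
SUPERVISED_DOMAINS = {"mail.example"}


def _matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_target(url: str) -> str:
    """Classify a URL as 'blocked', 'supervised', or 'autonomous'."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, BLOCKED_DOMAINS):
        return "blocked"      # expect the agent to refuse
    if _matches(host, SUPERVISED_DOMAINS):
        return "supervised"   # expect watch mode; schedule a human
    return "autonomous"


print(classify_target("https://www.casino.example/play"))
print(classify_target("https://mail.example/inbox"))
print(classify_target("https://shop.example/cart"))
```

Routing "supervised" tasks to a queue with a person attached avoids launching jobs that would stall waiting for supervision.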
User confirmation delays block fully autonomous workflows
CUA asks for explicit user confirmation before submitting orders, sending emails, or finalizing external actions. This breaks true end-to-end automation: any task with side effects still needs a human in the loop. Plan for confirmation latency in your workflow design.
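The human-in-the-loop requirement described above can be modeled in your own workflow code as a confirmation gate. The sketch below is a minimal illustration with hypothetical names (`Action`, `run_with_confirmation`). It does not reflect how CUA implements confirmations internally.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str
    irreversible: bool  # True for orders, emails, and other side effects


def run_with_confirmation(
    actions: List[Action],
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], None],
) -> List[str]:
    """Run actions in order, pausing for approval before irreversible ones.

    Stops at the first declined action and returns the names completed.
    """
    completed: List[str] = []
    for action in actions:
        if action.irreversible and not confirm(action):
            break
        execute(action)
        completed.append(action.name)
    return completed


# Assumed example: the reviewer approves the order but not the email.
plan = [
    Action("fill_form", irreversible=False),
    Action("submit_order", irreversible=True),
    Action("send_receipt_email", irreversible=True),
]
done = run_with_confirmation(
    plan,
    confirm=lambda a: a.name == "submit_order",
    execute=lambda a: None,  # stand-in for the real side effect
)
print(done)
```

In a real deployment `confirm` would block on a person, so the time it takes belongs in the workflow's latency budget.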
Trust Breakdown
What It Actually Does
Operator is an AI agent that can see and interact with websites on your behalf, handling tasks like filling forms, shopping, or research without manual steps. It's built into ChatGPT Pro and learns from feedback to improve at complex web tasks.
Fit Assessment
Best for
- ✓browser-automation
- ✓form-filling
- ✓web-task-automation
- ✓ecommerce-operations
Not ideal for
- ✗sites exposed to adversarial prompt injection
- ✗tasks where an unintended model action would be costly
- ✗banking transactions
- ✗sensitive decision-making tasks
Known Failure Modes
- adversarial prompt injection attacks on websites
- model mistakes from unintended actions
- inability to handle banking transactions
- inability to handle sensitive decision-making tasks
Score Breakdown
Protocol Support
Capabilities
Governance
- permission-scoping
- resource-limits
- audit-log
- rate-limiting