Agentifact assessment — independently scored, not sponsored. Last verified Mar 6, 2026.
Microsoft UFO
UFO (UI-Focused Agent) is an open-source Windows OS agent from Microsoft Research that uses GPT-Vision to understand and interact with native Windows application GUIs. It employs a dual-agent framework — one agent for high-level planning across apps and one for in-app execution — enabling seamless multi-application task completion. UFO³ Galaxy extends this to multi-device orchestration across heterogeneous platforms. The project is MIT-licensed, completely free, and integrates with any LLM provider via an API key.
Viable option — review the tradeoffs
You need an agent to automate complex tasks across multiple native Windows apps using natural language, without building custom scripts for each UI.
Reported results: 80% success on multi-app tasks across 9 popular apps, the fewest steps in its comparison against GPT-4, and an 85% safeguard rate. Because the agent is vision-based, a cluttered screen can confuse it, but interactive mode and custom actions fix most such issues.
You want to extend agent capabilities with user demos, custom actions, or RAG for niche Windows apps beyond standard controls.
Highly effective for intricate tasks once customized (highest completion rate in evals). Recursive planning shines on long tasks but needs more steps for multi-app work (9.8 steps on average).
Windows-Only
Strictly for Windows OS GUIs. UFO³ Galaxy adds multi-device orchestration, but the core agent targets a single OS, with no native cross-platform support.
Windows Machine + LLM API
Requires a Windows machine for native UI access (screenshots and controls) and an API key for the LLM that serves as the GPT-Vision backbone. No local model option is mentioned.
Vision-Dependent Reliability
LLM screenshot analysis can fail on cluttered UIs or after visual changes. Mitigate with control filtering, hybrid detection (UFO²), or custom actions, and expect iterative fixes in interactive mode.
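To make the control-filtering mitigation concrete, here is a minimal Python sketch of the general idea: shrink the list of candidate controls before the model sees them, so a cluttered screen offers fewer distractors. The `Control` type, `filter_controls` function, and the word-overlap heuristic are all hypothetical illustrations, not UFO's actual implementation or configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    label: int  # numeric annotation that would be drawn on the screenshot
    name: str   # accessible name of the control, e.g. "Save As"
    kind: str   # control type, e.g. "Button"


def filter_controls(controls, plan, keep_kinds=("Button", "Edit", "MenuItem")):
    """Keep actionable controls whose name shares at least one word with the plan."""
    plan_words = {w.lower().strip(".,") for w in plan.split()}
    kept = []
    for c in controls:
        name_words = {w.lower() for w in c.name.split()}
        if c.kind in keep_kinds and name_words & plan_words:
            kept.append(c)
    # Fall back to the full list so the agent is never left with nothing to act on.
    return kept or list(controls)


# Hypothetical controls, as if read from a Save dialog.
controls = [
    Control(1, "Save As", "Button"),
    Control(2, "Zoom slider", "Slider"),
    Control(3, "File name", "Edit"),
    Control(4, "Help", "Button"),
]
kept = filter_controls(controls, "Click Save and type the file name.")
print([c.label for c in kept])
```

A word-overlap heuristic like this is crude. It is shown only because it is easy to follow; a semantic or icon-based filter would follow the same shape with a different relevance test.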
Trust Breakdown
What It Actually Does
UFO lets an AI agent see and control your Windows applications just like a human would, understanding button locations and form fields to complete tasks across multiple programs automatically.
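The dual-agent split described on this page (one agent planning across apps, one executing inside an app) can be sketched in a few lines of Python. This is an illustrative sketch only: `PlannerAgent`, `ExecutorAgent`, `SubTask`, and `fake_decompose` are hypothetical names, and the planner's LLM call is replaced by a canned function. It does not reflect UFO's real classes or API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubTask:
    app: str          # application the sub-task targets
    instruction: str  # what to do inside that application


@dataclass
class PlannerAgent:
    """High-level agent: splits a request into per-application sub-tasks."""
    # Hypothetical stand-in for an LLM call returning (app, instruction) pairs.
    decompose: Callable[[str], list[tuple[str, str]]]

    def plan(self, request: str) -> list[SubTask]:
        return [SubTask(app, instr) for app, instr in self.decompose(request)]


@dataclass
class ExecutorAgent:
    """In-app agent: carries out one sub-task inside a single application."""
    app: str
    log: list[str] = field(default_factory=list)

    def execute(self, task: SubTask) -> str:
        # A real executor would inspect a screenshot and act on UI controls;
        # this sketch only records what it was asked to do.
        result = f"[{self.app}] {task.instruction}"
        self.log.append(result)
        return result


def run(request: str, planner: PlannerAgent) -> list[str]:
    executors: dict[str, ExecutorAgent] = {}
    results = []
    for task in planner.plan(request):
        # Reuse one executor per application so in-app history stays together.
        if task.app not in executors:
            executors[task.app] = ExecutorAgent(task.app)
        results.append(executors[task.app].execute(task))
    return results


def fake_decompose(request: str) -> list[tuple[str, str]]:
    # Canned plan standing in for model output.
    return [("Word", "copy the summary paragraph"),
            ("Outlook", "paste it into a new email")]


results = run("Email the summary from my report", PlannerAgent(fake_decompose))
print(results)
```

Keeping the planner unaware of UI details and the executor unaware of other applications is the point of the split: each model prompt stays small and focused on one level of the task.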
Fit Assessment
Best for
- ✓ browser-automation
- ✓ code-generation
- ✓ file-operations
Score Breakdown
Protocol Support
Capabilities
Governance
- user-confirmation
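As a rough illustration of what a user-confirmation safeguard can look like, the Python sketch below pauses before any action that matches a list of sensitive keywords. The keyword list and the `needs_confirmation` and `guarded_execute` functions are assumptions made for this example, not UFO's actual safeguard logic.

```python
# Illustrative keyword list; a real safeguard would likely rely on the model's own judgment.
SENSITIVE_KEYWORDS = ("delete", "send", "submit", "purchase", "overwrite")


def needs_confirmation(action: str) -> bool:
    """Return True when the action text contains any sensitive keyword."""
    lowered = action.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def guarded_execute(action, execute, confirm):
    """Run execute(action) unless it is sensitive and confirm(action) declines it."""
    if needs_confirmation(action) and not confirm(action):
        return f"skipped: {action}"
    return execute(action)


performed = []


def do(action):
    # Stand-in for the real UI action; it only records the request.
    performed.append(action)
    return f"done: {action}"


# The confirm callback declines everything here; in interactive use it could prompt the user.
print(guarded_execute("Open Notepad", do, lambda a: False))
print(guarded_execute("Delete report.docx", do, lambda a: False))
```

Passing `confirm` as a callback keeps the gate testable and lets an interactive session supply a real prompt, while unattended runs can supply a policy that always declines.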