OpenAI may be close to releasing an AI tool that can take control of your computer and perform actions on your behalf.
Tibor Blaho, a software engineer with a reputation for accurately leaking future AI products, claims to have discovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agent” system capable of autonomously handling tasks such as writing code and booking travel.
According to The Information, OpenAI is targeting January as the release month of Operator. The code revealed by Blaho this weekend adds credence to that report.
OpenAI’s ChatGPT client for macOS has gained hidden options to define keyboard shortcuts for “Toggle Operator” and “Force Quit Operator,” according to Blaho. And OpenAI has added references to Operator on its website, Blaho said, though those references aren’t yet publicly visible.
Confirmed – ChatGPT macOS desktop app has hidden options to define desktop launcher shortcuts to “Toggle Operator” and “Force Quit Operator” https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS
— Tibor Blaho (@btibor91) January 19, 2025
According to Blaho, OpenAI’s site also contains not-yet-public charts comparing Operator’s performance to that of other computer-using AI systems. The charts could well be placeholders. But if the numbers are accurate, they suggest that Operator isn’t 100% reliable, depending on the task.
The OpenAI website already has references to Operator/OpenAI CUA (Computer Usage Agent) – “Operator System Card Table”, “Operator Request Evaluation Table” and “Operator Rejection Rate Table”
Including comparison with Claude 3.5 Sonnet computer use, Google Mariner, etc.
(table preview… pic.twitter.com/OOBgC3ddkU
— Tibor Blaho (@btibor91) January 20, 2025
On OSWorld, a benchmark that tries to mimic a real computing environment, OpenAI’s Computer Use Agent (CUA), perhaps the AI model that powers Operator, scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% that humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI’s ability to navigate and interact with websites. But the model falls short of human-level results on another web-based benchmark, WebArena, according to the leaked charts.
Operator also struggles with tasks a human could easily perform, if the leak is to be believed. In a test that tasked Operator with signing up for a cloud provider and launching a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded just 10% of the time.
OpenAI’s entry into the AI agent space comes as rivals, including the aforementioned Anthropic, Google, and others, make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.
Agents today are quite primitive. But some experts have raised concerns about their safety if the technology improves rapidly.
One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to coax the system into performing “illegal activities” and surrendering “sensitive personal data.” Safety testing is said to be among the reasons for Operator’s long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he claims lacks safety mitigations.
“I can only imagine the backlash if OpenAI made a similar release,” Zaremba wrote.
It’s worth noting that OpenAI has been criticized by AI researchers, including former staff, for allegedly deprioritizing safety work in favor of quickly shipping its technology.