An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until recently, drawing allegations of impropriety from some in the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grant-making foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demonstrate its upcoming AI, o3.
In a post on the LessWrong forum, a contractor for Epoch AI with the username “Meemi” says that many contributors to the FrontierMath benchmark were not informed of OpenAI’s involvement until it was made public.
"The communication about this has not been transparent," Meemi wrote. "In my opinion, Epoch AI should have disclosed the OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities when choosing whether to work on a benchmark."
On social media, some users raised concerns that the secrecy could erode FrontierMath's reputation as an objective benchmark. Beyond backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn't reveal until December 20, when o3 was announced.
In a response to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, maintained that FrontierMath’s integrity had not been compromised, but acknowledged that Epoch AI “made a mistake” by not being more transparent.
"We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible," Besiroglu wrote. "Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with contributors a non-negotiable part of our agreement with OpenAI."
Besiroglu added that while OpenAI has access to FrontierMath, it has a "verbal agreement" with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a separate "holdout set" that serves as an additional safeguard for independently verifying FrontierMath benchmark results, Besiroglu said.
"OpenAI has been fully supportive of our decision to maintain a separate, unseen holdout set," Besiroglu wrote.
However, muddying the waters somewhat, Epoch AI's lead mathematician, Elliot Glazer, noted in a Reddit post that Epoch AI hasn't yet been able to independently verify OpenAI's FrontierMath o3 results.
"My personal opinion is that [OpenAI's] score is legitimate (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances," Glazer said. "However, we can't vouch for them until our independent evaluation is complete."
The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the resources needed for benchmark development without creating the perception of a conflict of interest.