When OpenAI unveiled its o3 “reasoning” AI mannequin in December, the corporate partnered with the creators of ARC-AGI, a benchmark designed to check extremely succesful AI, to showcase o3’s capabilities. Months later, the outcomes have been revised, they usually now look barely much less spectacular than they did initially.
Final week, the Arc Prize Basis, which maintains and administers ARC-AGI, up to date its approximate computing prices for o3. The group initially estimated that the best-performing configuration of o3 it examined, o3 excessive, value round $3,000 to resolve a single ARC-AGI drawback. Now, the Arc Prize Basis thinks that the price is far greater — possibly around $30,000 per task.
The revision is notable as a result of it illustrates simply how costly right this moment’s most subtle AI fashions could find yourself being for sure duties, a minimum of early on. OpenAI has but to cost o3 — or launch it, even. However the Arc Prize Basis believes OpenAI’s o1-pro mannequin pricing is an inexpensive proxy.
For context, o1-pro is OpenAI’s most costly mannequin thus far.
“We imagine o1-pro is a more in-depth comparability of true o3 value […] attributable to quantity of test-time compute used,” Mike Knoop, one of many co-founders of The Arc Prize Basis, informed TechCrunch. “However that is nonetheless a proxy, and we’ve stored o3 labeled as preview on our leaderboard to mirror the uncertainty till official pricing is introduced.”
A excessive worth for o3 excessive wouldn’t be out of the query, given the quantity of computing sources the mannequin reportedly makes use of. In keeping with the Arc Prize Basis, o3 excessive used 172x extra computing than o3 low, the lowest-computing configuration of o3, to sort out ARC-AGI.
Furthermore, rumors have been flying for fairly a while about dear plans OpenAI is contemplating introducing for enterprise clients. In early March, The Info reported that the corporate could also be planning to cost as much as $20,000 per 30 days for specialised AI “brokers,” like a software program developer agent.
Some may argue that even OpenAI’s priciest fashions will value nicely beneath what a typical human contractor or staffer would command. However as AI researcher Toby Ord pointed out in a post on X, the fashions might not be as environment friendly. For instance, o3 excessive wanted 1,024 makes an attempt at every activity in ARC-AGI to realize its greatest rating.