In a bid to compete more aggressively with rival AI companies like Google, OpenAI is launching Flex processing, an API option that offers lower AI model usage prices in exchange for slower response times and "occasional resource unavailability."
Flex processing, which is available in beta for OpenAI's recently launched o3 and o4-mini reasoning models, is aimed at lower-priority and "non-production" tasks such as model evaluations, data enrichment, and asynchronous workloads, OpenAI says.
It cuts API costs by exactly half. For o3, Flex processing is $5/M input tokens (~750,000 words) and $20/M output tokens, versus the standard $10/M input tokens and $40/M output tokens. For o4-mini, Flex brings the price down to $0.55/M input tokens and $2.20/M output tokens, from $1.10/M input tokens and $4.40/M output tokens.
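Because Flex is an opt-in tier on the existing API rather than a separate product, the sketch below shows one way a developer might use it from the OpenAI Python SDK. It assumes the documented `service_tier` request parameter accepts a `"flex"` value and that "occasional resource unavailability" surfaces as a retryable 429 error; those specifics are assumptions drawn from OpenAI's API documentation, not from the announcement itself.

```python
# Minimal sketch, not an official example: requesting o3 under the Flex tier
# and retrying when Flex capacity is temporarily unavailable.
import time

from openai import APIStatusError, OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def flex_request(prompt: str, retries: int = 3) -> str:
    """Send a lower-priority request, retrying if Flex capacity is unavailable."""
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model="o3",
                messages=[{"role": "user", "content": prompt}],
                service_tier="flex",  # assumed opt-in flag for cheaper, slower Flex processing
                timeout=900,          # Flex responses can take considerably longer than standard ones
            )
            return response.choices[0].message.content
        except APIStatusError as err:
            # Assumption: "occasional resource unavailability" is returned as a 429,
            # so back off exponentially and try again before giving up.
            if err.status_code == 429 and attempt < retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise
    raise RuntimeError("Flex capacity unavailable after retries")


print(flex_request("Summarize this record for a data-enrichment pass: ..."))
```

Exponential backoff fits the workloads OpenAI names for Flex, since evaluations and batch enrichment jobs can usually tolerate waiting for capacity to free up.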
The launch of Flex processing comes as the price of frontier AI continues to climb, and as rivals release cheaper, more efficient budget-oriented models. On Thursday, Google rolled out Gemini 2.5 Flash, a reasoning model that matches or bests DeepSeek's R1 in terms of performance at a lower input token price.
In an email to customers announcing the launch of Flex pricing, OpenAI also indicated that developers in tiers 1-3 of its usage tier hierarchy will have to complete the newly introduced ID verification process to access o3. (Tiers are determined by the amount of money spent on OpenAI services.) O3's reasoning summaries and streaming API support are also gated behind verification.
OpenAI has previously said ID verification is intended to stop bad actors from violating its usage policies.