OpenAI has launched a new, cheaper way to use its AI models, according to TechCrunch. The company announced a “Flex processing” API option that cuts costs by 50% for developers who don’t need quick responses. This new option is currently in beta and works with OpenAI’s o3 and o4-mini reasoning models.
Flex processing is like choosing a slower shipping option to save money. You pay less, but you have to wait longer. With this new option, developers can run AI tasks at half the normal price if they’re willing to accept slower response times. This is especially helpful for tasks that don’t need to happen right away, like background jobs that can run overnight.
The cost savings are significant for developers. Under Flex processing, OpenAI charges half the standard token price for both models. Tokens are the chunks of text AI models read and write; a typical English word works out to roughly one to two tokens. At these rates, developers can run more AI tasks while staying within their budgets.
| Model | Token Type | Standard Price (per million tokens) | Flex Price (per million tokens) |
|---|---|---|---|
| o3 | Input | $10 | $5 |
| o3 | Output | $40 | $20 |
| o4-mini | Input | $1.10 | $0.55 |
| o4-mini | Output | $4.40 | $2.20 |
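To see what the halved rates mean in practice, here is a quick back-of-the-envelope comparison using the o3 prices above. The token counts are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical o3 job: 2 million input tokens and 500,000 output tokens.
input_tokens = 2_000_000
output_tokens = 500_000

# Standard o3 rates: $10 per million input tokens, $40 per million output tokens.
standard_cost = input_tokens / 1e6 * 10 + output_tokens / 1e6 * 40

# Flex o3 rates: $5 per million input tokens, $20 per million output tokens.
flex_cost = input_tokens / 1e6 * 5 + output_tokens / 1e6 * 20

print(f"Standard: ${standard_cost:.2f}  Flex: ${flex_cost:.2f}")
# Standard: $40.00  Flex: $20.00
```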
Flex processing is tied to OpenAI's usage-tier system. Developers in tiers 1-3 (small-to-medium-scale users, grouped by how much they have spent on the API) must complete ID verification before they can access o3, a requirement the company has introduced to prevent misuse. The tiers are defined by API spending:
- Tier 1: Requires $5 spent on the API (approximately ₹427)
- Tier 2: Requires $50 spent and at least 7 days since first payment (approximately ₹4,269)
- Tier 3: Requires $100 spent and at least 7 days since first payment (approximately ₹8,539)
The main trade-off with Flex processing is the slower response time. While standard API calls try to respond quickly, Flex processing has a 10-minute default timeout (which can be extended to 15 minutes). There’s also a chance that your request might fail if resources are unavailable, requiring you to try again later. This is why developers need to build “retry strategies” into their applications when using Flex processing.
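As a concrete illustration, the sketch below calls o3 through the OpenAI Python SDK, opting in via the service_tier parameter, raising the request timeout, and retrying on failure. It is a minimal sketch, not an official recipe: the model choice, timeout value, retry count, and backoff schedule are illustrative assumptions.

```python
# Minimal sketch of a Flex processing call with a simple retry strategy,
# assuming the OpenAI Python SDK's service_tier parameter accepts "flex"
# for the o3 / o4-mini reasoning models.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def flex_completion(prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="o3",
                messages=[{"role": "user", "content": prompt}],
                service_tier="flex",  # opt in to the cheaper, slower Flex tier
                timeout=900.0,        # allow up to 15 minutes instead of the 10-minute default
            )
            return response.choices[0].message.content
        except Exception:
            # Flex requests can fail when capacity is unavailable; back off and retry.
            if attempt == max_retries - 1:
                raise
            time.sleep(10 * 2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")
```

In a real background job you would also distinguish capacity errors from other failures before retrying, but the shape of the loop is the point: queue the work, wait, and try again if needed.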
OpenAI’s new ID verification requirement aims to prevent bad actors from violating usage policies. Developers in the affected tiers will need to verify their identity to access these models, adding an extra layer of security and accountability.
Not all AI tasks are suitable for Flex processing. Here’s a simple guide to help understand when to use it:
- Good for Flex processing: Data analysis, model evaluations, batch processing jobs, content generation that isn’t time-sensitive
- Not good for Flex processing: Real-time chat applications, customer service bots, or any task where users are waiting for immediate responses
- Requires adjustment: Developers need to add retry logic to handle potential resource unavailability
This move by OpenAI comes as competition heats up in the AI industry. Rivals like Google are offering capable models of their own, and OpenAI is looking for ways to make its services more attractive and affordable. By providing a cheaper option for non-urgent tasks, OpenAI is helping more developers access advanced AI capabilities without breaking their budgets. This could lead to more AI-powered applications and services becoming available to everyone in the near future.