On the final day of its 12-day "shipmas" event, OpenAI unveiled o3, a new reasoning model and the successor to o1. Alongside it, the company introduced a compact version, o3-mini, designed for specific tasks. OpenAI presents the release as a significant advance in AI reasoning and a step closer to AGI.
OpenAI asserts that o3, under certain conditions, approaches AGI (artificial general intelligence): a system capable of performing most economically valuable work typically done by humans. While the company stresses that this is not a definitive breakthrough, o3's benchmark results far surpass those of previous OpenAI models.
In the ARC-AGI test, which evaluates an AI's ability to acquire new skills beyond its training data, o3 scored 87.5% in high-compute mode; even in its lowest compute setting, it scored roughly three times higher than o1.
The model also achieved strong results on several other benchmarks:
- 96.7% on AIME 2024, the American Invitational Mathematics Examination;
- 87.7% on GPQA Diamond, a benchmark of graduate-level questions in biology, physics, and chemistry;
- a record 25.2% on Epoch AI's FrontierMath benchmark, significantly ahead of competing models.
Despite these achievements, experts such as François Chollet, creator of the ARC-AGI benchmark, caution against overestimating the results, pointing to o3's failures on some simple tasks and the high cost of running its advanced compute modes.
A notable improvement in o3 is adjustable compute: users can select a low, medium, or high reasoning mode depending on a task's complexity. Higher modes deliver better results but also increase latency, with responses taking anywhere from a few seconds to several minutes.
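As a rough sketch of how such a mode selector could look in practice, the snippet below uses OpenAI's Python SDK. The `reasoning_effort` parameter follows the convention that SDK uses for OpenAI's reasoning models, and the `o3-mini` model identifier is an assumption, since the interface was not documented at the time of the announcement.

```python
# A minimal sketch of selecting a compute mode through an OpenAI-style SDK.
# The `reasoning_effort` parameter and the "o3-mini" model name are
# assumptions; the presentation did not specify the exact interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",          # hypothetical identifier for the compact model
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

print(response.choices[0].message.content)
```

The trade-off is the one described above: "high" buys more internal reasoning at the cost of latency and per-request compute.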
The model relies on what OpenAI calls a private chain of thought: it reasons through a task internally before responding, can explain its conclusions, and produces more reliable results in fields such as physics, mathematics, and programming.
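That chain of thought is never shown to the user, but its size is visible in the usage metadata that OpenAI's released reasoning models report, which is one way the cost of deeper reasoning surfaces in practice. Below is a standalone sketch against the already-available o1 model; whether o3 reports usage the same way is an assumption.

```python
# The private chain of thought is hidden, but the tokens it consumed are
# reported (and billed) in the response's usage metadata. Field names match
# OpenAI's SDK for o1-class models; o3's behavior is assumed to be similar.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o1",  # a released reasoning model; o3 was not yet public
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)

details = response.usage.completion_tokens_details
print("hidden reasoning tokens:", details.reasoning_tokens)
print("visible answer tokens:",
      response.usage.completion_tokens - details.reasoning_tokens)
```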
OpenAI acknowledges the potential risks of o3, given issues identified in the previous model: o1, for example, showed a higher tendency to deceive users than other models. To address these concerns, OpenAI is applying a method called "deliberative alignment" to ensure that o3 adheres to its safety principles.
To minimize risks, OpenAI will initially open o3-mini to safety researchers for testing, with o3 becoming generally available in 2025. CEO Sam Altman has also advocated for a federal testing framework to evaluate the potential impact of such models.
Notably, OpenAI named the new family o3, skipping o2, to avoid a potential trademark conflict with the British telecommunications operator O2. During the presentation, Altman acknowledged his company's poor track record with naming: "Given OpenAI's tradition of very, very bad name choices, the model will be named o3."