Chinese artificial intelligence startup DeepSeek has unveiled its latest release, DeepSeek-R1, an open-source AI model that matches the leading OpenAI o1 model on math, programming, and reasoning tasks while costing 90-95% less to use, VentureBeat reports.
DeepSeek-R1, announced alongside the publication of the model's weights on the Hugging Face platform under an MIT license, is a significant step forward for open-source AI and potentially levels the playing field in the global race toward artificial general intelligence (AGI). Built on the company's DeepSeek-V3 model, DeepSeek-R1 has posted strong benchmark results:
- 79.8% on the AIME 2024 math competition (vs. 79.2% for o1);
- 97.3% on the MATH-500 test (vs. 96.4% for o1);
- A Codeforces rating of 2,029, higher than that of 96.3% of human programmers.
Although OpenAI o1 slightly outperformed DeepSeek-R1 on general knowledge (91.8% vs. 90.8% accuracy on the MMLU test), DeepSeek demonstrated strong capabilities in complex reasoning and programming, an important achievement for the Chinese AI sector.
At the same time, DeepSeek-R1 dramatically reduces the cost of use. While OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, the DeepSeek Reasoner API based on the R1 model charges $0.55 per million input tokens and $2.19 per million output tokens.
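For developers, that pricing is exposed through an API that follows the familiar OpenAI chat-completions convention, so existing client code needs little more than a new base URL. Below is a minimal sketch using the official `openai` Python client; the endpoint URL and model name reflect DeepSeek's public documentation, but treat them as assumptions to verify against the current docs.

```python
# Minimal sketch: calling the DeepSeek Reasoner API through its
# OpenAI-compatible interface. Base URL and model name are assumed
# from DeepSeek's public documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued by DeepSeek's platform
    base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1-based reasoning model
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)

print(response.choices[0].message.content)
```

At the listed rates, a request consuming 1,000 input and 2,000 output tokens would cost roughly $0.005 through DeepSeek Reasoner versus about $0.135 through o1.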
This price advantage, combined with comparable performance, can make DeepSeek an attractive choice for developers and businesses looking for effective AI tools.
The development of DeepSeek-R1 started with DeepSeek-R1-Zero, a model trained exclusively with reinforcement learning (RL). Through trial and error, the model learned complex reasoning on its own, achieving 86.7% accuracy on the AIME 2024 benchmark, matching OpenAI o1-0912.
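According to DeepSeek's technical report, R1-Zero's RL signal came from simple rule-based rewards rather than a learned reward model: one check for whether the final answer is correct, and another for whether the output follows the expected reasoning format. The sketch below illustrates that idea; the `<think>`/`<answer>` tag scheme and the scoring values are illustrative assumptions, not DeepSeek's exact rules.

```python
import re

# Illustrative rule-based rewards in the spirit of R1-Zero's RL setup.
# Tag names and score values are assumptions for illustration.

def format_reward(completion: str) -> float:
    """1.0 if reasoning and answer appear inside the expected tags."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the extracted final answer matches a known-correct reference."""
    match = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def total_reward(completion: str, reference_answer: str) -> float:
    """Combined training signal used to score each sampled completion."""
    return format_reward(completion) + accuracy_reward(completion, reference_answer)
```

Because such checks are cheap and unambiguous, millions of sampled solutions can be scored without human labeling, which is what makes this kind of pure-RL training practical.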
However, early versions faced problems such as language mixing and poor readability. DeepSeek addressed these issues with a multi-stage approach (outlined in the sketch after this list):
- Fine-tuning the DeepSeek-V3 base model on a small set of cold-start data;
- Reinforcement learning on reasoning tasks;
- Further supervised fine-tuning on data from a range of domains, such as factual question answering and self-cognition.
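Chained together, the stages read roughly as follows. This outline is only schematic: the function names are hypothetical placeholders, not DeepSeek's actual training code.

```python
# Schematic outline of the multi-stage recipe described above.
# All function names are hypothetical placeholders (stubs).

def supervised_finetune(model, dataset):
    """Stages 1 and 3: standard supervised fine-tuning (stub)."""
    raise NotImplementedError

def reinforcement_learning(model, prompts):
    """Stage 2: RL on reasoning tasks with rule-based rewards (stub)."""
    raise NotImplementedError

def train_deepseek_r1(v3_base, cold_start_data, reasoning_prompts, mixed_sft_data):
    model = supervised_finetune(v3_base, cold_start_data)     # stage 1: cold-start SFT
    model = reinforcement_learning(model, reasoning_prompts)  # stage 2: reasoning RL
    model = supervised_finetune(model, mixed_sft_data)        # stage 3: broad SFT
    return model
```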
The result is the refined DeepSeek-R1, which pairs advanced reasoning capabilities with improved readability and broader usability.
The release of DeepSeek-R1 also signals the growing competitiveness of open-source AI models. By distilling its technology into smaller models, DeepSeek showed that R1's reasoning ability transfers to far smaller, cheaper-to-run checkpoints: the distilled Qwen-1.5B model, for example, outperformed the much larger GPT-4o and Claude 3.5 Sonnet on math benchmarks.
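Distillation here follows the straightforward recipe of fine-tuning a small model on reasoning traces generated by the large one. Below is a minimal sketch of the data-collection half, again using the OpenAI-compatible API; the prompt list and output file are illustrative assumptions, and real distillation sets are orders of magnitude larger.

```python
import json
from openai import OpenAI

# Sketch: collecting teacher outputs from R1 to build a distillation
# dataset. A smaller student model would then be supervised-fine-tuned
# on these prompt/completion pairs.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

prompts = [
    "Prove that the sum of two even integers is even.",
    "Solve for x: 3x + 7 = 22.",
]  # illustrative placeholder prompts

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-reasoner",
            messages=[{"role": "user", "content": prompt}],
        )
        record = {"prompt": prompt, "completion": resp.choices[0].message.content}
        f.write(json.dumps(record) + "\n")
```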
In addition, the entire development process, including training data and methodologies, is open, which promotes transparency and collaboration in artificial intelligence research.
DeepSeek is showcasing R1 on DeepThink, its ChatGPT-like chat platform, and also provides model weights, a code repository, and API integration.
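Because the weights are published on Hugging Face under an MIT license, they load through the standard `transformers` path. The sketch below uses one of the distilled checkpoints, which is small enough for a single GPU; the repository id matches DeepSeek's Hugging Face organization at the time of release but should be verified.

```python
# Sketch: running a distilled R1 checkpoint locally with transformers.
# The repository id is assumed from the deepseek-ai Hugging Face org.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is the greatest prime factor of 1001?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```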