A new artificial intelligence model, DeepSeek-R1, from the Chinese lab DeepSeek appeared as if out of nowhere. For the general public, the first mentions of it began appearing in the media only last week, and now it seems everyone is talking about DeepSeek. In just a week, the DeepSeek app has overtaken the well-known ChatGPT in the US App Store rankings, and the model has skyrocketed to the top of the downloads on the Hugging Face developer platform, as developers rush to try it out and understand what this release can bring to their AI projects. So, logical questions arise: where did DeepSeek come from, who is behind this startup, and why has it made so much noise? I will try to answer them in this article.
Where DeepSeek came from
Given the history of Chinese tech companies, DeepSeek should have been a project of giants like Baidu, Alibaba, or ByteDance. But this AI lab was launched in 2023 by High-Flyer, a Chinese hedge fund founded in 2015 by entrepreneur Liang Wenfeng. He made a fortune using AI and algorithms to identify patterns that could affect stock prices. The hedge fund quickly gained popularity in China, and was able to raise more than 100 billion yuan (about $15 billion). Since 2021, this figure has dropped to about $8 billion, but High-Flyer is still one of the most important hedge funds in the country.
As High-Flyer's core business overlapped with the development of AI models, the hedge fund accumulated GPUs over the years and created Fire-Flyer supercomputers to analyze financial data. In the wake of the growing popularity of ChatGPT, a chatbot from the American company OpenAI, Liang, who also holds a master's degree in computer science, decided in 2023 to invest his fund's resources in a new company called DeepSeek, which was to create its own advanced models and develop general artificial intelligence (AGI).
Liang told Chinese tech publication 36Kr that the decision was motivated by scientific curiosity, not a desire to make a profit. "I couldn't find a commercial reason to start DeepSeek even if you asked me," he said. "Because it's not commercially viable. Basic research has a very low return on investment. When OpenAI's early investors gave it money, they probably didn't think about the return they would get. Rather, they really wanted to do this business."
According to Liang, when he assembled DeepSeek's R&D team, he also didn't look for experienced engineers to build a consumer-facing product. Instead, he focused on doctoral students from top universities in China, including Peking University, Tsinghua University, and Beihang University, who were eager to prove themselves. Many of them had published in top journals and won awards at international academic conferences, but had no industry experience, according to Chinese technology publication QBitAI.
"Our main technical positions are mostly filled by people who graduated this year or within the last one or two years," Liang said in an interview in 2023. He believes that students may be better suited for high-investment, low-return research. "Most people, when they are young, can fully commit to a mission without utilitarian considerations," Liang explained. His pitch to potential employees is that DeepSeek was created to "solve the world's toughest questions."
Liang, who is personally involved in DeepSeek's development, uses the proceeds from his hedge fund to pay high salaries to top AI talent. Along with TikTok owner ByteDance, DeepSeek is known in China for providing top compensation to AI engineers, and staff are based in offices in Hangzhou and Beijing.
Liang positions DeepSeek as a uniquely "local" company, staffed by PhDs from leading Chinese universities. In an interview with the domestic press last year, he said that his core team "didn't have any people who came back from abroad. They are all local... We have to develop the best talent ourselves." DeepSeek's identity as a purely Chinese LLM company has earned it popularity at home, as this approach is fully in line with Chinese government policy.
This week, Liang was the only representative of China's AI industry chosen to participate in a highly publicized meeting of entrepreneurs with the country's second-in-command, Li Qiang. Entrepreneurs were told to "focus on breakthroughs in key technologies."
Not much is known about how DeepSeek started building its own large language models (LLMs), but the lab quickly opened up its source code, and it is likely that, like many Chinese AI developers, it relied on open-source projects created by Meta, such as the Llama model and the PyTorch machine learning library. At the same time, DeepSeek's particular focus on research makes it a dangerous competitor for OpenAI, Meta, and Google, as the AI lab is, at least for now, willing to share its discoveries rather than protect them for commercial gain. DeepSeek has not raised outside funding and has not yet taken significant steps to monetize its models. However, it is not known for certain whether the Chinese government is involved in financing the company.
What makes the DeepSeek-R1 AI model unique
In November, DeepSeek first announced that it had achieved performance that surpassed the leading-edge OpenAI o1 model, but at the time it only released a limited R1-lite-preview model. With the release of the full DeepSeek-R1 model last week and the accompanying white paper, the company introduced a surprising innovation: a deliberate departure from the traditional supervised fine-tuning (SFT) process that is widely used for training large language models (LLMs).
SFT is a standard approach for AI development and involves training models on prepared datasets to teach them step-by-step reasoning, often referred to as a chain of thought (CoT). However, DeepSeek challenged this assumption by skipping SFT entirely and instead relying on reinforcement learning (RL) to train DeepSeek-R1.
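For contrast, here is a minimal sketch of what conventional SFT on chain-of-thought data looks like, using the Hugging Face transformers library. The tiny stand-in model and the single hand-written example are placeholders for illustration only; this is not DeepSeek's pipeline or data.

```python
# Minimal sketch of conventional SFT on chain-of-thought data, i.e. the step
# that DeepSeek-R1-Zero deliberately skipped. "gpt2" is just a small stand-in
# model for the demo; the example is hand-written, not from any real dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Each training example pairs a problem with a prepared step-by-step solution.
example = ("Q: What is 17 * 24?\n"
           "A: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408. The answer is 408.")

# Standard next-token prediction over the prompt plus the written-out reasoning.
batch = tokenizer(example, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```

In a real SFT run this loop would iterate over huge numbers of such curated reasoning traces, which is exactly the expensive data requirement DeepSeek tried to avoid.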
According to Jeffrey Emanuel, a serial investor and CEO of blockchain company Pastel Network, DeepSeek managed to overtake Anthropic in applying chain-of-thought (CoT) reasoning, and is now practically the only lab, apart from OpenAI, that has made the technique work at scale.
At the same time, unlike OpenAI, which is incredibly secretive about how these models actually work at a low level and does not provide the actual model weights to anyone other than partners like Microsoft, these DeepSeek models are completely open and permissively licensed. They have released extremely detailed technical reports explaining how the models work, as well as code that anyone can look at and try to copy.
With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step by step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully designed reward functions, the researchers got the model to develop complex reasoning capabilities completely autonomously. It wasn't just problem solving: the model organically learned to generate long chains of thought, check its own work, and allocate more computational time to harder problems.
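The accompanying paper describes rule-based rewards rather than a learned reward model: for verifiable tasks such as math and code the final answer can be checked mechanically, and the output is additionally rewarded for following the expected format. The sketch below is illustrative only; the exact checks, tags, and weights are my assumptions, not DeepSeek's published code.

```python
import re

def format_reward(completion: str) -> float:
    """Reward the model for wrapping its reasoning in the expected tags
    before giving a final answer (illustrative check only)."""
    return 1.0 if re.search(r"<think>.+?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference_answer: str) -> float:
    """For verifiable tasks (math, code), compare the final answer against a
    known ground truth instead of using a learned reward model."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)  # e.g. a boxed math answer
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def total_reward(completion: str, reference_answer: str) -> float:
    # The weighting here is arbitrary; the point is that the training signal is
    # purely rule-based, so no supervised reasoning traces are needed.
    return accuracy_reward(completion, reference_answer) + 0.5 * format_reward(completion)

print(total_reward("<think>6 * 7 = 42</think> The answer is \\boxed{42}.", "42"))  # 1.5
```

The reinforcement learning algorithm (DeepSeek uses a policy-gradient method it calls GRPO) then pushes the model toward completions that score higher under such rewards.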
In this way, the model learned to revise its thinking on its own. What is particularly interesting is that during training, DeepSeek observed what it called an "aha moment": a phase when the model spontaneously learned to revise its chain of thought mid-process when faced with uncertainty. This emergent behavior was not explicitly programmed; it arose naturally from the model's interaction with the reinforcement learning environment. The model literally stopped itself, flagged potential problems in its reasoning, and restarted with a different approach, all without being explicitly trained to do so.
DeepSeek also solved one of the main problems in reasoning models: language consistency. Previous attempts at chain-of-thought reasoning often resulted in models mixing languages or producing incoherent output. DeepSeek addressed this by explicitly rewarding language consistency during RL training, accepting a slight performance hit in exchange for much more readable and consistent output.
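The paper does not spell out the exact metric, but the idea can be illustrated with a toy version: score the chain of thought by the fraction of it written in the target language and fold that into the overall reward. The heuristic below (counting ASCII-only words as "English") is purely my simplification, not DeepSeek's measure.

```python
# Toy language-consistency reward: the share of the chain of thought written in
# the target language. ASCII-only words as a proxy for English is a crude stand-in.
def language_consistency_reward(chain_of_thought: str) -> float:
    words = chain_of_thought.split()
    if not words:
        return 0.0
    english_like = sum(1 for w in words if all(ord(ch) < 128 for ch in w))
    return english_like / len(words)

mixed = "First, 17 * 24 = 408, 所以 the answer is 408."
print(round(language_consistency_reward(mixed), 2))  # 0.91, penalized for the mixed-language span
```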
As a result, DeepSeek-R1 achieves high accuracy and efficiency. On AIME 2024, one of the toughest math competitions for high school students, R1 achieved 79.8% accuracy, in line with OpenAI's o1 model. On MATH-500 it reached 97.3%, and on Codeforces programming problems it reached the 96.3rd percentile. But perhaps most impressively, DeepSeek was able to distill these capabilities into much smaller models: their 14-billion-parameter distilled version outperforms many models several times its size, showing that reasoning ability depends not only on parameter count but also on how you train the model to process information.
However, the uniqueness of DeepSeek-R1 lies not only in its new approach to model training, but also in the fact that it is the first time a Chinese AI model has gained such popularity in the West. Users, of course, immediately asked it questions sensitive to the Chinese government, about Tiananmen Square and Taiwan, and quickly realized that DeepSeek is censored. Indeed, it would be futile to expect a Chinese AI lab not to comply with Chinese law and policy.
However, many developers consider this censorship an edge case rarely encountered in real-world use, and one that can be mitigated by fine-tuning. It is therefore unlikely that questions about the ethics of using DeepSeek-R1 will stop the many developers and users who want access to a state-of-the-art AI model essentially for free.
Of course, for many, data security remains a question mark, since the hosted DeepSeek-R1 service most likely stores user data on Chinese servers. As a precaution, you can try the model on Hugging Face in a sandboxed environment, or even run it locally on your PC if you have the necessary hardware. In these scenarios you give up some capability, typically by running a smaller distilled version rather than the full model, but they remove the issue of data being transferred to Chinese servers.
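For example, if you have a suitable GPU, one of the distilled R1 checkpoints DeepSeek published on Hugging Face can be run entirely on your own machine with the transformers library, so no prompt data leaves your computer. This is only a minimal sketch; check the model card of whichever checkpoint you pick for exact hardware requirements.

```python
# Minimal local-inference sketch with a distilled R1 checkpoint from Hugging Face.
# The model ID below is one of the distilled variants published by DeepSeek;
# smaller checkpoints like this one fit on a consumer GPU or even a CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "How many prime numbers are there between 1 and 50?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```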
How much did it cost to develop DeepSeek-R1?
To train its models, the High-Flyer hedge fund purchased more than 10,000 NVIDIA A100 GPUs before the US export restrictions were introduced in 2022. Billionaire and Scale AI CEO Alexandr Wang recently told CNBC that he estimates DeepSeek now has about 50,000 NVIDIA H100 chips that it cannot talk about, precisely because of US export controls. If this estimate is correct, it is still very small compared to the leading companies in the AI industry, such as OpenAI, Google, and Anthropic, each of which is believed to operate more than 500,000 GPUs.
According to NVIDIA researcher Jim Fan, DeepSeek trained its base model, called V3, on a budget of $5.58 million over two months. However, the total cost of developing DeepSeek-R1 is difficult to estimate: operating a fleet of some 60,000 NVIDIA GPUs could potentially cost hundreds of millions of dollars, so the exact figures remain speculative.
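For context, the $5.58 million figure is a rental-price estimate rather than capital expenditure: DeepSeek's V3 technical report derives it from roughly 2.79 million H800 GPU-hours at an assumed $2 per GPU-hour, excluding research, staff, and prior experiments. The arithmetic below simply reproduces that estimate.

```python
# Back-of-envelope reproduction of the reported V3 training-cost estimate.
gpu_hours = 2_788_000          # approx. H800 GPU-hours reported for V3 training
price_per_gpu_hour = 2.0       # assumed rental price in USD, not hardware cost

total_cost = gpu_hours * price_per_gpu_hour
print(f"${total_cost / 1e6:.2f}M")  # ≈ $5.58M
```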
Why DeepSeek-R1 shocked Silicon Valley
DeepSeek largely disrupts the business model of OpenAI and other Western companies working on their own closed AI models. Not only does DeepSeek-R1 outperform the best open-source alternative, Meta's Llama 3; the model also transparently shows its entire chain of thought in its answers. This is a blow to the reputation of OpenAI, which has so far hidden its models' chains of thought, citing trade secrets and a reluctance to confuse users when the model makes mistakes.
In addition, DeepSeek's success shows that cost-effective and efficient AI development methods are realistic. As noted above, it is hard to calculate a Chinese company's true development costs, and there may always be "surprises" in the form of multi-billion-dollar government funding. But at the moment, DeepSeek-R1, at a similar level of accuracy to OpenAI o1, is much cheaper for developers: while OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, the DeepSeek Reasoner API based on the R1 model charges $0.55 per million input tokens and $2.19 per million output tokens.
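Using the list prices above, a quick back-of-envelope comparison makes the gap concrete; the workload volumes in this sketch are invented purely for illustration.

```python
# Rough monthly cost comparison using the per-million-token list prices quoted
# above. The token volumes are hypothetical.
PRICES = {
    "OpenAI o1": {"input": 15.00, "output": 60.00},
    "DeepSeek Reasoner (R1)": {"input": 0.55, "output": 2.19},
}

input_tokens = 50_000_000    # hypothetical monthly input volume
output_tokens = 20_000_000   # hypothetical monthly output volume

for name, price in PRICES.items():
    cost = (input_tokens / 1e6) * price["input"] + (output_tokens / 1e6) * price["output"]
    print(f"{name}: ${cost:,.2f} per month")
# OpenAI o1: $1,950.00 per month
# DeepSeek Reasoner (R1): $71.30 per month
```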
DeepSeek's offerings are likely to keep pushing down the cost of using AI models, which will benefit not only ordinary users but also startups and other businesses interested in AI. But if developing a model like DeepSeek-R1 with far fewer resources really is feasible, it could be a problem for AI companies that have invested heavily in their own infrastructure; in particular, years of operating and capital expenditure by OpenAI and others could turn out to be wasted.
The market does not yet know the final answer to whether AI development will indeed require less computing power in the future, but it is already reacting nervously, with a drop in the shares of NVIDIA and other suppliers of AI data center components. This also calls into question the viability of the Stargate project, an initiative under which OpenAI, Oracle, and SoftBank promise to build next-generation AI data centers in the United States, reportedly prepared to spend up to $500 billion.
On the other hand, while American companies will still have abundant computing capacity for AI development, China's DeepSeek, with US export restrictions on chips still in place, may face a severe shortage. Even if resource constraints did push it to innovate and allowed it to create a competitive product, a lack of computing power may simply prevent it from scaling while competitors catch up. So despite all of DeepSeek's innovation, it is still too early to say that Chinese companies will be able to compete with Western AI tech giants, even setting aside the issues of censorship and data security.