The following article is sourced from Titanium Media AGI, authored by Lin Zhijia.
As is by now well known, the 'mysterious power from the East,' DeepSeek, has recently drawn wide attention in tech circles in both China and the U.S., and is even being called the biggest 'dark horse' in the large-model industry.
Recently, the Chinese AI startup DeepSeek officially released its DeepSeek-R1 model, claiming performance that rivals the official release of OpenAI's o1 on tasks such as mathematics, coding, and natural-language reasoning.
The news has shaken the global AI community, and researchers at U.S. AI companies have been startled to find China seemingly pulling ahead of the U.S. in large-model technology.
An engineer at Meta wrote on Blind, the anonymous community for U.S. tech-company employees: 'Meta's generative AI department is in a panic. It all started with DeepSeek, which has already left Llama 4 behind in benchmarks. Worse still, that unknown Chinese company has a training budget of only $5.5 million. Engineers are frantically dissecting DeepSeek, trying to copy everything they possibly can from it.'
Titanium Media AGI learned that, as of publication, DeepSeek's mobile app ranked eighth in Apple's App Store, surpassing U.S. generative AI products such as Google Gemini and Microsoft Copilot, with download momentum second only to ChatGPT. Meanwhile, teams at home and abroad, including OpenAI, ByteDance, Alibaba's Tongyi, Zhipu, and Moonshot AI (the maker of Kimi), are actively studying DeepSeek, and both OpenAI and ByteDance are reportedly considering research cooperation with it.
During the World Economic Forum in Davos, Scale AI founder Alexandr Wang said bluntly that DeepSeek's models perform roughly on par with the best U.S. models. He believes that while the U.S. may have led the AI race against China over the past decade, the release of DeepSeek's model could 'change everything.'
Notably, Wang added: 'DeepSeek has about 50,000 H100s, which they obviously can't talk about, because it violates the export controls the U.S. has put in place. I believe it's true; I think they have more chips than people expect, but going forward they will be constrained by chip controls and export restrictions.'
Alexandr Wang hinted that DeepSeek would come under U.S. regulation.
Liang Wenfeng, founder of DeepSeek and head of the quantitative fund High-Flyer, has previously said that the main constraint on DeepSeek is not funding but access to high-end computing power, which is crucial for training advanced AI models.
With AMD confirming that DeepSeek has used its MI300X, one of the most powerful AI chips, for large-model training, how Chinese companies can break through these barriers to train large models is becoming a key question.
DeepSeek: four years to fire 'this shot' at Silicon Valley
For readers in AI circles, plenty of articles have already profiled DeepSeek and Liang Wenfeng. The key points, in brief:
Liang Wenfeng is a typical 'small-town exam taker': born in Zhanjiang, a third-tier city in Guangdong, he was admitted to Zhejiang University at 17 and earned a master's degree in Information and Communication Engineering there in 2010.
After finishing his master's, Liang led a team exploring fully automated quantitative trading with machine learning and related techniques. In 2010 he founded Jacobi Investment with fellow Zhejiang University alumni.
In June 2015, the 30-year-old Liang co-founded Hangzhou High-Flyer Technology Co., Ltd. (High-Flyer Quant) with Xu Jin, betting on mathematics and artificial intelligence for quantitative investing, with the ambition of becoming a world-class quantitative hedge fund.
By 2021, High-Flyer's assets under management had surpassed 100 billion yuan. That same year, Liang began seeking a 'side business,' buying thousands of NVIDIA GPUs from suppliers (reportedly including RTX 4090s, A100s, L40s, and other models) and turning his attention to AI. By 2023, High-Flyer's total AUM had fallen to just over 40 billion yuan.
In early 2023, High-Flyer announced that it owned 10,000 NVIDIA A100s. It later emerged that this claim was inflated: the firm owned only several thousand A100s, with the rest consisting of consumer-grade cards, older GPUs, and A100s rented through cloud services. At the time, industry insiders dismissed the GPU buildup as the 'eccentric hobby' of a billionaire in search of something new.
DeepSeek's surge is, to a large extent, inseparable from domestic media hype about 'Chinese large-model companies surpassing the United States,' a narrative of the East rising and the West declining. In reality, DeepSeek's technology is not so extraordinary as to be 'astonishing': its V1 model was quite rough, leaned heavily on open-source data derived from GPT, and at one point even called the GPT-3.5 API. What today's 'Pinduoduo of AI' is genuinely strong at is AI infrastructure engineering and team capability. Using DeepSeek to prove that Chinese AI technology has surpassed the U.S. is therefore a fallacy of generalizing from the particular: DeepSeek is a beneficiary of AI's technical iteration, but that does not mean it has the strength to overtake leading companies like OpenAI.
DeepSeek's example further shows that there is no deep 'moat' in AI model technology; leapfrogging has become the norm, and the 'six little tigers' (China's leading LLM startups) are not the only contenders. Whether growth in AI computing power and sustained model iteration can truly overtake OpenAI, however, remains the key factor shaping the development of large models.
DeepSeek does not seek outside financing and has no near-term plans to go public; strong cash flow lets it hire a large number of AI researchers, cultivating a 'research institute' atmosphere that focuses purely on the frontier rather than on business, with a team deeply versed in infrastructure and chip fundamentals. It has also recruited top talent from the hedge-fund industry.
As Turing Award winner and Meta Chief AI Scientist Yann LeCun put it: 'To people who see the performance of DeepSeek and think "China is surpassing the U.S. in AI," you are reading this wrong. The correct reading is: "Open-source models are surpassing proprietary ones."'
In fact, DeepSeek's run at OpenAI began four years ago, with the purchase of thousands of GPUs to build up AI computing power.
At the end of last December, DeepSeek released its open-source base model DeepSeek-V3, with performance comparable to top models such as GPT-4o and Claude 3.5 Sonnet, yet at an extremely low training cost: the entire run was completed on a cluster of 2,048 NVIDIA H800 GPUs for only about $5.576 million, less than one-tenth of what other top models cost to train.
By comparison, models in the GPT-4o class cost roughly $100 million to train, on clusters of at least ten thousand of the more advanced H100 GPUs. Llama 3.1, another top model released last year, used 16,384 H100s during training, consuming roughly 11 times the compute of DeepSeek-V3 at a cost exceeding $60 million.
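As a back-of-envelope check on the $5.576 million figure: the DeepSeek-V3 technical report quotes roughly 2.788 million H800 GPU-hours priced at an assumed rental rate of $2 per GPU-hour. A minimal sketch of that arithmetic:

```python
# Back-of-envelope check on DeepSeek-V3's reported training cost.
# Figures are from the V3 technical report; the $2/GPU-hour rate is an
# assumed rental price used there, not a measured expense.
H800_GPU_HOURS = 2_788_000     # total GPU-hours reported for the full run
RATE_USD_PER_GPU_HOUR = 2.0    # assumed H800 rental rate
NUM_GPUS = 2048                # reported cluster size

total_cost_usd = H800_GPU_HOURS * RATE_USD_PER_GPU_HOUR
wall_clock_days = H800_GPU_HOURS / NUM_GPUS / 24

print(f"estimated cost: ${total_cost_usd:,.0f}")          # -> $5,576,000
print(f"implied wall-clock: {wall_clock_days:.0f} days")  # -> ~57 days
```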
DeepSeek has not yet disclosed the full training cost of its reasoning model R1, but it has published API pricing: 1-4 RMB per million input tokens (the range reflecting cache hits versus misses) and 16 RMB per million output tokens, roughly one-thirtieth of the cost of running OpenAI's o1.
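That 'one-thirtieth' figure is easy to sanity-check. A minimal sketch, assuming OpenAI o1's published prices of $15 / $60 per million input/output tokens and an exchange rate of about 7.3 RMB per USD (both numbers may drift):

```python
# Rough check of the "one-thirtieth of o1" pricing claim, using output
# tokens (the dominant cost for reasoning models, which emit long
# chains of thought). Assumed: o1 at $60 per million output tokens,
# exchange rate ~7.3 RMB/USD.
RMB_PER_USD = 7.3

r1_output_rmb = 16.0                  # DeepSeek R1, RMB per million output tokens
o1_output_rmb = 60.0 * RMB_PER_USD    # OpenAI o1, converted to RMB

print(f"o1/R1 output-price ratio: {o1_output_rmb / r1_output_rmb:.1f}x")
# -> 27.4x, i.e. R1 output tokens cost roughly one-thirtieth of o1's
```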
Beyond falling costs, the key technical point of DeepSeek-R1 is its innovative training method: the DeepSeek-R1-Zero route applies reinforcement learning (RL) directly to the base model, without relying on supervised fine-tuning (SFT) or labeled data. Using simple rule-based rewards for answer accuracy and output format, the model 'self-evolves' strong reasoning ability in the absence of supervised data. On the AIME 2024 benchmark, DeepSeek-R1-Zero reached an accuracy of up to 86.7% (with majority voting), demonstrating that direct reinforcement learning can train advanced reasoning models.
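To make the idea concrete, here is an illustrative sketch of what such rule-based rewards can look like. This is not DeepSeek's actual code: the <think>/<answer> tag format follows the R1 paper's description, while the specific reward values are assumptions for illustration.

```python
import re

# Illustrative sketch of R1-Zero-style rule-based rewards (hypothetical).
# Format reward: output must wrap reasoning and answer in tags.
# Accuracy reward: extracted answer must match a verifiable label.
FORMAT = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def rule_based_reward(completion: str, ground_truth: str) -> float:
    match = FORMAT.search(completion)
    if match is None:
        return 0.0                                 # malformed output: no reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth else 0.1  # small credit for format alone

sample = "<think>2 + 2 equals 4.</think> <answer>4</answer>"
print(rule_based_reward(sample, "4"))  # -> 1.0
```

Because both rewards are checked programmatically, no human labeler sits in the training loop, which is exactly why the approach sidesteps the cost of large-scale supervised annotation.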
Nathan Lambert, a scientist at the Allen Institute for AI, said the R1 paper marks an important turning point amid the uncertainty of reasoning-model research: until now, reasoning models have been a major area of industrial research that lacked a landmark paper.
According to The Intellectual (a Chinese science media outlet), Wang Meiqi, an assistant professor at Sun Yat-sen University's School of Integrated Circuits, said that direct reinforcement learning, combined with a series of engineering optimizations across DeepSeek's model iterations (such as simplified reward-model design), effectively cuts the cost of training large models: direct RL avoids large amounts of manual data labeling, while the simplified reward design reduces the demand for computing resources.
'The way DeepSeek operates is similar to early DeepMind,' said one AI investor. 'It focuses purely on research and engineering rather than commercialization.'
NVIDIA senior research scientist Jim Fan stated, 'DeepSeek is the biggest dark horse in the field of open-source large language models this year.'
Demand for computing power remains the 'hard problem' for large models; U.S. export controls weigh heavily
On DeepSeek, the British journal Nature observed that Chinese companies managed to produce DeepSeek-R1 despite U.S. restrictions on semiconductor exports to China, while Seattle-based AI researcher François Chollet argued that 'efficient resource utilization is more important than sheer computational scale.'
Liang Wenfeng also pointed out that advanced AI chips with higher computing power are crucial for training advanced AI models.
Now Alexandr Wang is openly saying that the U.S. government should investigate and regulate DeepSeek's access to AI chips in order to preserve America's competitive edge.
Alexandr Wang, born in 1997, dropped out of MIT at 19 to found Scale AI, now valued at over $10 billion and backed by investors including Y Combinator, NVIDIA, AMD Ventures, Amazon, and Meta. The company supplies training data to OpenAI, Google, and Meta.
Wang has previously voiced concern about China catching up to the U.S. in AI, arguing that the release of DeepSeek-V3 shows the outside world that while Americans rest, China is working, catching up with products that are cheaper, faster, and stronger.
OpenAI CFO Sarah Friar likewise believes the AI competition between China and the U.S. is no mere war of words but a real contest, with both sides investing heavily. 'We have seen the Trump administration willing to engage actively, whether from an economic perspective or from a regulatory and commercial-competition perspective. We look forward to starting substantive cooperation.'
Currently, U.S. export controls have become one of the key factors in the development of China's AI industry.
On the evening of January 15, Beijing time, the U.S. Commerce Department's Bureau of Industry and Security (BIS) amended the Export Administration Regulations (EAR), adding 25 Chinese entities to the Entity List in two batches, including 9 entities affiliated with Zhipu.
This makes Zhipu the first Chinese AI large-model company to be placed on the U.S. 'Entity List.'
In response, Zhipu issued a statement: 'The U.S. Commerce Department's Bureau of Industry and Security (BIS) intends to add Zhipu and its subsidiaries to the export-control Entity List. This decision lacks factual basis, and we firmly oppose it. Because Zhipu possesses full-stack core large-model technology, inclusion on the Entity List will have no material impact on the company's business. Zhipu has the capability, and the resolve, to focus even more on serving our users and partners with world-class large-model technology, products, and services. The company will continue to take part in global AI competition, adhering to the highest security standards and to principles of fairness, transparency, and sustainability, and advancing the development of AI technology.'
Before this, many AI companies, including Megvii, Yitu, CloudWalk, and Moore Threads, had been placed on the U.S. 'Entity List,' which has left some affected AI software companies unable to train trillion-parameter models.
Yet the emergence of Chinese companies like DeepSeek and ByteDance in AI has made the U.S. realize that regulation alone cannot stop China from benchmarking against OpenAI and continuing to push toward the technological frontier.
Forbes pointed out that DeepSeek has made the world realize that 'China has not withdrawn from this (artificial intelligence) race.'
'If the best open-source technology comes from China, American developers will build their systems based on these technologies. In the long run, this could make China the center of AI research and development,' stated The New York Times.
DeepSeek still faces rivals hoarding vast computing power, however. This week, Trump announced that OpenAI, Oracle, and Japan's SoftBank Group had jointly formed a new venture, 'Stargate,' with plans to invest $500 billion, beginning with at least $100 billion deployed immediately into AI infrastructure in the U.S. Meanwhile, Musk's xAI is massively expanding its supercomputers to house more than a million GPUs for training its Grok models.
All of this brings to mind the words of Baidu founder and CEO Robin Li (Li Yanhong): 'Open-source models will gradually fall behind.'
For now, DeepSeek appears to prove the opposite: open source has not fallen behind, and it may even offer China's AI its best hope of overtaking the United States. Whether DeepSeek will face targeted restrictions from the U.S. government, and ultimately run into constraints on model training and computing power, remains highly uncertain.
'Currently, DeepSeek has one of the largest advanced computing clusters in China,' said Liang Wenfeng's business partner, 'They currently have sufficient resource capacity but it won't last long.'
(This article was first published on the Titanium Media App)