A Chinese artificial intelligence lab has done more than just build a cheaper AI model; it has exposed the inefficiency of the entire industry's approach.
DeepSeek's breakthrough showed how a small team, forced to save money, rethought how AI models are built. While tech giants such as OpenAI and Anthropic spend several billion dollars on compute power alone, DeepSeek claims to have achieved similar results for just over $5 million.
The company's model matches or outperforms GPT-4o (OpenAI's best LLM), OpenAI's o1 (the best reasoning model currently available), and Anthropic's Claude 3.5 Sonnet on many benchmarks, despite using only about 2.788 million H800 GPU hours for its full training run. That is a tiny fraction of the hardware traditionally assumed to be necessary.
The model is so good and so efficient that it climbed to the top of Apple's App Store productivity category within days, challenging OpenAI's dominance.
Necessity is the mother of invention. The team pulled this off with techniques that American developers never needed to consider and still don't rely on today. Perhaps the most important: instead of using full precision for its calculations, DeepSeek trained in 8-bit, cutting memory requirements by 75%.
"They have reached a 8 -bit floating comma training, at least for some numbers," CEO of Perplexity Erfind Srinivas Say CNBC. "On my knowledge, I think the floating point 8 is not a good concept. Most training in America is still taking place in FP16."
FP8 uses half the memory and storage of FP16. For large AI models with billions of parameters, that reduction is significant. DeepSeek had to master this because its hardware was weaker; OpenAI never faced that constraint.
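To make the saving concrete, here is a minimal PyTorch sketch, not DeepSeek's code, comparing how much memory the same weights occupy in FP32, FP16 and FP8; the tensor size and the `float8_e4m3fn` dtype are illustrative assumptions.

```python
import torch

# Illustrative sketch (not DeepSeek's training code): per-parameter memory
# footprint of storing the same weights in FP32, FP16 and FP8.
weights_fp32 = torch.randn(1_000_000)               # full precision, 4 bytes/param
weights_fp16 = weights_fp32.to(torch.float16)       # half precision, 2 bytes/param
weights_fp8 = weights_fp32.to(torch.float8_e4m3fn)  # 8-bit float, 1 byte/param

for name, t in [("fp32", weights_fp32), ("fp16", weights_fp16), ("fp8", weights_fp8)]:
    print(f"{name}: {t.element_size() * t.nelement() / 1e6:.1f} MB")
# fp32: 4.0 MB, fp16: 2.0 MB, fp8: 1.0 MB -- roughly 75% saved versus full precision
```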
DeepSeek also developed a "multi-token" system that processes whole phrases at once instead of individual words, making it twice as fast while retaining 90% of the accuracy.
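As a rough illustration of the idea (the module, dimensions and head count below are invented for the example and are not DeepSeek's architecture), a model can attach several output heads to one hidden state so that a single forward pass proposes several upcoming tokens at once:

```python
import torch
import torch.nn as nn

# Toy multi-token prediction sketch: one shared hidden state feeds several
# output heads, each guessing a later token, so one forward pass proposes
# a short phrase instead of a single word.
class MultiTokenHead(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) -> logits: (batch, n_future, vocab_size)
        return torch.stack([head(hidden) for head in self.heads], dim=1)

logits = MultiTokenHead(hidden_dim=64, vocab_size=1000)(torch.randn(2, 64))
print(logits.shape)  # torch.Size([2, 4, 1000])
```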
Another technique it used is so-called "distillation": getting a smaller model to reproduce the outputs of a larger one without needing to train it on the same underlying knowledge base. This made it possible to release smaller models that are highly efficient, accurate and competitive.
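Here is a minimal sketch of what distillation typically looks like in code, assuming the standard soft-label recipe in PyTorch rather than DeepSeek's actual pipeline; the batch size, vocabulary size and temperature are placeholder values. The student is trained to match the teacher's output distribution:

```python
import torch
import torch.nn.functional as F

# Standard soft-label distillation loss: KL divergence between the student's
# and teacher's softened output distributions.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

teacher_logits = torch.randn(8, 32_000)                    # from a large "teacher" model
student_logits = torch.randn(8, 32_000, requires_grad=True)  # from the small "student"
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```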
The company also used a technique called "mixture of experts," which further improved the model's efficiency. While traditional models keep all of their parameters active all the time, DeepSeek's model contains 671 billion parameters in total but activates only 37 billion at a time. It's like having a large team of specialists and calling in only the experts needed for a particular task.
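A toy sketch of the routing idea, with made-up layer sizes and expert counts, shows how a router activates only a few experts per token even though many more exist in total:

```python
import torch
import torch.nn as nn

# Toy mixture-of-experts sketch: a router picks the top-k experts per token,
# so only a fraction of the total parameters do work for any given input,
# mirroring the "671B total / 37B active" idea.
class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # only the chosen experts run
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

print(TinyMoE()(torch.randn(5, 64)).shape)       # torch.Size([5, 64])
```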
"We use Deepseek-R1 as a model for the teacher to create 800,000 training samples, and set many dense small models. Promising results: Deepseeek-R1-Distill-Qwen-1.5B excels on GPT-4O and Claude-3.5-Sonnet in mathematics standards by 28.9% In Aime and 83.9% in Math, "Deepseek He wrote in his paper.
For context, 1.5 billion parameters is so few that the result isn't even considered an LLM (large language model) but rather an SLM (small language model). SLMs require so little compute and VRAM that users can run them on weak devices such as smartphones.
The cost implications are staggering. Beyond the roughly 95% reduction in training costs, DeepSeek's API charges just 10 cents per million tokens, compared to about $4.40 for similar services. One developer reported handling 200,000 API requests for about 50 cents, with no rate limiting.
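For a back-of-the-envelope sense of scale, here is the arithmetic using only the per-million-token prices quoted above; the one-billion-token workload is an assumed figure for illustration:

```python
# Cost comparison using the article's quoted prices; the workload size is assumed.
tokens = 1_000_000_000            # one billion tokens processed
deepseek_rate = 0.10 / 1e6        # $0.10 per million tokens
competitor_rate = 4.40 / 1e6      # $4.40 per million tokens for similar services

print(f"DeepSeek:   ${tokens * deepseek_rate:,.2f}")    # $100.00
print(f"Competitor: ${tokens * competitor_rate:,.2f}")  # $4,400.00
```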
"Deepseek" has become already noticeable. The investor said: "Let me say the calm part loudly: Building artificial intelligence models is a trap of money." Shamath Baltiatia. Despite the strikes directed at Deepseek, the CEO of Openai, Sam Altman, was quick to curb his endeavor to pressure users to get money, after all the chants of social media about people who are investigating for free using Deepseek what Openai is receiving at $ 200 per month to do With it.
Meanwhile, the DeepSeek app is topping download charts, and three of the top six trending repositories on GitHub are DeepSeek-related.
Most AI stocks have slipped as investors question whether the hype has reached bubble territory. Both AI hardware names (Nvidia, AMD) and software stocks (Microsoft, Meta and Google) are feeling the effects of the apparent paradigm shift triggered by DeepSeek's announcement and by the results shared by users and developers.
Even crypto got in on the action, with a wave of fake DeepSeek AI tokens appearing in attempts to defraud unsuspecting traders.
Beyond the financial fallout, the bottom line is that DeepSeek's breakthrough suggests AI development may not require massive data centers and specialized hardware after all. That could radically reshape the competitive landscape, turning what many considered permanent advantages for the big tech companies into temporary leads.
The timing is almost comical. Just days before the DeepSeek announcement, President Trump appeared alongside OpenAI's Sam Altman and Oracle's founder to unveil the Stargate project, a $500 billion investment in AI infrastructure in the United States. Meanwhile, Mark Zuckerberg doubled down on Meta's commitment to pour billions into AI development. Suddenly, Microsoft's $13 billion investment in OpenAI looks less like strategic genius and more like expensive FOMO fueled by wasted resources.
"All I did to prevent them from catching the knees was not important." Serenivas He said CNBC. "They ended up joining the knees anyway."
Edited by Andrew Hayward