DeepSeek's surprisingly inexpensive AI model is challenging the industry's giants. The company claims to have trained its powerful DeepSeek V3 neural network for just $6 million using only 2,048 GPUs, a fraction of what competitors spend. That figure, however, is misleading.
DeepSeek introduces itself with the line "Hi, I was created so you can ask anything and get an answer that might even surprise you," a hint at the capabilities of a model whose release triggered a sharp drop in NVIDIA's stock price. Its success stems from several innovative technologies:
- Multi-token Prediction (MTP): Predicts several upcoming tokens at once rather than one at a time, boosting accuracy and efficiency.
- Mixture of Experts (MoE): Uses 256 expert sub-networks but activates only eight of them for each token, which speeds up training and improves performance.
- Multi-head Latent Attention (MLA): Compresses the attention keys and values into a compact latent representation, reducing memory use while minimizing information loss and preserving nuance.
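To make the Mixture of Experts idea concrete, here is a minimal sketch of generic top-k routing using the 256/8 split quoted above. It is not DeepSeek's actual gating code: the toy experts, the random router scores, and the helper names `route_token` and `moe_layer` are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # total experts, as quoted in the article
TOP_K = 8          # experts activated per token, as quoted in the article


def route_token(router_scores, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalise their weights.

    router_scores: one score per expert (hypothetical router output).
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Indices of the top_k largest scores.
    chosen = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)[:top_k]
    # Softmax over the chosen scores only (subtract the max for stability).
    peak = max(router_scores[i] for i in chosen)
    exps = [math.exp(router_scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(token_value, experts, router_scores):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](token_value)
               for i, w in route_token(router_scores))


rng = random.Random(0)
# Toy "experts": each one just scales its input by a fixed factor.
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
# Random numbers stand in for a learned router's scores.
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(scores)
output = moe_layer(1.0, experts, scores)
print(len(selected), "of", NUM_EXPERTS, "experts run for this token")
print("layer output:", output)
```

The point of the design is that only the eight selected experts do any work for a given token; the other 248 are skipped, which is where the compute savings come from.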
Image: ensigame.com
Despite the claimed low training cost, a SemiAnalysis report describes a substantial infrastructure behind DeepSeek: approximately 50,000 NVIDIA Hopper GPUs (including H800, H100, and H20 units) spread across multiple data centers, at a cost of around $1.6 billion. Operating expenses are estimated at $944 million.
DeepSeek is a subsidiary of the Chinese hedge fund High-Flyer and, unlike competitors that rely on cloud providers, owns its data centers. This gives it greater control and lets it innovate faster, while self-funding keeps the company agile. High salaries (more than $1.3 million a year for some researchers) attract top talent, which DeepSeek recruits exclusively within China rather than hiring foreign specialists.
The $6 million figure covers only the GPU time used for pre-training; it excludes research, refinement, data processing, and infrastructure. DeepSeek's total investment in AI exceeds $500 million. Its lean structure helps it innovate efficiently.
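A back-of-the-envelope calculation shows how a headline number of this size can arise from GPU time alone. The GPU-hour total and the hourly rental rate below are assumptions not stated in this article (they follow the figures commonly cited from DeepSeek's own V3 report); the GPU count and the $1.6 billion hardware estimate come from the text above.

```python
# Assumed inputs (not from this article): total GPU-hours and rental price.
GPU_HOURS = 2_788_000          # assumed H800 GPU-hours for the full training run
RATE_USD_PER_GPU_HOUR = 2.0    # assumed rental price per GPU-hour

# Inputs quoted in the article.
NUM_GPUS = 2048                # GPUs used for training
HARDWARE_COST_USD = 1.6e9      # SemiAnalysis estimate of GPU infrastructure

# Cost of the GPU time alone: this is the kind of figure behind "$6 million".
training_cost = GPU_HOURS * RATE_USD_PER_GPU_HOUR

# How long the run would keep a 2,048-GPU cluster busy.
days_on_cluster = GPU_HOURS / NUM_GPUS / 24

# GPU-time cost as a share of the estimated hardware spend.
share_of_hardware = training_cost / HARDWARE_COST_USD

print(f"GPU-time cost: ${training_cost:,.0f}")
print(f"Cluster time:  {days_on_cluster:.1f} days")
print(f"Share of hardware spend: {share_of_hardware:.2%}")
```

Under these assumptions the rented GPU time comes to well under one percent of the estimated hardware outlay, which is why the two numbers can both be accurate while describing very different things.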
DeepSeek's success shows that a well-funded independent AI company can compete with established players. Its "budget-friendly" claim is exaggerated, however: billions in investment, technical breakthroughs, and a strong team are the real drivers. The contrast with rivals is still stark: DeepSeek's R1 reportedly cost $5 million, while GPT-4, the model behind ChatGPT, cost $100 million. Even with the inflated claim set aside, DeepSeek's costs remain substantially lower than its competitors'.