Sam Gao
Sam Gao|Jan 29, 2025 08:27
My view on DeepSeek (1/N)

In recent months, the successive releases of DeepSeek V3 and R1 have sent American AI researchers, entrepreneurs, and investors into a frenzy. This phenomenon rivals the shockwaves caused by ChatGPT’s debut in late 2022.

With R1 released as a fully open-source model (freely downloadable on HuggingFace for local inference) at ultra-low pricing (1/100th the cost of OpenAI’s o1), DeepSeek soared to the top of the U.S. Apple App Store within just five days.

But where did this mysterious AI powerhouse, incubated by a Chinese quantitative trading firm, originate?

1. The Origins of DeepSeek

I first heard of DeepSeek in 2021 while working at Alibaba’s DAMO Academy. At the time, Luo Fuli, a prodigious researcher from a neighboring team (who published eight ACL papers in a single year as a Peking University master’s student), left to join High-Flyer Quant. Everyone wondered: why would a highly profitable quant firm recruit AI talent? Did they need academic papers?

Back then, High-Flyer’s AI researchers largely explored cutting-edge fields independently, with a focus on large language models (LLMs) and text-to-image models (like OpenAI’s DALL-E).

By late 2022, spurred by ChatGPT’s success, High-Flyer began recruiting top AI talent, mostly Tsinghua and Peking University students. High-Flyer’s CEO, Liang Wenfeng, decided to pivot to AGI: “We founded a new company starting with language models, followed by vision and more.”

That company was DeepSeek. In early 2023, as Chinese AI startups like Zhipu, Moonshot, and Baichuan dominated headlines, DeepSeek, which lacked star founders like Kai-Fu Lee (http://01.AI) or Yang Zhilin (Moonshot), struggled for attention in Beijing’s tech hubs.

As a pure research entity with no celebrity backing, DeepSeek faced fundraising challenges in 2023’s overheated market. Venture capitalists hesitated: DeepSeek’s team consisted of fresh PhDs without big-name researchers, and ROI timelines were uncertain.
High-Flyer ultimately spun off DeepSeek, funding it entirely in-house.

Amid the noise, DeepSeek began scripting its AI saga:
• Nov 2023: Launched DeepSeek LLM (67B parameters), rivaling GPT-4.
• May 2024: Released DeepSeek-V2.
• Dec 2024: Debuted DeepSeek-V3, outperforming Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet.
• Jan 2025: DeepSeek-R1, a cost-efficient reasoning model priced at <1% of OpenAI’s o1, shook the global tech world.

The message was clear: “Open-source wins, and China has arrived.”