Closed-door discussion among Chinese and American AI entrepreneurs: After DeepSeek-R1, changes and new trends in AI entrepreneurship

Chatbots may not necessarily be the first AI product for users.

Source: FounderPark

Image source: Generated by Wujie AI

DeepSeek was undoubtedly the center of attention during the 2025 Spring Festival. From topping the free-app chart on Apple's App Store to cloud vendors racing to deploy DeepSeek-R1, DeepSeek has even become many people's first AI product. Among entrepreneurs, discussion has ranged from its technical innovations and its training and inference costs to its impact on the AI industry as a whole.

On February 2, Founder Park and Global Ready, a closed community under Geek Park, hosted a closed-door discussion, inviting more than 60 founders and technical experts from AI companies in Silicon Valley, China, London, Singapore, Japan, and elsewhere for an in-depth exploration of the new technical directions and product trends set off by DeepSeek, from the angles of technological innovation, productization, and the compute shortage.

With identifying details removed, we have compiled the key takeaways from the discussion.

01 Where does DeepSeek's innovation lie?

DeepSeek released its V3 base model at the end of December. It is one of the strongest open-source models in the industry: a large Mixture-of-Experts (MoE) model with 671B total parameters, of which only 37B are active per token.

The "Aha moment" of the R1 model released in January 2025 refers to the model's ability to exhibit a certain level of reflective capability during inference. For example, during problem-solving, the model may realize that a certain method is no longer applicable and adjust to a more effective method in the process. This reflective capability stems from reinforcement learning (RL).

R1 is DeepSeek's flagship model, with reasoning capability comparable to OpenAI's o1. The recipe can be summarized as two stages of reinforcement learning (RL) interleaved with two stages of supervised fine-tuning (SFT), where the first RL and SFT stages mainly serve to build a teacher model that generates the data for the third stage. The goal is the strongest reasoning model currently available.
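
As a rough illustration of that staged recipe, here is a minimal runnable sketch. The stage names, data descriptions, and stub functions are our own illustrative assumptions, not DeepSeek's published training code.

```python
# Hypothetical sketch of the staged SFT/RL recipe described above.
# sft() and rl() are stand-in stubs; the real stages train actual models.

def sft(model: str, data: str) -> str:
    """Supervised fine-tuning stub: tags the model with the data it saw."""
    return f"{model} -> sft[{data}]"

def rl(model: str, reward: str) -> str:
    """Reinforcement-learning stub: tags the model with its reward signal."""
    return f"{model} -> rl[{reward}]"

base = "v3-base"
m = sft(base, "cold-start long-CoT examples")     # stage 1: SFT
teacher = rl(m, "rule-based outcome reward")      # stage 2: RL -> teacher model
m = sft(base, f"data generated by ({teacher})")   # stage 3: SFT on teacher's data
r1 = rl(m, "combined reward, all scenarios")      # stage 4: final RL
print(r1)
```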

  • The core innovation of the DeepSeek R1-Zero model lies in skipping the traditional supervised fine-tuning (SFT) stage and optimizing reasoning directly through reinforcement learning (RL). Additionally, using DeepSeek R1 as a teacher model to distill open-source small and mid-sized models (such as Qwen 1.5B/7B/14B/32B) can significantly enhance the small models' capabilities.

  • In coding ability, DeepSeek's R1 is on par with OpenAI's newly released o3-mini, with o3-mini slightly stronger overall. The difference is that R1 is open source, which will encourage more application developers to build on it.

  • The key to DeepSeek's success lies in using a highly integrated engineering stack to cut costs. Broken down individually, each of their methods can be found in last year's papers, but DeepSeek adopts the newest methods aggressively. These methods can have side effects and add storage overhead, but they greatly reduce cluster idle time.

  • Outside of a large cluster serving many users, the MLA (Multi-head Latent Attention) architecture can have side effects. Many of DeepSeek's methods do not reach their full optimization potential outside the specific scenarios and environments they were designed for; used in isolation, they can even hurt performance. The system design is so tightly integrated that lifting any single technique out would not reproduce the same results.

  • One should not simply train a process reward model; doing so may not achieve the desired outcome and can even lead to overfitting. DeepSeek instead chose the most primitive reinforcement learning approach: scoring only the final result with heuristic rules, then letting conventional RL correct the process along the way (a minimal sketch of such rule-based outcome scoring follows this list). This recipe came out of continuous trial and error, enabled by DeepSeek's highly efficient infrastructure.

  • Even though DeepSeek has not released its inference code, other teams can roughly infer the methods used. The open-sourced model weights are enough for others to replicate its performance; the challenge is working out certain special configurations, which takes time.

  • Relying solely on reward models trained from labeled data makes it difficult to reach superhuman intelligence. A reward model grounded in real data or real environmental feedback is needed to push reward optimization to a higher level and thereby produce superhuman capabilities.

  • From a technical perspective: if the base model itself generalizes well, combining it with mathematical and coding capabilities can produce even stronger generalization. For example, take a fairly intelligent base model that is already good at writing; adding reinforcement learning on math and code may generalize well and ultimately yield very strong capabilities. This shows up in R1's ability to write in many genres, from parallel prose to regulated verse, where other models do not perform as well.
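
The rule-based outcome scoring mentioned above might look like the following minimal sketch: a hypothetical verifier that scores only the final answer and never inspects the reasoning steps. The answer format and reward values are chosen purely for illustration.

```python
import re

def outcome_reward(completion: str, gold_answer: str) -> float:
    """Heuristic outcome scoring: judge only the final result, not the process."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0                       # no parseable final answer
    if match.group(1).strip() == gold_answer.strip():
        return 1.0                       # correct final answer
    return 0.1                           # right format, wrong answer

# The verifier ignores the chain of thought entirely; RL shapes the process.
print(outcome_reward(r"x = 3, so the answer is \boxed{42}", "42"))  # 1.0
print(outcome_reward("I believe it is 41", "42"))                   # 0.0
```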

02 Why is DeepSeek's cost so low?

  • The model's sparsity is very high. Although it has over 600B parameters in total, only 37B are active for each token during inference, so its inference speed and resource consumption are roughly those of a 37B-parameter model. Achieving this, however, requires substantial design changes across the entire system (see the routing sketch at the end of this list).

  • In DeepSeek V3, the MoE architecture includes 256 expert modules, but only a small fraction of them are activated on each inference step. Under high load it can dynamically adjust resource usage, theoretically compressing cost to as little as 1/256 of the original. This design reflects DeepSeek's foresight in software architecture: with good enough system optimization, prices can drop significantly at the same scale.

  • Model training generally parallelizes along three axes. The first splits the data: Data Parallelism. The second splits the model by layers, since layers can be staged independently: Pipeline Parallelism. The third splits each layer's weights across GPUs: Tensor Parallelism. To fit its sparse model design, DeepSeek significantly reworked its training framework and pipeline, dropping Tensor Parallelism and keeping only Data and Pipeline Parallelism, while further refining Expert Parallelism: with the experts finely divided (up to 256), different experts are placed on different GPUs. Abandoning the communication-heavy Tensor Parallelism also let DeepSeek sidestep hardware interconnect limits, bringing the H800 close to the H100 in training efficiency.

  • In terms of model deployment, experiments show that the compute cost is manageable and the technical difficulty is not high: replication usually takes only one to two weeks, which is very good news for application developers.

  • One possible model architecture: reasoning RL need no longer be confined to the large language model itself; an external "thinking machine" could carry the overall reasoning capability, significantly reducing total cost.
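
To make the sparsity point concrete, here is a minimal toy sketch of top-k MoE routing with experts spread across devices, as expert parallelism does. The sizes, the top-k value, and the expert-to-GPU mapping are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

N_EXPERTS, TOP_K, N_GPUS, D = 256, 8, 32, 16  # toy sizes, not DeepSeek's exact setup

rng = np.random.default_rng(0)
experts = rng.standard_normal((N_EXPERTS, D, D))  # one small weight matrix per "expert"
router_w = rng.standard_normal((D, N_EXPERTS))    # router: hidden state -> expert scores

def moe_forward(x: np.ndarray) -> tuple[np.ndarray, set[int]]:
    """Route one token through only TOP_K of N_EXPERTS experts."""
    scores = x @ router_w
    top = np.argsort(scores)[-TOP_K:]             # the k highest-scoring experts
    s = scores[top] - scores[top].max()
    gates = np.exp(s) / np.exp(s).sum()           # softmax weights over chosen experts
    # Expert parallelism: each expert lives on one GPU; only these do any work.
    active_gpus = {int(e) % N_GPUS for e in top}
    out = sum(g * (x @ experts[e]) for g, e in zip(gates, top))
    return out, active_gpus

y, gpus = moe_forward(rng.standard_normal(D))
# Per token, only TOP_K/N_EXPERTS = 8/256 of the expert weights are touched.
print(f"active experts: {TOP_K}/{N_EXPERTS}, GPUs doing expert work: {sorted(gpus)}")
```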

03 Chatbots may not necessarily be the first AI product for users

  • The success of DeepSeek R1 lies not only in its reasoning capability but also in its combination with search. A reasoning model plus search is roughly equivalent to a micro agent framework. For most users, this is their first experience of a reasoning model; even for users of other reasoning models (such as OpenAI's o1), R1 combined with search is a completely new experience.

  • For users who have not used AI products, their first AI product may not necessarily be a language interaction product like ChatGPT, but rather a product in another scenario driven by the model.

  • The competitive barrier for application-based companies in the AI field lies in product experience. Whoever can do it faster and better, providing features that users find more comfortable, will gain a competitive advantage in the market.

  • Being able to watch the model's thought process is a satisfying design, but it is more an early artifact of using reinforcement learning (RL) to boost capability. The length of the reasoning trace is not the measure of the final answer's correctness; expect a shift from long, elaborate reasoning processes to more concise ones.

04 AI implementation in vertical scenarios has become easier

  • For relatively vertical tasks, evaluation can be done with a rule system, without relying on a complex reward model. For well-defined vertical tasks, a TinyZero-style setup on a 7B-class model can quickly yield usable results.

  • In a well-defined vertical task, training a DeepSeek-distilled model of 7B parameters or larger can quickly produce an "aha moment." On cost: for simple arithmetic or other clear-answer tasks (such as the card game 21, i.e. blackjack), a 7B model needs only 2-4 H100 or H200 GPUs and can converge to a usable state in under half a day (see the reward sketch at the end of this list).

  • In vertical fields, especially for tasks with clear answers, such as mathematical calculation or physical-rule judgments (like whether an object's placement or motion is plausible), DeepSeek R1 genuinely outperforms other models and is cost-effective, making it applicable across a wide range of verticals. For tasks without a clear answer, however, such as judging whether something is aesthetically pleasing or whether an answer is satisfying, the subjectivity cannot be resolved with rule-based methods; better approaches may take another three to six months to emerge.

  • With supervised fine-tuning (SFT) and similar methods, collecting datasets is time-consuming, and their domain distribution often fails to cover every level of the task. Now, with a better toolkit built around a high-quality model, those past difficulties in data collection for clear-answer vertical tasks can be addressed.

  • A pure rule system works for mathematics and code, where relatively clear rules can be defined, but becomes very difficult for more complex or open-ended tasks. So everyone will likely end up exploring models better suited to evaluating results in these complex scenarios, perhaps adopting an ORM (outcome reward model) rather than a PRM (process reward model), or other similar methods. Ultimately this may lead to simulators akin to "world models" that provide better decision feedback across a variety of models.

  • When training reasoning capability into small models, one need not even rely on token-based solutions. In one solution for an e-commerce scenario, the reasoning capability was separated entirely from the Transformer-based model: a separate small model handled all the reasoning and worked with the Transformer to complete the overall task.

  • For companies that build models for their own use (such as hedge funds), the challenge is cost. Large companies can spread costs across customers, but small teams find heavy R&D costs hard to bear. DeepSeek being open source matters greatly to them: teams that previously could not afford the R&D can now build their own models.

  • In the financial sector, especially in quantitative funds, there is often a need to analyze large amounts of financial data, such as company financial reports and Bloomberg data. These companies typically build their own datasets and conduct supervised training, but the cost of data labeling is very high. For these companies, applying reinforcement learning (RL) during the fine-tuning phase can significantly enhance model performance, achieving a qualitative leap.
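
The clear-answer reward idea referenced above might look like this for an arithmetic-style vertical task (Countdown-style: reach a target value using each given number exactly once). The answer tags, reward values, and parsing here are hypothetical illustrations, not TinyZero's or DeepSeek's actual code.

```python
import ast
import re

def arithmetic_reward(completion: str, numbers: list[int], target: int) -> float:
    """Rule-based evaluator for a clear-answer task: no learned reward model."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0                                   # unparseable output
    expr = m.group(1).strip()
    try:
        used = sorted(
            int(node.value) for node in ast.walk(ast.parse(expr, mode="eval"))
            if isinstance(node, ast.Constant)
        )
        if used != sorted(numbers):                  # must use each number exactly once
            return 0.1
        value = eval(expr, {"__builtins__": {}})     # arithmetic constants/operators only
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if value == target else 0.1

print(arithmetic_reward("<answer>(6*4)-(3-2)</answer>", [2, 3, 4, 6], 23))  # 1.0
```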

05 Domestic chips are expected to solve inference computing power issues

  • Quite a few domestic chips benchmark against the A100 and A800, but the biggest bottleneck for domestic chips is not chip design but wafer fabrication. DeepSeek's adaptation work with Huawei is also because Huawei can fab chips relatively stably, keeping training and inference stable even under tighter sanctions.

  • Viewed from single-card training, NVIDIA's high-end chips may carry surplus compute in certain scenarios: a single card's compute can go underutilized during training because of cache and memory limits, so raw compute alone does not make it the best fit for training tasks.

  • In the domestic chip market, a part that targets AI workloads exclusively, without accommodating scientific computing, could sharply cut high-precision floating-point capability and, by concentrating solely on AI tasks, catch up with NVIDIA's flagship chips on certain performance metrics.

06 More powerful agents and cross-application invocation capabilities

  • For many vertical fields, agent capabilities will improve significantly. One can start from a base model, encode some rules into a rule model (possibly a purely engineering solution), and then use that engineering solution to iteratively train the base model; the result may exhibit some superhuman capability in that domain. On top of this, preference tuning can make its responses more readable, potentially yielding a more powerful reasoning agent for a specific vertical field.

  • This raises a problem: you may not get an agent that generalizes across all vertical fields. An agent trained in one field operates only within it and does not transfer to other verticals. But it is a feasible direction, because DeepSeek makes inference cheap enough to pick a model, run a series of reinforcement training, and dedicate the result to a single vertical, ignoring everything else. For vertical AI companies, this is an acceptable solution.

  • From an academic perspective, an important trend in the coming year is that some existing methods in reinforcement learning will be transferred to the application of large models to address current issues of insufficient generalization or inaccurate evaluation. This approach can further enhance model performance and generalization capabilities. With the application of reinforcement learning, the ability to output structured information will greatly improve, ultimately better supporting various application scenarios, especially enhancing the generation of charts and other structured content.

  • More and more people will be able to post-train R1 and build their own agents. The model layer will differentiate into different agent models that use different tools to solve problems in different fields, ultimately forming a multi-agent system (a minimal tool-dispatch loop is sketched after this list).

  • 2025 may become the year of agents, with many companies launching agents capable of planning tasks. However, there is not yet enough data to support these tasks. Planning tasks might include helping users order takeout, book travel, or check ticket availability at attractions; they require large amounts of data and reward mechanisms to evaluate the model, for example how to judge right and wrong in planning a trip to Zhangjiajie, and how to let the model learn from that. These questions will become the next research hotspots, with reasoning capability ultimately put to work on practical problems.

  • In 2025, cross-application invocation will become a hot topic. On Android, thanks to its openness, developers can achieve cross-app operation through low-level permissions, which will eventually let agents control your browser, phone, computer, and other devices. In the Apple ecosystem, strict permission management makes it very hard for a third-party agent to control every application on a device, so Apple would need to build its own agent with that reach. And although Android is open source, opening up low-level permissions on phones, tablets, and PCs still requires cooperation with manufacturers such as OPPO and Huawei, in order to obtain data and support agent development.
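
A minimal, hypothetical sketch of the "agent model + tools" pattern described above: a loop that lets a model pick a tool by name until it produces a final answer. The model call, tool registry, and message format are stand-ins of our own, not any specific product's API.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: f"(stub search results for: {q})",
}

def fake_model(history: list[str]) -> str:
    """Stand-in for a post-trained agent model; real systems call an LLM here."""
    if not any(h.startswith("TOOL_RESULT") for h in history):
        return "CALL calculator 37*8"          # first step: delegate to a tool
    return "FINAL 37*8 = 296"                  # then: answer from the tool result

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [task]
    for _ in range(max_steps):
        action = fake_model(history)
        if action.startswith("FINAL"):
            return action.removeprefix("FINAL ").strip()
        _, tool, arg = action.split(" ", 2)    # e.g. "CALL calculator 37*8"
        history.append(f"TOOL_RESULT {TOOLS[tool](arg)}")
    return "gave up"

print(run_agent("What is 37*8?"))              # -> "37*8 = 296"
```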
