Farewell to "Computing Power Islands": AI Training is Breaking Free from Centralized Shackles

Author: Egor Shulgin | Co-founder of Gonka Protocol, former AI algorithm engineer at Apple and Samsung

For years, the most powerful AI systems have been imprisoned in closed "black boxes"—those large data centers controlled by a handful of tech giants. Within these facilities, tens of thousands of GPUs are crammed into the same physical space, tightly connected through high-speed internal networks, allowing large models to be trained in highly synchronized systems.

This model has long been seen as a technical "inevitability." Yet the reality is becoming increasingly clear: centralized data centers are not only costly and risk-laden, they are also approaching their physical limits. Large language models are growing at an exponential pace; systems trained just a few months ago already look outdated. The question is no longer merely whether "power is too centralized," but whether centralized infrastructure can keep up with the pace of AI's evolution at all.

The Shadow Behind Prosperity: The Centralized "Physical Ceiling"

Today's cutting-edge models have already squeezed every last bit of potential out of top-tier data centers. Training a more powerful model often means building a new facility from scratch or completely overhauling an existing one. Meanwhile, co-located data centers are running up against the limits of power density: a large share of the energy is consumed not by computation but by the cooling systems that keep the silicon from burning out. The result is plain to see: the ability to train top AI models is locked inside a handful of companies and heavily concentrated in the U.S. and China.

This centralization is not only an engineering problem but also a strategic risk. Access to AI capabilities is heavily constrained by geopolitics, export controls, energy rationing, and corporate interests. As AI becomes the cornerstone of economic productivity, scientific research, and even national competitiveness, reliance on a few centralized hubs is turning infrastructure into the most vulnerable "Achilles' heel."

But what if this monopoly is not inevitable, but merely a "side effect" of our current training algorithms?

The Overlooked Communication Bottleneck: The Hidden Limitations of Centralized Training

Modern AI models, due to their immense size, can no longer be trained on a single machine. A foundational model with hundreds of billions of parameters requires countless GPUs to work in parallel, synchronizing progress every few seconds, resulting in millions of synchronizations throughout the training cycle.

The industry's default solution is "co-located training": stacking thousands of GPUs together and connecting them with purpose-built, expensive networking hardware. This network lets every processor stay aligned in real time, so the model replicas remain perfectly synchronized throughout training.
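To make this baseline concrete, here is a minimal, simulated sketch of the fully synchronous pattern in plain Python/NumPy: several "workers" each compute a gradient on their own shard of a toy least-squares problem, and the gradients are averaged after every single step, mimicking the per-step all-reduce of co-located training. All names and numbers are illustrative and not drawn from any real training system.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_WORKERS, DIM, STEPS, LR = 4, 10, 200, 0.1

# Toy problem: every worker holds its own shard of a least-squares task.
true_w = rng.normal(size=DIM)

def make_shard():
    X = rng.normal(size=(100, DIM))
    return X, X @ true_w + 0.01 * rng.normal(size=100)

shards = [make_shard() for _ in range(NUM_WORKERS)]

def local_gradient(w, X, y):
    # Gradient of the mean-squared error on one worker's shard.
    return 2 * X.T @ (X @ w - y) / len(y)

w = np.zeros(DIM)  # one shared model, replicated on every worker
for step in range(STEPS):
    # Each worker computes a gradient on its own data...
    grads = [local_gradient(w, X, y) for X, y in shards]
    # ...and all workers synchronize (average) after EVERY step,
    # as a co-located all-reduce would.
    w -= LR * np.mean(grads, axis=0)

# 200 training steps mean 200 synchronization rounds.
print("distance to target (synchronous):", np.linalg.norm(w - true_w))
```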

Co-located training works, but it comes with extremely stringent prerequisites: a high-speed internal network, physical proximity, a very stable power supply, and centralized operational control. The moment training has to cross physical boundaries, spanning cities, national borders, or continents, the system falls apart. Ordinary internet connections are several orders of magnitude slower than a data center's internal network, so under today's algorithms high-performance GPUs spend most of their time idle, waiting for synchronization signals. Estimates suggest that training a modern large model over standard internet connections would stretch the training cycle from months to centuries. This is why such attempts have long been dismissed as fanciful.
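A rough back-of-envelope calculation shows the gap. The figures below (model size, link speeds) are illustrative assumptions, and the estimate ignores all-reduce topology, gradient compression, and communication overlap; it is only meant to show the orders of magnitude involved.

```python
# Illustrative numbers only: how long does ONE full gradient exchange take?
PARAMS = 100e9                 # assume a 100B-parameter model
BYTES_PER_PARAM = 2            # fp16 / bf16 gradients
payload_bits = PARAMS * BYTES_PER_PARAM * 8

links_bits_per_s = {
    "data-center interconnect (~400 Gb/s)": 400e9,
    "ordinary internet link (~1 Gb/s)": 1e9,
}

for name, rate in links_bits_per_s.items():
    print(f"{name}: ~{payload_bits / rate:,.0f} s per exchange")

# With a synchronization after every few seconds of compute, the slow-link
# case leaves GPUs idle almost all of the time, which is the gap described above.
```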

Paradigm Shift: When "Reducing Communication" Becomes the Core Algorithm

The core assumption behind traditional training methods is that machines must communicate after every tiny learning step.

Fortunately, a technology called "Federated Learning" has brought a breakthrough from an unexpected direction. It introduces a highly disruptive idea: machines do not need to communicate all the time. They can work independently for longer periods, synchronizing only occasionally.

This insight has since evolved into a broader field known as "Federated Optimization," in which low-communication methods stand out: by allowing more local computation between synchronizations, they make it possible to train models over low-bandwidth networks distributed across regions.
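A minimal sketch of this "work locally, synchronize occasionally" pattern, in the style of local SGD / federated averaging, is shown below. It reuses the toy least-squares setup from the earlier snippet; the round and step counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_WORKERS, DIM, ROUNDS, LOCAL_STEPS, LR = 4, 10, 20, 10, 0.1

true_w = rng.normal(size=DIM)

def make_shard():
    X = rng.normal(size=(100, DIM))
    return X, X @ true_w + 0.01 * rng.normal(size=100)

shards = [make_shard() for _ in range(NUM_WORKERS)]

def local_gradient(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

w_global = np.zeros(DIM)
for _ in range(ROUNDS):
    local_models = []
    for X, y in shards:
        w = w_global.copy()
        # Each worker trains on its own, with no communication at all...
        for _ in range(LOCAL_STEPS):
            w -= LR * local_gradient(w, X, y)
        local_models.append(w)
    # ...and synchronization happens only once per round (model averaging).
    w_global = np.mean(local_models, axis=0)

# Same total of 200 local steps as before, but only 20 synchronization rounds.
print("distance to target (local SGD):", np.linalg.norm(w_global - true_w))
```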

DiLoCo: A Dawn for Global Distributed Training

This technological leap has materialized in the development of DiLoCo (Distributed Low-Communication training).

DiLoCo does not demand real-time synchronization; instead, each machine trains locally for extended stretches before sharing its updates. Experimental results are encouraging: models trained with DiLoCo match the performance of traditional, tightly synchronized training while requiring hundreds of times less communication.
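The sketch below gives a rough, simplified picture of this two-level scheme on the same toy problem: each worker runs many local steps with no communication, then the workers average their "pseudo-gradients" (the displacement from the shared starting point) and apply a single outer momentum update. The actual DiLoCo recipe uses AdamW as the inner optimizer and Nesterov momentum as the outer optimizer at a vastly larger scale; every constant here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_WORKERS, DIM = 4, 10
OUTER_ROUNDS, INNER_STEPS = 20, 25
INNER_LR, OUTER_LR, OUTER_MOMENTUM = 0.1, 0.7, 0.5

true_w = rng.normal(size=DIM)

def make_shard():
    X = rng.normal(size=(100, DIM))
    return X, X @ true_w + 0.01 * rng.normal(size=100)

shards = [make_shard() for _ in range(NUM_WORKERS)]

def local_gradient(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

w_global = np.zeros(DIM)
velocity = np.zeros(DIM)

for _ in range(OUTER_ROUNDS):
    pseudo_grads = []
    for X, y in shards:
        w = w_global.copy()
        # Inner phase: a long stretch of purely local training.
        for _ in range(INNER_STEPS):
            w -= INNER_LR * local_gradient(w, X, y)
        # The "pseudo-gradient" is how far this worker moved the model.
        pseudo_grads.append(w_global - w)
    # Outer phase: one communication round, then a momentum update on the
    # averaged pseudo-gradient (the DiLoCo paper uses Nesterov momentum here).
    outer_grad = np.mean(pseudo_grads, axis=0)
    velocity = OUTER_MOMENTUM * velocity + outer_grad
    w_global -= OUTER_LR * velocity

# 20 communication rounds instead of hundreds of per-step synchronizations.
print("distance to target (DiLoCo-style):", np.linalg.norm(w_global - true_w))
```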

Crucially, this reduction in communication makes training feasible outside controlled data centers. Open-source implementations have demonstrated that large language models can be trained over standard internet connections in a peer-to-peer (P2P) setting, entirely independent of centralized infrastructure.

This line of work, which originated with DeepMind researchers, has been adopted by organizations such as Prime Intellect to train models with billions of parameters. What was once a research concept is becoming a pragmatic path to building top-tier AI systems.

Industry Shifts: Redistribution of Computational Power

The transition from "centralized" to "distributed" has implications far beyond efficiency improvements.

If large models can be trained across the internet, AI development will no longer be a privilege of elite circles. Computational power can be contributed from around the world, provided by different participants in diverse environments. This means:

  • Large collaborations across borders and institutions become possible;

  • Reduced dependence on a few infrastructure providers;

  • Increased resilience against geopolitical and supply chain fluctuations;

  • A broader population can participate in building the foundational technologies of AI.

In this new model, the balance of power in AI is shifting from "who has the largest data center" to "who can most effectively coordinate global computational power."

Building Open and Verifiable AI Infrastructure

As training moves towards distributed processes, new challenges arise: trust and verification. In open networks, we must ensure that contributions of computational power are genuine, and that the models are not maliciously altered.

This has sparked keen interest in cryptographic verification methods, and some emerging infrastructure projects are already putting these concepts into practice. One example is Gonka, a decentralized network designed for AI inference, training, and verification. Rather than relying on centralized facilities, Gonka pools computational power from independent participants and uses algorithmic verification to ensure that contributions are genuine and reliable.

This network perfectly aligns with the essence of "low communication training": reducing dependence on high-speed private infrastructure while emphasizing efficiency, openness, and resilience. In this context, decentralization is no longer an ideological label but rather an engineering necessity—because algorithms no longer need to synchronize all the time.
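The article does not spell out how such verification works in Gonka, but one generic way to check contributed compute in an open network is probabilistic spot-checking: because local training is deterministic given the same data and starting point, a verifier can re-execute a small random sample of a worker's claimed steps and compare the results. The sketch below illustrates that generic idea on the toy setup used earlier; it is an assumption-laden illustration, not a description of Gonka's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LOCAL_STEPS, LR = 10, 25, 0.1

# Toy shard, as in the earlier sketches (illustrative data only).
true_w = rng.normal(size=DIM)
X = rng.normal(size=(100, DIM))
y = X @ true_w + 0.01 * rng.normal(size=100)

def gradient(w):
    return 2 * X.T @ (X @ w - y) / len(y)

def run_local_steps(w_start):
    # Deterministic local training: an honest run is exactly reproducible.
    w = w_start.copy()
    trajectory = [w.copy()]
    for _ in range(LOCAL_STEPS):
        w = w - LR * gradient(w)
        trajectory.append(w.copy())
    return trajectory

# A worker submits its claimed sequence of intermediate checkpoints.
honest_claim = run_local_steps(np.zeros(DIM))
faked_claim = [c + 0.05 * rng.normal(size=DIM) for c in honest_claim]

def spot_check(claimed, num_checks=5, tol=1e-8):
    # Re-execute a few randomly chosen steps and compare with the claim.
    for t in rng.choice(LOCAL_STEPS, size=num_checks, replace=False):
        recomputed = claimed[t] - LR * gradient(claimed[t])
        if np.linalg.norm(recomputed - claimed[t + 1]) > tol:
            return False
    return True

print("honest worker passes spot check:", spot_check(honest_claim))
print("faked trajectory passes spot check:", spot_check(faked_claim))
```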

Another Path Forward

The history of AI training has always been constrained by the physical limits of communication. For years, progress has depended on reducing the physical distance between machines.

But the latest research shows this is not the only way. By changing how machines collaborate, communicating less rather than more, we can train powerful models over the global internet.

As algorithms evolve, the future of AI may depend not on where computation resides, but on how intelligently it is connected. This shift will make AI development more open and resilient, ultimately freeing it from the chains of centralization.
