AMD releases a compact AI PC aimed squarely at NVIDIA's DGX Spark.

In June 2026, AMD confirmed shipping plans for a new device at AI DevDay in San Francisco. The machine is about the size of an Apple Mac mini, carries 128GB of unified memory, and is officially positioned as a local AI development platform. Just months earlier, NVIDIA's DGX Spark had already landed on developers' desks: also a palm-sized metal box, also with 128GB of unified memory, and also claiming to run a 200-billion-parameter model locally.

AMD Ryzen AI Halo developer platform, equipped with Ryzen AI Max+ 395 processor

Based on its testing of the HP Z2 Mini G1a, Tom's Hardware gives a reference price range for the AMD camp of $2,949 to $3,999. NVIDIA's official site lists the DGX Spark starting at $3,999, and in February 2026 there was talk of some OEM versions rising to $4,679. AMD leads on price, but that is only the surface-level accounting.
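The price comparison above comes down to simple subtraction, and a short sketch makes the range of possible gaps explicit. The figures are the ones quoted in this article; the function and variable names are illustrative only.

```python
# Prices in USD, as quoted in the article.
HP_Z2_MINI_G1A_LOW = 2949    # low end of the Tom's Hardware range (AMD-based machine)
HP_Z2_MINI_G1A_HIGH = 3999   # high end of the same range
DGX_SPARK_LIST = 3999        # NVIDIA's listed starting price
DGX_SPARK_OEM_HIGH = 4679    # reported price of some OEM versions, February 2026


def price_gap(amd_price: int, nvidia_price: int) -> int:
    """Return how many dollars cheaper the AMD machine is (negative if it costs more)."""
    return nvidia_price - amd_price


gaps = {
    "AMD low vs Spark list": price_gap(HP_Z2_MINI_G1A_LOW, DGX_SPARK_LIST),
    "AMD high vs Spark list": price_gap(HP_Z2_MINI_G1A_HIGH, DGX_SPARK_LIST),
    "AMD low vs Spark OEM high": price_gap(HP_Z2_MINI_G1A_LOW, DGX_SPARK_OEM_HIGH),
    "AMD high vs Spark OEM high": price_gap(HP_Z2_MINI_G1A_HIGH, DGX_SPARK_OEM_HIGH),
}

for label, gap in gaps.items():
    print(f"{label}: ${gap:,}")
```

Depending on which configurations are compared, the AMD advantage ranges from nothing at all to well over a thousand dollars, which is why the headline price alone settles little.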

The Same 128GB but Two Different Paths

At the core of the AMD Ryzen AI Halo is a Ryzen AI Max+ 395 processor with 16 Zen 5 cores, 40 GPU compute units based on the RDNA 3.5 architecture, and a 50 TOPS XDNA 2 NPU alongside. NVIDIA's official hardware documentation describes DGX Spark along different lines: the GB10 Grace Blackwell Superchip pairs a 20-core Arm CPU with a Blackwell-architecture GPU, with no NPU but with a ConnectX-7 200Gbps network card. AMD's device offers a 2.5GbE port and WiFi 7; NVIDIA provides 10GbE, WiFi 7, and that expensive ConnectX-7 card.

On paper, the memory specifications are similar. Both machines use 128GB of LPDDR5x. AMD's product page lists memory bandwidth at 256 GB/s, while NVIDIA states 273 GB/s. The difference is less than 7%, which is virtually imperceptible in most inference tasks.
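To put the bandwidth figures in perspective, here is a minimal sketch of the arithmetic. It also applies a common rule of thumb, which is an assumption here rather than anything either vendor publishes: in single-stream decoding of a dense model, every generated token reads all the weights once, so bandwidth divided by weight size gives a rough upper bound on tokens per second. The 35 GB model size is hypothetical (roughly a 70B-parameter model at 4 bits per weight).

```python
AMD_BANDWIDTH_GB_PER_S = 256.0     # from AMD's product page, per the article
NVIDIA_BANDWIDTH_GB_PER_S = 273.0  # from NVIDIA, per the article

# Hypothetical model size used only for illustration (about 70B params at 4 bits).
HYPOTHETICAL_WEIGHTS_GB = 35.0


def relative_gap(lower: float, higher: float) -> float:
    """Fractional difference of `higher` over `lower`."""
    return (higher - lower) / lower


def decode_ceiling_tokens_per_s(bandwidth_gb_per_s: float, weights_gb: float) -> float:
    """Rough upper bound on decode speed if each token streams all weights once.

    Ignores KV-cache traffic, compute limits, and software overhead, so real
    throughput will be lower.
    """
    return bandwidth_gb_per_s / weights_gb


gap = relative_gap(AMD_BANDWIDTH_GB_PER_S, NVIDIA_BANDWIDTH_GB_PER_S)
amd_ceiling = decode_ceiling_tokens_per_s(AMD_BANDWIDTH_GB_PER_S, HYPOTHETICAL_WEIGHTS_GB)
nvidia_ceiling = decode_ceiling_tokens_per_s(NVIDIA_BANDWIDTH_GB_PER_S, HYPOTHETICAL_WEIGHTS_GB)

print(f"Bandwidth gap: {gap:.1%}")
print(f"Decode ceiling, AMD:    {amd_ceiling:.1f} tokens/s")
print(f"Decode ceiling, NVIDIA: {nvidia_ceiling:.1f} tokens/s")
```

Under this simplified model the two ceilings differ by the same fraction as the bandwidth figures, which is consistent with the near-identical token generation speeds reported in testing.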

Operating system choices expose more fundamental differences between the two companies. The AMD Ryzen AI Halo comes pre-installed with Windows 11 Pro, with an optional Ubuntu 24.04. Booting up leads to a standard PC desktop, featuring Thunderbolt ports and full peripheral support. DGX Spark runs DGX OS, customized based on Ubuntu, where the first task after booting is to configure the CUDA environment and NVIDIA container toolchains.

The Register ran a detailed comparison test in December 2025. Its conclusion: in single-batch large language model inference, the two machines generate tokens at very similar speeds, but in the prompt processing phase DGX Spark is 2 to 3 times faster. The difference comes from the Blackwell architecture's support for low-precision computing and from NVIDIA's years of optimizing code paths in the inference pipeline. ServeTheHome's review highlighted another dimension: the ConnectX-7 network card in DGX Spark sells for over $900 on its own, and its potential value in multi-machine clusters far exceeds its value for single-machine inference.
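The practical effect of a faster prompt phase with equal generation speed can be sketched with a simple two-term latency model: time spent processing the prompt plus time spent generating output. All throughput numbers below are made up for illustration and are not The Register's measurements; only the 2.5x prompt-speed ratio is chosen to sit inside the reported 2x to 3x range.

```python
def request_latency_s(prompt_tokens: int, output_tokens: int,
                      prefill_tokens_per_s: float, decode_tokens_per_s: float) -> float:
    """Total seconds for one request: prompt processing time plus generation time."""
    return prompt_tokens / prefill_tokens_per_s + output_tokens / decode_tokens_per_s


# Hypothetical throughputs: equal decode speed, 2.5x faster prompt processing on machine B.
MACHINE_A = {"prefill_tokens_per_s": 500.0, "decode_tokens_per_s": 10.0}
MACHINE_B = {"prefill_tokens_per_s": 1250.0, "decode_tokens_per_s": 10.0}

# A long-context request: large prompt, short answer.
long_prompt_a = request_latency_s(8000, 200, **MACHINE_A)
long_prompt_b = request_latency_s(8000, 200, **MACHINE_B)

# A chat-style request: short prompt, same answer length.
short_prompt_a = request_latency_s(200, 200, **MACHINE_A)
short_prompt_b = request_latency_s(200, 200, **MACHINE_B)

print(f"Long prompt:  A {long_prompt_a:.1f}s vs B {long_prompt_b:.1f}s")
print(f"Short prompt: A {short_prompt_a:.1f}s vs B {short_prompt_b:.1f}s")
```

With these assumed numbers the gap is barely visible for short chat prompts and becomes significant only when the prompt is long, such as document analysis or large code contexts.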

According to tests by Tom's Hardware and others, the Ryzen AI Halo measures 85mm high, 168mm wide, and 200mm deep and weighs 2.3 kilograms, closer to the build of a traditional mini workstation. NVIDIA's official documentation gives DGX Spark as 150mm square, 50.5mm thick, and 1.2 kilograms. One resembles a stacked hard drive enclosure; the other looks like a router.

ROCm's Progress Bar: No Longer Just "Good Enough If It Runs"

AMD's official release notes indicate that ROCm 7.2 went live in January 2026, with the subsequent 7.2.4 version specifically optimizing for stability and performance in AI inference workloads. Phoronix provided detailed coverage on the release day.

For developers in the Linux environment, the installation process for ROCm is now much simpler than it was two years ago. In March 2026, tech blogger Kunal Ganglani wrote in a detailed ROCm usage guide that he completed the entire process from system configuration to running a PyTorch model on an RX 7900 XTX in about 30 minutes, "whereas in 2024, doing the same thing required half a day of hassle." His blog confirmed that ROCm currently supports the four mainstream deep learning frameworks: PyTorch, TensorFlow, JAX, and DGL, with vLLM, Ollama, and llama.cpp all having ROCm backends available.
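After an installation like the one described above, a quick sanity check is to ask PyTorch which backend it was built against. The sketch below is one way to do that and is not taken from Ganglani's guide. It relies on the fact that ROCm builds of PyTorch reuse the `torch.cuda` namespace and set `torch.version.hip`, and it degrades gracefully when PyTorch is not installed. The function name `describe_torch_backend` is made up for this example.

```python
import importlib.util


def describe_torch_backend() -> dict:
    """Report whether PyTorch is installed and which GPU backend it targets."""
    info = {
        "torch_installed": False,
        "backend": None,
        "gpu_available": False,
        "device_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    try:
        import torch
    except (ImportError, OSError):
        return info  # installed but not loadable (for example, missing shared libraries)

    info["torch_installed"] = True
    hip_version = getattr(torch.version, "hip", None)    # set on ROCm builds
    cuda_version = getattr(torch.version, "cuda", None)  # set on CUDA builds
    if hip_version:
        info["backend"] = f"ROCm/HIP {hip_version}"
    elif cuda_version:
        info["backend"] = f"CUDA {cuda_version}"
    else:
        info["backend"] = "CPU only"

    # On ROCm builds, AMD GPUs are exposed through the torch.cuda API as well.
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


report = describe_torch_backend()
for key, value in report.items():
    print(f"{key}: {value}")
```

On a working ROCm setup, the backend line should mention HIP and the device name should be the AMD GPU; a "CPU only" result usually means the wrong PyTorch wheel was installed.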

But these advances are not enough to overcome CUDA's inertia. NVIDIA's software stack has been accumulating for over 17 years, and Stack Overflow has dozens of times as many CUDA-related questions and answers as ROCm ones. New versions of cutting-edge libraries such as FlashAttention and xFormers typically ship for CUDA first, with the ROCm port following weeks or months later. Any custom CUDA kernel outside the standard PyTorch API needs manual adaptation on AMD platforms. AMD's official compatibility matrix lists verified framework and GPU combinations, but being "verified" and having "enough community discussion threads to turn to when something breaks" are two different things.

On the Reddit r/LocalLLaMA board, discussions about which device to choose have been ongoing since the end of 2025. The most frequently quoted summary comes from the end of Ganglani's blog: "If you need everything to run perfectly on day one, buy NVIDIA. If you're willing to spend an afternoon troubleshooting to save $800, ROCm is ready."

AMD seems to be aware of this. Over the past year, the company has not tried to copy NVIDIA's moat directly; it has instead started building afresh outside it.

In August 2024, AMD announced the acquisition of ZT Systems for $4.9 billion. The Wall Street Journal confirmed the transaction's completion in March 2025. ZT Systems helps hyperscale data center customers design and assemble rack-level AI server systems, with clients such as Microsoft and Meta, both of which purchase tens of thousands of GPUs annually. With the deal, AMD gained the ability to design systems at every scale from a single GPU to an entire rack.

However, AMD quickly made what looked like a contradictory decision. In May 2025, according to an official Sanmina announcement, AMD sold ZT Systems' data center manufacturing business to Sanmina, an electronics manufacturing services provider, retaining only the design team. The logic is clear: AMD does not want to become a competitor to its own OEM customers. If AMD manufactured AI servers itself, the server makers that sell AMD GPUs would immediately become wary. Keeping the design capability while outsourcing manufacturing balances stronger capabilities against ecosystem relationships.

The two more crucial developments came in the following six months.

In October 2025, an official AMD press release announced a strategic partnership with OpenAI to deploy 6GW of AMD Instinct GPUs. The first 1GW batch is scheduled to ship in the second half of 2026. Tucked into the agreement is a clause under which OpenAI may opt to purchase up to 10% of AMD's shares. Both Reuters and CNBC emphasized this detail in their reports that day. OpenAI will be supplied with next-generation Instinct GPUs, though AMD did not specify which model.

In February 2026, AMD issued another official press release announcing an expanded partnership with Meta, again to deploy 6GW of GPUs. This time the chips are custom MI450 variants for Meta, with shipments planned to start in the second half of 2026. CNBC's report that day highlighted a detail: just days before this collaboration was made public, Meta had also announced an expanded AI chip procurement agreement with NVIDIA.

Meta signing long-term orders with both companies is more persuasive than any technical comparison. For enterprises investing hundreds of billions of dollars annually in AI infrastructure, putting all their eggs in one basket is an unacceptable risk. AMD does not need to outperform NVIDIA across the board; it only needs to offer a usable option besides NVIDIA to win orders under a "dual supplier" logic. The scale of the two 6GW contracts suggests that at least OpenAI and Meta have added AMD to their supplier lists.

NVIDIA's Response in the Same Period: A Combination of Moves

Over the same period, NVIDIA was working a combination of moves in the enterprise market. DGX Spark is positioned as a developer desktop device, but its ConnectX-7 network card signals that it is not meant to be an isolated workstation. ServeTheHome's review detailed the card's value for prototype verification and for debugging distributed training, concluding that although it is much slower than data-center-class NVLink, it is sufficient for small-scale clusters. This design anchors DGX Spark within NVIDIA's larger enterprise product line: developers prototype on Spark, migrate their code to DGX Station or cloud DGX instances, and finally deploy to server clusters equipped with H200 or B200. The toolchain runs seamlessly from desktop to data center, consistent in software and hardware, and is firmly rooted in CUDA.

During the same period, NVIDIA also launched the AI Enterprise software subscription suite, which bundles tools such as TensorRT, RAPIDS, and Triton Inference Server and is priced per node. NVIDIA's official product page lists the complete toolset included in AI Enterprise. This is not just about selling hardware; it is about turning enterprise deployment and operations into a subscription business once developers are accustomed to CUDA.

Set side by side, the two companies' paths have clearly diverged.

NVIDIA has built a full-stack closed loop from chip to system to software to cloud services. Inside this loop, developers get optimized tools from day one, at the cost of being locked into a single vendor's ecosystem. AMD is taking a more open alternative route: it uses the industry-standard x86 architecture, supports both Windows and Linux, and positions ROCm as an open-source stack compatible with mainstream frameworks, aiming to use a lower price point to attract customers who are cost-sensitive or already looking to diversify supplier risk.

The Ryzen AI Halo itself is the most direct hardware expression of this route. It has no custom network card, no dedicated OS, and no low-precision training acceleration units. It is a general-purpose PC that happens to pack in enough memory to run 200B-parameter models, alongside a reasonably capable GPU. You can use it for large-model inference, or close the terminal and run Photoshop. The $2,949 price of the HP Z2 Mini G1a cited in Tom's Hardware's report is $1,050 below DGX Spark's starting price of $3,999, and the gap can widen further against OEM versions of DGX Spark.
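The 200B-parameter claim made for both boxes depends heavily on how many bits are used to store each weight. A minimal sketch of the arithmetic follows, assuming a dense model and counting weights only. The bit widths are generic examples rather than formats confirmed for either machine, and in practice not all 128GB is available to the GPU.

```python
UNIFIED_MEMORY_GB = 128.0   # both machines, per the article
MODEL_PARAMS_BILLION = 200.0


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: billions of params times bytes per param."""
    return params_billion * bits_per_weight / 8.0


footprints = {bits: weight_footprint_gb(MODEL_PARAMS_BILLION, bits) for bits in (16, 8, 4)}
fits = {bits: size <= UNIFIED_MEMORY_GB for bits, size in footprints.items()}

for bits, size in footprints.items():
    verdict = "fits" if fits[bits] else "does not fit"
    print(f"{bits:>2}-bit weights: {size:.0f} GB ({verdict} in {UNIFIED_MEMORY_GB:.0f} GB)")

# Memory left for KV cache, activations, and the operating system at 4 bits.
headroom_gb = UNIFIED_MEMORY_GB - footprints[4]
print(f"Headroom at 4-bit: {headroom_gb:.0f} GB")
```

Under these assumptions, a 200B-parameter model fits only when quantized to about 4 bits per weight, and even then with limited room left for context and everything else running on the machine.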

However, the price of this flexibility is compromise. The Register's test data already shows that once you leave single-batch inference and enter scenarios that demand large amounts of parallel computation, the Blackwell architecture's low-precision advantages and years of software optimization quickly widen the gap. If you need a desktop box to run Stable Diffusion for image generation, NVIDIA's CUDA ecosystem has a full suite of ready-to-use tools. AMD's RDNA 3.5 architecture does not support the FP4 and FP8 low-precision formats, which costs it performance in workloads such as image generation. This is determined by the RDNA architecture's design and cannot be fixed by driver updates.

The Fate of the Box is Not Inside the Box

Stepping back along the timeline, AMD's moves over the past year trace a fairly clear route.

On the hardware front, the Instinct MI300 and MI325X are in mass production, the MI350 and MI450 are progressing according to the roadmap, and the Ryzen AI Max+ 395 has moved from a laptop chip to a desktop APU built into a development platform. On the system front, acquiring ZT Systems brought rack-level design capability, and the subsequent sale of the manufacturing business kept the R&D team in-house. On the customer front, two long-term contracts at the 6GW level tie in the world's largest consumers of AI compute, while also giving OpenAI an option to become a shareholder. On the software front, ROCm is iterating at a roughly quarterly pace and catching up on mainstream framework support, though porting cutting-edge libraries and building up a community will take more time.

The steps interlock. The ZT Systems acquisition was about gaining the capacity to design the ultra-large-scale AI clusters that OpenAI and Meta need, rather than merely selling GPUs to server manufacturers. ROCm's rapid iteration is meant to ensure that the customers behind the 6GW contracts have a usable software stack at deployment, rather than receiving bare metal. The launch of Ryzen AI Halo extends the same ROCm ecosystem to the desktop, so that developers can debug locally on a machine costing around $3,000 before deploying models to MI450 clusters in the cloud.

But this does not mean that AMD has caught up with NVIDIA. The two 6GW contracts are commitments to future deployments; they express planned infrastructure scale in gigawatts of compute capacity, not the number of chips already shipped. Detailed specifications for the MI450 have yet to be disclosed, and its actual performance, yield, and stability after mass deployment remain unknown. ROCm has reached "usable" status on mainstream frameworks, but getting to the point where "the community can help you when issues arise" will take more time. And NVIDIA's 17 years of accumulation cannot be matched by a few quarters of rapid iteration.

NVIDIA's moat is not solely in software. The ConnectX-7 network card in DGX Spark hints at another dimension of competition: while AMD competes for developers on price-performance and openness, NVIDIA uses cluster expansion capability to lock in teams that need distributed training and large inference pipelines. One DGX Spark costs $3,999, but two of them plus a network cable are enough for distributed prototyping. In that scenario, whatever advantage ROCm has in single-machine inference is cancelled out.

When the divergence between the two companies in AI finally comes down to this palm-sized box, it becomes a concrete choice. Open AMD's box and you get a familiar PC environment: you set up PyTorch with almost identical commands, load a model, and start inference, and everything goes smoothly until you need a library that only has a CUDA backend. Open NVIDIA's box and you find a dedicated environment with tuned hardware, drivers, and container toolchains, where everything works as expected from first boot. It simply costs about a thousand dollars more, plus a migration cost that is locked in if you ever switch suppliers.

AMD has not directly challenged NVIDIA's full-stack empire. It has chosen a more pragmatic path: where NVIDIA's pricing and supply capacity cannot meet every customer's needs, AMD offers a good-enough alternative. The two 6GW contracts are the most compelling evidence of this strategy to date. Ryzen AI Halo extends the strategy to the desktop. It is not merely following the trend of building small AI boxes but taking one more step along the line of "using an open ecosystem and cost advantages to attract developers who do not want to be locked in."

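Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image published on this page involves infringement, please email the relevant proof of rights and proof of identity to support@aicoin.com, and the platform's staff will review the matter.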
