BlockBeats | June 25, 2026 07:11
Unsloth Compresses the 753B-Parameter GLM-5.2 with Extreme Quantization, Enabling Smooth Local Deployment on Mac
According to BlockBeats monitoring, Unsloth AI announced that it has used dynamic quantization to shrink Zhipu AI's 753B-parameter large model GLM-5.2 by more than 80%, and has released a GGUF version that can be deployed locally on a Mac. With dynamic 1-bit and 2-bit quantization, the original 1.51 TB model shrinks to 217 GB (1-bit variant) or 239 GB (2-bit UD-IQ2-M variant), small enough for individual developers and small and medium-sized enterprises to deploy and run it offline on a single Mac Studio. The quantized version ran at 21.6 tokens/s on a Mac Studio M3 Ultra with 256 GB of unified memory while retaining 76% to 82% of the original model's accuracy.

In a comparison test published by Unsloth AI, the 1-bit GLM-5.2 GGUF, running entirely locally, was prompted to write a complete HTML5 game, a Flappy Bird clone called "Sunset Flier" with an indie pixel-art style, sound effects, and a particle system. According to Unsloth AI, the result was comparable in quality to the output of Claude 4.8 Opus and GPT-5.5.

GLM-5.2 is an open-source mixture-of-experts (MoE) model from Zhipu AI with 753B total parameters and a 1-million-token context window. Traditionally, running a model this large requires an expensive cloud-based multi-GPU cluster. Dynamic quantization lowers that hardware barrier and significantly reduces the cost for individuals and small teams to deploy top-tier open-source models on their own hardware.

The GLM-5.2 GGUF weights are now available for download on Hugging Face and can be loaded and run directly with llama.cpp or Unsloth Studio.
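The size figures above can be sanity-checked with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that the file size is dominated by the 753B parameters. It divides file size by parameter count to get average bits per weight, computes the size reduction, and estimates how much of the 256 GB of unified memory is left over. At roughly 16 bits per weight, the 1.51 TB original is consistent with 16-bit (for example BF16) weights; that reading is an inference, not something the article states.

```python
# Back-of-the-envelope check of the sizes quoted in the article.
# Assumption: decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
TOTAL_PARAMS = 753e9
ORIGINAL_BYTES = 1.51e12
VARIANTS_GB = {"1-bit dynamic": 217, "2-bit UD-IQ2-M": 239}
UNIFIED_MEMORY_GB = 256  # Mac Studio M3 Ultra configuration cited in the article


def bits_per_weight(size_bytes: float, params: float = TOTAL_PARAMS) -> float:
    """Average storage bits per parameter for a file of the given size."""
    return size_bytes * 8 / params


def size_reduction(new_bytes: float, old_bytes: float = ORIGINAL_BYTES) -> float:
    """Fraction of the original file size removed by quantization."""
    return 1 - new_bytes / old_bytes


def summarize() -> list:
    """One row per quantized variant: average bits/weight, % reduction, memory headroom."""
    rows = []
    for name, gb in VARIANTS_GB.items():
        size = gb * 1e9
        rows.append({
            "variant": name,
            "bits_per_weight": round(bits_per_weight(size), 2),
            "reduction_pct": round(100 * size_reduction(size), 1),
            # Headroom ignores the KV cache and the memory macOS itself needs.
            "headroom_gb": UNIFIED_MEMORY_GB - gb,
        })
    return rows


if __name__ == "__main__":
    print("original:", round(bits_per_weight(ORIGINAL_BYTES), 2), "bits/weight")
    for row in summarize():
        print(row)
```

Both quantized files average above their nominal bit width (about 2.3 and 2.5 bits per weight), which suggests that only part of the model is stored at the lowest precision. The headroom figure is an upper bound: the KV cache grows with context length, so a long context would eat into it.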
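The article does not describe how Unsloth's dynamic quantization works internally. As a generic illustration only, the sketch below shows plain round-to-nearest block quantization: each block of weights shares one scale, and every weight is replaced by a small integer code pointing at one of 2^bits evenly spaced levels. This is not Unsloth's algorithm, and the function names and the sample block of weights are hypothetical; the point is to show how reconstruction error grows as the bit width drops.

```python
from typing import List, Tuple


def quantize_block(weights: List[float], bits: int) -> Tuple[float, List[int]]:
    """Round-to-nearest quantization of one block onto 2**bits evenly spaced
    levels in [-scale, +scale], with a single shared scale per block."""
    levels = 2 ** bits
    # Shared per-block scale; fall back to 1.0 for an all-zero block to avoid division by zero.
    scale = max(abs(w) for w in weights) or 1.0
    step = 2 * scale / (levels - 1)
    # Each code is an integer in [0, levels - 1]; clamping guards against float edge cases.
    codes = [min(levels - 1, max(0, round((w + scale) / step))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: List[int], bits: int) -> List[float]:
    """Map integer codes back to approximate float weights."""
    step = 2 * scale / (2 ** bits - 1)
    return [-scale + c * step for c in codes]


def mean_abs_error(weights: List[float], bits: int) -> float:
    """Average absolute reconstruction error after a quantize/dequantize round trip."""
    scale, codes = quantize_block(weights, bits)
    restored = dequantize_block(scale, codes, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


# Hypothetical block of eight weights, for illustration only.
block = [0.12, -0.40, 0.05, 0.33, -0.08, 0.27, -0.19, 0.01]

if __name__ == "__main__":
    for bits in (1, 2, 4):
        print(bits, "bits -> mean abs error", round(mean_abs_error(block, bits), 4))
```

Running it shows the error falling sharply from 1 to 2 to 4 bits. That trade-off is why a mixed-precision ("dynamic") scheme would reserve the lowest bit widths for the tensors that tolerate them best. The source does not say exactly how GLM-5.2's variants allocate bits.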