律动BlockBeats | June 12, 2026 15:08
MiniMax M3 officially open-sourced, with native multimodality and a 1-million-token context

According to monitoring by Beating, Chinese model developer MiniMax has officially open-sourced the weights of MiniMax M3, a natively multimodal mixture-of-experts (MoE) model, on Hugging Face. MiniMax M3 has 428 billion total parameters, activates 23 billion parameters per token, and natively supports an ultra-long context of 1 million tokens. To reduce the GPU memory required for deployment, the development team has simultaneously released an MXFP8-quantized version, which is compatible with mainstream inference frameworks such as SGLang, vLLM, and Transformers.

On the multimodal side, MiniMax M3 is trained jointly on text, images, and video during pre-training to achieve native semantic fusion, rather than aligning modalities in post-training. In operation, the model offers two inference modes: a Thinking mode for complex logic and tool orchestration, and a Non-thinking mode for low-latency dialogue and code generation.

The million-token context is underpinned by MiniMax Sparse Attention (MSA), an open-source lightweight attention kernel library. According to official data, MSA uses a block-retrieval mechanism built on grouped-query attention (GQA). In an ultra-long-context test at 1 million tokens, the MSA operator optimized for the NVIDIA Blackwell (SM100) architecture achieved a prefill speedup of more than 9x and a decoding speedup of 15x over a conventional full-attention mechanism, while significantly reducing inference overhead.
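To put the parameter counts and the MXFP8 release in perspective, the following is a rough back-of-envelope sketch of weight storage only. It rests on assumptions that the report does not state: a 16-bit (BF16) baseline for the unquantized weights, every weight being quantized, and the standard MX block layout of one shared 8-bit scale per 32 eight-bit elements. It ignores the KV cache, activations, and any layers kept at higher precision, so the real footprint will differ.

```python
# Back-of-envelope weight-memory estimate. Parameter counts come from the
# report; the bit widths are assumptions, not figures published by MiniMax.
TOTAL_PARAMS = 428e9   # total parameters (from the report)
ACTIVE_PARAMS = 23e9   # parameters activated per token (from the report)

BF16_BITS = 16.0               # ASSUMED baseline precision of the unquantized weights
MXFP8_BITS = 8.0 + 8.0 / 32.0  # 8-bit element plus one shared 8-bit scale per 32-element block


def weight_gb(params: float, bits_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8.0 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, BF16_BITS)
mxfp8_gb = weight_gb(TOTAL_PARAMS, MXFP8_BITS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"BF16 weights (assumed baseline): {bf16_gb:.0f} GB")
print(f"MXFP8 weights (all quantized):   {mxfp8_gb:.0f} GB")
print(f"Share of parameters active per token: {active_fraction:.1%}")
```

Under these assumptions the weights shrink from about 856 GB to about 441 GB, and roughly 5.4% of the parameters are active for any given token. Because MoE routing can select any expert, all weights still need to be resident in memory even though only a small share is used per token.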