DeepSeek releases paper 'Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention'

PANews|Feb 18, 2025 13:22
The DeepSeek team recently released a technical paper titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", introducing their proposed NSA (Native Sparse Attention) mechanism. NSA combines algorithmic innovation with hardware-aware optimization to achieve efficient long-context modeling. Its core innovations include:
1. A dynamic hierarchical sparse strategy that combines coarse-grained token compression with fine-grained token selection, preserving both global context awareness and local precision (see the sketch after this list);
2. Algorithm design balanced for arithmetic intensity and aligned with modern hardware, yielding substantial speedups in computation;
3. Support for end-to-end training, reducing pretraining computational cost without sacrificing model performance.
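To make the hierarchical strategy in item 1 concrete, below is a minimal single-query sketch of the two branches it describes: a coarse branch that attends over compressed block summaries, and a fine branch that attends over the original tokens of only the highest-scoring blocks. This is an illustration under stated assumptions, not the paper's implementation: mean pooling stands in for NSA's learned compression, a fixed average stands in for its learned gating, the sliding-window branch is omitted, and `block` and `top_k` are illustrative hyperparameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nsa_sketch(q, K, V, block=8, top_k=2):
    """Toy sketch of coarse compression + fine selection for one query.

    q: (d,) query vector; K, V: (T, d) key/value matrices.
    block and top_k are illustrative, not the paper's settings.
    """
    T, d = K.shape
    n_blocks = T // block
    Kb = K[: n_blocks * block].reshape(n_blocks, block, d)
    Vb = V[: n_blocks * block].reshape(n_blocks, block, d)

    # Coarse branch: compress each block into one summary token (mean
    # pooling here; NSA learns this compression) and attend over summaries.
    K_cmp, V_cmp = Kb.mean(axis=1), Vb.mean(axis=1)
    scores_cmp = K_cmp @ q / np.sqrt(d)
    coarse = softmax(scores_cmp) @ V_cmp

    # Fine branch: reuse the coarse scores as block-importance estimates,
    # keep only the top-k blocks, and attend over their original tokens.
    idx = np.argsort(scores_cmp)[-top_k:]
    K_sel = Kb[idx].reshape(-1, d)
    V_sel = Vb[idx].reshape(-1, d)
    fine = softmax(K_sel @ q / np.sqrt(d)) @ V_sel

    # NSA gates and combines its branches (including a sliding window);
    # a fixed 50/50 average stands in for that learned gate here.
    return 0.5 * coarse + 0.5 * fine

# Example: attend a single query over a 64-token toy sequence.
rng = np.random.default_rng(0)
d, T = 16, 64
out = nsa_sketch(rng.normal(size=d), rng.normal(size=(T, d)),
                 rng.normal(size=(T, d)))
print(out.shape)  # (16,)
```

The point of the two-branch design is that only `top_k * block` tokens receive full-resolution attention, while the rest contribute through cheap block summaries, which is what makes the approach attractive for very long sequences.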
Experimental results show that NSA performs strongly on long-context tasks and instruction-based reasoning; on 64k-length sequences in particular, it achieves significant speedups in decoding, forward propagation, and backpropagation.