Sparse Dynamic Architecture

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)

A new technical paper titled “Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention” was published by DeepSeek, Peking University and University of Washington.

Results that may be inaccessible to you are currently showing.

Hide inaccessible results

Feedback

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)

Trending now