TabNSA: Native Sparse Attention for Efficient Tabular Data Learning
Ali Eslamian, Qiang Cheng
TL;DR
TabNSA tackles the challenge of learning from high-dimensional, heterogeneous tabular data by replacing dense attention with Native Sparse Attention (NSA), which employs token compression, feature-focused selection, and local sliding windows to reduce computation while preserving informative interactions. It couples NSA with TabMixer, a parallel, multi-branch MLP backbone, and augments this setup with Gemma for few-shot learning via tabular-to-text prompts, enabling language-guided generalization in low-resource regimes. Empirically, TabNSA achieves state-of-the-art or competitive performance on binary and multi-class classification and demonstrates strong transferability, while substantially reducing computational cost (FLOPs) compared to dense attention. The work also provides extensive ablations and efficiency analyses, highlighting a scalable, interpretable approach for tabular data, with future directions including improved LLM prompting and potential soft-attention extensions.
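To make the tabular-to-text step concrete, the following is a minimal sketch of how a single row might be serialized into a prompt for an LLM. The template wording, the function name `row_to_prompt`, and the example feature names and class labels are all hypothetical; the prompt format actually used with Gemma is not given in this summary.

```python
def row_to_prompt(row, target_name, class_names):
    """Serialize one tabular row into a text prompt (hypothetical template)."""
    # Turn each (feature, value) pair into a short clause.
    parts = [f"{name} is {value}" for name, value in row.items()]
    features = "; ".join(parts)
    options = " or ".join(class_names)
    return f"Given that {features}. What is {target_name}? Answer with {options}."


# Illustrative row; the feature names and labels are assumptions, not from the paper.
example = {"age": 42, "education": "Bachelors", "hours_per_week": 40}
prompt = row_to_prompt(example, "income", ["<=50K", ">50K"])
print(prompt)
```

In a few-shot setting, a handful of such serialized rows with their known labels could be placed ahead of the query row in the same prompt.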
Abstract
Tabular data poses unique challenges for deep learning due to its heterogeneous feature types, lack of spatial structure, and often limited sample sizes. We propose TabNSA, a novel deep learning framework that integrates Native Sparse Attention (NSA) with a TabMixer backbone to efficiently model tabular data. TabNSA tackles computational and representational challenges by dynamically focusing on relevant feature subsets per instance. The NSA module employs a hierarchical sparse attention mechanism, comprising token compression, selective preservation, and localized sliding windows, to significantly reduce the quadratic complexity of standard attention operations while addressing feature heterogeneity. Complementing this, the TabMixer backbone captures complex, non-linear dependencies through parallel multilayer perceptron (MLP) branches with independent parameters. The two modules are combined via element-wise summation and mean pooling, enabling TabNSA to model both global context and fine-grained interactions. Extensive experiments across supervised and transfer learning settings show that TabNSA consistently outperforms state-of-the-art deep learning models. Furthermore, by augmenting TabNSA with a fine-tuned large language model (LLM), we enable it to address few-shot learning challenges through language-guided generalization on diverse tabular benchmarks. Code is available at: https://github.com/aseslamian/TabNSA
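The three-branch attention pattern and the summation-plus-pooling fusion described above can be sketched in NumPy as follows. This is an illustrative toy under several assumptions, not the authors' implementation: each feature is treated as one token, learned query/key/value projections are omitted, the three branches are averaged with equal weights rather than learned gates, the sliding window is symmetric, and `mlp_branch` is a single placeholder MLP standing in for the TabMixer backbone. The block size, top-k, and window values are arbitrary. Masks are applied to a dense score matrix for readability, so this sketch shows the sparsity pattern but does not itself save computation.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v, mask=None):
    """Scaled dot-product attention; masked-out positions get ~zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v


def sparse_attention(x, block=4, top_k=2, window=2):
    """Toy three-branch sparse attention over n feature tokens of width d."""
    n, d = x.shape
    assert n % block == 0, "sketch assumes n is a multiple of the block size"
    n_blocks = n // block

    # Branch 1 (compression): mean-pool each block into one coarse token.
    comp = x.reshape(n_blocks, block, d).mean(axis=1)
    out_cmp = attend(x, comp, comp)

    # Branch 2 (selection): each query keeps only its top_k highest-scoring blocks.
    block_scores = x @ comp.T                        # (n, n_blocks)
    top = np.argsort(-block_scores, axis=1)[:, :top_k]
    block_mask = np.zeros((n, n_blocks), dtype=bool)
    np.put_along_axis(block_mask, top, True, axis=1)
    sel_mask = np.repeat(block_mask, block, axis=1)  # expand blocks to tokens, (n, n)
    out_sel = attend(x, x, x, sel_mask)

    # Branch 3 (sliding window): each token attends to neighbours within `window`.
    idx = np.arange(n)
    win_mask = np.abs(idx[:, None] - idx[None, :]) <= window
    out_win = attend(x, x, x, win_mask)

    # Assumption: equal-weight average instead of learned gating.
    return (out_cmp + out_sel + out_win) / 3.0


def mlp_branch(x, w1, w2):
    """Placeholder two-layer ReLU MLP standing in for the TabMixer backbone."""
    return np.maximum(x @ w1, 0.0) @ w2


rng = np.random.default_rng(0)
n, d, h = 8, 16, 32                        # 8 feature tokens, embedding width 16
x = rng.normal(size=(n, d))
w1 = rng.normal(size=(d, h)) * 0.1
w2 = rng.normal(size=(h, d)) * 0.1

fused = sparse_attention(x) + mlp_branch(x, w1, w2)  # element-wise summation
pooled = fused.mean(axis=0)                          # mean pooling over tokens
print(pooled.shape)
```

The pooled vector would then feed a task head such as a linear classifier. An efficient implementation would gather only the selected blocks and window entries rather than masking a full n-by-n score matrix.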
