NdLinear: Preserving Multi-Dimensional Structure for Parameter-Efficient Neural Networks
Alex Reneau, Jerry Yao-Chieh Hu, Zhongfang Zhuang, Ting-Chun Liu, Xiang He, Judah Goldfeder, Nadav Timor, Allen G Roush, Ravid Shwartz-Ziv
TL;DR
NdLinear proposes a drop-in N-D tensor linear layer that preserves the native multi-dimensional structure by applying sequential, per-mode linear transforms, dramatically reducing parameters and compute compared to flattened layers. The authors prove expressivity preservation under a rank-1 Tucker (Kronecker) parameterization and demonstrate favorable VC-dimension scaling, alongside empirical evidence of strong performance across NLP, time series, tabular, and vision tasks with substantial efficiency gains. A key contribution is NdLinear-LoRA, which achieves up to 9× fewer trainable parameters while matching or exceeding LoRA on reasoning tasks, plus extensive ablations showing robustness to hyperparameters and minimal overhead. The work provides practical deployment guidance, showing NdLinear excels on axis-separable data but may underperform on highly entangled patterns, and it offers a paradigm shift away from flattening toward structure-preserving neural architectures with broad potential impact on on-device and federated learning.
Abstract
In deep learning, processing multidimensional inputs (e.g., images, medical scans, and time series) is an important task that often requires flattening the inputs. We introduce $\mathit{NdLinear}$, a drop-in replacement for linear layers that operates directly on tensors, requiring no flattening. By applying transformations separately along each dimension, NdLinear preserves native data structure while achieving dramatic parameter reductions, often by orders of magnitude, with minimal memory overhead. We prove NdLinear maintains expressivity through structured Tucker decomposition while preserving VC-dimension scaling. Extensive experiments demonstrate NdLinear's capacity to achieve significant parameter reductions with substantial wall-clock efficiency gains and minimal memory overhead. For instance, our $\mathit{NdLinear-LoRA}$ matches or exceeds standard LoRA on language reasoning tasks using up to $9\times$ fewer parameters. Experiments across CNNs, RNNs, Transformers, and MLPs on vision, language, time-series, and tabular tasks consistently demonstrate NdLinear's efficiency gains. While excelling at axis-separable tasks, NdLinear has limitations with entangled spatial interactions. By processing data in its original N-dimensional form, NdLinear provides a theoretically grounded, practical component for building more efficient neural architectures.
