EUGens: Efficient, Unified, and General Dense Layers

Sang Min Kim, Byeongchan Kim, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Rahul Kidambi, Dongseok Shim, Avinava Dubey, Snigdha Chaturvedi, Min-hwan Oh, Krzysztof Choromanski

TL;DR

This work targets the dense-computation bottleneck in neural networks by introducing Efficient, Unified, and General dense layers (EUGens), which use random features and input-norm coupling to approximate standard fully-connected layers with linear-time inference. The authors prove unbiased approximation for polynomial activations and provide concentration and continuity results, while also offering Quasi Monte Carlo variants to reduce variance. Empirically, replacing FFLs with EUGens in GPT-like transformers, Vision Transformers, and neural radiance fields yields substantial speedups (up to 27%) and memory reductions (up to 30%) across image, language, and 3D reconstruction tasks, with a capacity for layer-wise, backpropagation-free distillation. The practical impact is strong: EUGens enable scalable deployment of large models in real-time systems, while maintaining expressive power and enabling post-training compression and efficient adaptation of pre-trained models.

Abstract

Efficient neural networks are essential for scaling machine learning models to real-time applications and resource-constrained environments. Fully-connected feedforward layers (FFLs) introduce computation and parameter count bottlenecks within neural network architectures. To address this challenge, we propose a new class of dense layers that generalize standard fully-connected feedforward layers: Efficient, Unified and General dense layers (EUGens). EUGens leverage random features to approximate standard FFLs and go beyond them by incorporating a direct dependence on the input norms in their computations. The proposed layers unify existing efficient FFL extensions and improve efficiency by reducing inference complexity from quadratic to linear time. They also lead to the first unbiased algorithms approximating FFLs with arbitrary polynomial activation functions. Furthermore, EUGens reduce the parameter count and computational overhead while preserving the expressive power and adaptability of FFLs. We also present a layer-wise knowledge transfer technique that bypasses backpropagation, enabling efficient adaptation of EUGens to pre-trained models. Empirically, we observe that integrating EUGens into Transformers and MLPs yields substantial improvements in inference speed (up to 27%) and memory efficiency (up to 30%) across a range of tasks, including image classification, language model pre-training, and 3D scene reconstruction. Overall, our results highlight the potential of EUGens for the scalable deployment of large-scale neural networks in real-world scenarios.
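To make the idea of an unbiased random-feature approximation of a polynomial activation concrete, the following is a minimal sketch, not the authors' construction. It assumes the quadratic activation $f(z) = z^2$ and independent standard Gaussian projections, and relies only on the standard identity $\mathbb{E}[(\mathbf{g}^\top \mathbf{w})(\mathbf{g}^\top \mathbf{x})] = \mathbf{w}^\top \mathbf{x}$ for zero-mean $\mathbf{g}$ with identity covariance. The helper `poly2_random_features` and the sizes `d`, `l`, `m` are hypothetical names chosen for illustration.

```python
import numpy as np

def poly2_random_features(V, G1, G2):
    """Map each row v of V (n x d) to m features (g1_j . v) * (g2_j . v).

    G1 and G2 are independent (m x d) zero-mean projection matrices.
    This is an illustrative feature map, not the one defined in the paper.
    """
    return (V @ G1.T) * (V @ G2.T)

rng = np.random.default_rng(0)
d, l, m = 8, 3, 200_000  # input dim, output dim, number of random features

# Unit-norm weight rows and input keep the estimator's variance small.
W = rng.standard_normal((l, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

# Two independent Gaussian projections, one per factor of the degree-2 monomial.
G1 = rng.standard_normal((m, d))
G2 = rng.standard_normal((m, d))

psi_W = poly2_random_features(W, G1, G2)              # shape (l, m), depends only on W
phi_x = poly2_random_features(x[None, :], G1, G2)[0]  # shape (m,), depends only on x

# Averaging over features gives an unbiased Monte Carlo estimate of (W x)^2.
estimate = psi_W @ phi_x / m
exact = (W @ x) ** 2
max_abs_err = float(np.max(np.abs(estimate - exact)))
print(estimate, exact, max_abs_err)
```

The sketch uses a large `m` only so that the Monte Carlo error is visibly small. In practice the number of features is chosen to be small, and the weight-side features `psi_W` depend only on the weights, so they can be computed once ahead of inference.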

Paper Structure

This paper contains 46 sections, 8 theorems, 33 equations, 32 figures, 13 tables.

Key Result

Theorem 3.1

Take the $\mathrm{FFL}$ defined as follows for a weight matrix $\mathbf{W} \in \mathbb{R}^{l \times d}$ and an input vector $\mathbf{x} \in \mathbb{R}^{d}$: $\mathrm{FFL}(\mathbf{W},\mathbf{x})=f(\mathbf{W}\mathbf{x})$ (note that this is the most general form, since the bias term can always be absor Choose $\Psi, \Phi$ as identity functions. Take some zero-mean distributions: $\mathcal{D}^{i}_{j}

Figures (32)

  • Figure 1: Schematic diagram showcasing the workflow of a standard fully connected layer (left) and an EUGen layer (middle). In EUGen, both the input, $\mathbf{x}$, and the weight, $\mathbf{W}$, are transformed by non-linear mappings $f$ and $g$. These operations produce low-dimensional matrices whose multiplication reduces computational cost. Different dimensions of the new representation of the input $\mathbf{x}$ correspond to different monomials in the polynomial approximation of the activation function $f$. The representation can also directly depend on $\|\mathbf{x}\|_{2}$ (see the section introducing EUGens). Such EUGen layers are introduced in Transformer blocks (right) to improve the efficiency of the overall network.
  • Figure 2: Approximation capability of our EUGen layer using (left) ReLU, (middle) Softplus and (right) GELU activation functions. EUGens provide superior approximation results compared to the baselines (SNNK and Low-Rank methods) with the same number of parameters.
  • Figure 3: Evaluation result of EUGen for language model pre-training using GPT-2 (86M parameters). (Left) We report the validation loss of GPT-2 with different numbers of EUGen layers during pre-training. (Right) Tradeoff plot between the number of inference parameters and validation loss. Overall, we observe that EUGens outperform LowRank variations while enabling significant speedups with minimal impact on validation loss.
  • Figure 4: Evaluation results of EUGen for image classification tasks: (left) ImageNet and (right) Places365 datasets using ViTbase (86M parameters). We observe that EUGens can match the performance of vanilla ViTs with a significantly smaller fraction of parameters than the Low-Rank baseline.
  • Figure 5: Quantitative results for the NeRF experiments (NeRF, D-NeRF, Zip-NeRF, and Mip-NeRF 360), showing PSNR versus inference time. Our models achieve similar PSNR scores while improving inference speed by at least 24% for the implicit representation models (NeRF and Mip-NeRF) and by at least 6% for the efficient hybrid models (D-NeRF and Zip-NeRF).
  • ...and 27 more figures

Theorems & Definitions (15)

  • Theorem 3.1: EUGens can unbiasedly approximate FFLs with polynomial activations
  • Theorem 3.2: Concentration results of EUGens: part I
  • Theorem 3.3: Concentration results of EUGens: part II
  • Theorem 3.4: EUGens approximating FFLs with general continuous activations
  • proof
  • Lemma A.1
  • proof
  • proof
  • proof
  • Lemma A.2: Azuma's Inequality
  • ...and 5 more