QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks

Qian Chen, Linxin Yang, Akang Wang, Xiaodong Luo, Yin Zhang

TL;DR

QuadEnhancer adds a lightweight quadratic augmentation to standard linear layers to enrich feature interactions. By factorizing and sparsifying the quadratic terms and sharing the linear path, the method keeps parameter and FLOP overhead low while increasing expressiveness. Empirical results across image classification, text classification, and LLM fine-tuning show consistent gains over strong baselines, including large improvements on challenging datasets. The approach could apply broadly to modern architectures at minimal computational cost.
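As a back-of-the-envelope illustration of why the quadratic terms must be factorized and sparsified, consider a dense quadratic layer. This is standard counting, not a figure from the paper, and the symbols $W$ and $Q^{(k)}$ are illustrative. Such a layer maps $x \in \mathbb{R}^n$ to $m$ outputs:

$$
z_k = b_k + \sum_{j=1}^{n} W_{kj}\, x_j + \sum_{1 \le i \le j \le n} Q^{(k)}_{ij}\, x_i x_j, \qquad k = 1, \dots, m
$$

$$
\underbrace{mn + m}_{\text{affine part}} \quad \text{vs.} \quad \underbrace{m \cdot \tfrac{n(n+1)}{2}}_{\text{quadratic part}}, \qquad \frac{m \, n(n+1)/2}{mn} = \frac{n+1}{2}
$$

For $n = m = 4096$, the affine part has $16{,}781{,}312$ parameters. The dense quadratic part has $34{,}368{,}126{,}976 \approx 3.4 \times 10^{10}$ parameters, which is $2048.5$ times the $mn$ weights of the linear map it augments.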

Abstract

The combination of linear transformations and non-linear activation functions forms the foundation of most modern deep neural networks and enables them to approximate highly complex functions. This paper explores introducing quadratic transformations to further increase the nonlinearity of neural networks, with the aim of improving the performance of existing architectures. To keep both the parameter count and the computational cost low, we propose a lightweight quadratic enhancer that relies on low-rank structure, weight sharing, and sparsification. For a fixed architecture, the proposed approach introduces quadratic interactions between features at every layer while adding only a negligible number of parameters and a negligible amount of forward computation. We conduct a set of proof-of-concept experiments on three tasks: image classification, text classification, and fine-tuning of large language models. In all tasks, the proposed approach yields clear and substantial performance gains.
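The following is a minimal sketch of how low-rank structure, weight sharing, and sparsification can combine to keep a quadratic enhancement cheap. It is not the paper's exact formulation. Several elements are assumptions made for illustration: the neighbour-shift sparsity pattern of the interaction matrix $\Lambda$ (one nonzero per row), the residual form $z = y + \lambda \odot y \odot y_{\text{shifted}}$, and all function names (`linear`, `quad_enhanced_layer`, `param_counts`). The quadratic term reuses the linear output $y = Wx + b$ rather than computing a second projection, so it adds only $m$ parameters to a layer with $m$ outputs.

```python
# Hypothetical sketch of a lightweight quadratic enhancement of a linear layer.
# NOT the paper's exact formulation: the neighbour-shift sparsity pattern,
# the residual form z = y + lam * y * y_shifted, and all names are assumptions.

def linear(W, b, x):
    """Plain affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def quad_enhanced_layer(W, b, lam, x, shift=1):
    """Affine map followed by a sparse quadratic term that reuses its output.

    Output i is multiplied by one other output, (i + shift) mod m, so the
    interaction matrix Lambda has a single nonzero per row (weights in `lam`).
    Reusing y = W x + b instead of a second projection is the weight sharing;
    the purely quadratic part of each product y_i * y_j has rank at most 2 in x.
    """
    y = linear(W, b, x)
    m = len(y)
    return [y[i] + lam[i] * y[i] * y[(i + shift) % m] for i in range(m)]


def param_counts(n, m):
    """Parameters of the affine map vs. extra parameters added by `lam`."""
    return m * n + m, m


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    lam = [1.0, 1.0]
    print(quad_enhanced_layer(W, b, lam, [2.0, 3.0]))  # [8.0, 9.0]
    print(param_counts(4096, 4096))  # (16781312, 4096)
```

Setting every entry of `lam` to zero recovers the plain linear layer exactly. This makes the enhancement a strict superset of the original layer, and the extra cost grows with $m$ rather than with $m n^2$.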

Paper Structure

This paper contains 22 sections, 9 equations, 2 figures, 12 tables.

Figures (2)

  • Figure 1: Sparse structure of $\boldsymbol{\Lambda}$.
  • Figure 2: An overview of the quadratic enhancer.

Theorems & Definitions (1)

  • Example 3.1