Efficient Learning With Sine-Activated Low-rank Matrices

Yiping Ji, Hemanth Saratchandran, Cameron Gordon, Zeyu Zhang, Simon Lucey

TL;DR

The paper addresses the trade-off between parameter efficiency and accuracy in large neural networks by introducing a sine-activated low-rank decomposition. By applying a high-frequency sinusoidal nonlinearity to the low-rank factorization $\mathbf{W}=\mathbf{U}\mathbf{V}^{\top}$, the approach increases the effective rank without adding parameters, supported by a theoretical result that $\operatorname{Rank}(\sin(\omega \cdot (\mathbf{U}\mathbf{V}^{\top}))) > \operatorname{Rank}(\mathbf{U}\mathbf{V}^{\top})$ for large enough $\omega$. The method serves as a drop-in enhancement across ViTs, LLMs (via LoRA), NeRF, and 3D shape modeling, with empirical results showing improved accuracy and maintained efficiency even at low ranks. This work demonstrates broad applicability and practical impact for parameter-constrained deep learning applications, enabling higher performance within fixed resource budgets.
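The rank claim above can be checked numerically. The following is a minimal sketch, not the authors' code: the matrix sizes, the uniform initialization, and the chosen values of $\omega$ are illustrative assumptions, and `sine_low_rank` is a hypothetical helper name. It compares the numerical rank of $\mathbf{U}\mathbf{V}^{\top}$ with that of $\sin(\omega \cdot \mathbf{U}\mathbf{V}^{\top})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumption, not taken from the paper).
m, n, k = 64, 64, 2
U = rng.uniform(-1.0, 1.0, size=(m, k))
V = rng.uniform(-1.0, 1.0, size=(n, k))

# Plain low-rank product: its rank is at most k.
W_lr = U @ V.T


def sine_low_rank(U, V, omega):
    """Element-wise sine applied to the scaled low-rank product (hypothetical helper)."""
    return np.sin(omega * (U @ V.T))


# Numerical rank, using NumPy's default SVD-based tolerance.
rank_lr = np.linalg.matrix_rank(W_lr)
ranks_sine = {
    omega: np.linalg.matrix_rank(sine_low_rank(U, V, omega))
    for omega in (1.0, 10.0, 100.0)
}

print("rank(U V^T) =", rank_lr)
for omega, r in ranks_sine.items():
    print(f"rank(sin({omega} * U V^T)) = {r}")
```

Both matrices are built from the same `U` and `V`, so the parameter count is identical; only the element-wise nonlinearity differs.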

Abstract

Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the parameter efficiency characteristic of low-rank methods but also increases the decomposition's rank, thereby enhancing model performance. Our method proves to be a plug-in enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF), and 3D shape modelling.
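To make the "plug-in" idea concrete, here is a rough sketch of a linear layer whose weight is rebuilt from two low-rank factors and passed through an element-wise sine. The class name `SineLowRankLinear`, the uniform initialization, the fixed `omega`, and the absence of a bias term or any output scaling are all simplifying assumptions for illustration; the paper's actual layer may differ.

```python
import numpy as np


class SineLowRankLinear:
    """Hypothetical drop-in replacement for a dense linear layer (illustrative only)."""

    def __init__(self, in_features, out_features, rank, omega, seed=0):
        rng = np.random.default_rng(seed)
        # Only the two factors are stored; the dense weight is rebuilt on the fly.
        self.U = rng.uniform(-1.0, 1.0, size=(out_features, rank))
        self.V = rng.uniform(-1.0, 1.0, size=(in_features, rank))
        # Frequency hyper-parameter, kept fixed in this sketch.
        self.omega = omega

    def weight(self):
        # Element-wise sine of the scaled low-rank product.
        return np.sin(self.omega * (self.U @ self.V.T))

    def __call__(self, x):
        # x has shape (batch, in_features); output has shape (batch, out_features).
        return x @ self.weight().T

    def num_parameters(self):
        # Same count as an ordinary rank-k factorization: k * (in + out).
        return self.U.size + self.V.size


layer = SineLowRankLinear(in_features=128, out_features=128, rank=4, omega=100.0)
x = np.ones((3, 128))
y = layer(x)

dense_params = 128 * 128
print(y.shape, layer.num_parameters(), dense_params)
```

The sine adds no trainable parameters: the layer stores $k(m+n)$ values, compared with $mn$ for a dense weight of the same shape.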

Paper Structure

This paper contains 43 sections, 4 theorems, 28 equations, 9 figures, and 13 tables.

Key Result

Proposition 1

Fix an $m \times n$ matrix $\mathbf{A}$ s.t. $\mathbf{A} \neq 0$. Then

Figures (9)

  • Figure 1: Applying a drop-in sine-activation increases the rank of low-rank matrix methods, leading to improved parameter efficiency and performance on a variety of tasks including: a) NeRF, b) 3D Occupancy, c) ViT image classification, and d) Fine-tuning Large Language Models (LoRA).
  • Figure 2: These figures display weight magnitudes for matrices with dimension $128 \times 128$. The first figure shows a heatmap of a full-rank matrix initialized by Kaiming uniform, highlighting linear independence among rows. The second shows a low-rank matrix $\mathbf{W}_\text{lr} = \mathbf{U}\mathbf{V}^{T} \in \mathbb{R}^{128 \times 128}$, with $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{128 \times 1}$, illustrating minimal linear independence. The final pair of figures reveal how applying a sine function element-wise, $\sin(\omega \cdot \mathbf{U}\mathbf{V}^{T})$, with varying $\omega$, affects linear independence in low-rank matrices; specifically, $\omega = 100$ and $\omega = 2000$ progressively increase linear independence.
  • Figure 3: In this figure we depict the singular value spectrum of a Kaiming uniform initialized matrix $\mathbf{W}_\text{fr} \in \mathbb{R}^{256 \times 256}$ and a low-rank $k=5$ approximation matrix $\mathbf{W}_\text{lr} = \mathbf{U}\mathbf{V}^{T}$. All singular values are normalized to 1. Left: the spectral advantages of applying a non-linear function $\phi(\omega \cdot \mathbf{U}\mathbf{V}^{T})$ where $\omega$ is a hyper-parameter. Here we see the natural advantages of the sine function such that $\phi(\mathbf{x}) = \sin(\omega \cdot \mathbf{x})$. Right: manipulating $\omega$ within the sine function changes these spectral properties.
  • Figure 4: Low Rank ViT classification performance. Use of the sine-activation improves performance of the low-rank models, and even enables improvement relative to the Full Rank model.
  • Figure 5: (a) Using a non-transformed Low-Rank model leads to a complete loss of signal at the extreme (rank $k=1$). In contrast, applying a sine-activation function reconstructs details even at 1.3% of the Full-Rank parameters. (b) The Sine Low-Rank NeRF models show significant improvements across the rate-distortion curve relative to the Low-Rank models.
  • ...and 4 more figures
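
The spectral comparison described in the Figure 3 caption can be approximated with a short script. This is a sketch under stated assumptions: the $256 \times 256$ size and $k=5$ come from the caption, and $\omega = 100$ and $\omega = 2000$ are borrowed from the Figure 2 caption, but the factors here are drawn from a plain uniform(-1, 1) distribution rather than Kaiming uniform, and `count_above` with its threshold is a hypothetical effective-rank proxy, not a quantity from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 256, 5  # sizes quoted in the Figure 3 caption

# Assumption: plain uniform(-1, 1) factors instead of Kaiming uniform init.
U = rng.uniform(-1.0, 1.0, size=(n, k))
V = rng.uniform(-1.0, 1.0, size=(n, k))


def normalized_spectrum(W):
    """Singular values in descending order, scaled so the largest equals 1."""
    s = np.linalg.svd(W, compute_uv=False)
    return s / s[0]


def count_above(spectrum, threshold=1e-3):
    """Crude effective-rank proxy: number of normalized singular values above threshold."""
    return int(np.sum(spectrum > threshold))


spec_lr = normalized_spectrum(U @ V.T)
n_lr = count_above(spec_lr)

n_sine = {}
for omega in (100.0, 2000.0):  # frequencies mentioned in the Figure 2 caption
    spec = normalized_spectrum(np.sin(omega * (U @ V.T)))
    n_sine[omega] = count_above(spec)

print("low-rank:", n_lr)
print("sine-activated:", n_sine)
```

Plotting `spec_lr` against the sine-activated spectra on a log scale would give a rough analogue of the curves the caption describes.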

Theorems & Definitions (11)

  • Proposition 1
  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Remark 1
  • Proof of Proposition 3.1 (Section 3.2 of the main paper)
  • Proof of Theorem 1 (Section 3.2 of the main paper)
  • Remark 2
  • ...and 1 more