QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation

Chenhui Xu, Xinyao Wang, Fuxun Yu, Jinjun Xiong, Xiang Chen

TL;DR

QuadraNet V2 introduces a quadratic-adaptation framework that attaches a low-rank, atrous quadratic adapter to pre-trained first-order networks, enabling high-order interactions without full re-training. By initializing the linear term from existing weights and setting the quadratic term to zero before adaptation, the method models nonlinear distribution shifts at significantly reduced training cost, achieving up to 98.4% GPU-hour savings in large-scale settings. Kernel methods and memory-efficient backpropagation further enhance capacity and practicality, while library-accelerated inference (QuadraLib) enables efficient deployment. The approach yields competitive or superior accuracy on ImageNet-1K/21K and MS COCO across model scales, highlighting a scalable path for sustainable growth of high-order neural networks.

Abstract

Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
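
The initialization idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name QuadraticAdapterLinear, the low-rank factorization U((A x) * (B x)), and the choice to zero only the up-projection are assumptions made for illustration, and the atrous (convolutional) structure mentioned in the TL;DR is omitted. The sketch only shows that when the primary term reuses pre-trained weights and the quadratic term starts at zero, the adapted layer initially reproduces the pre-trained layer exactly.

```python
import numpy as np


class QuadraticAdapterLinear:
    """Hypothetical layer: y = W x + b + U((A x) * (B x)).

    W, b : pre-trained first-order weights, reused as the primary term.
    A, B : low-rank projections of shape (rank, in_dim); an assumed form.
    U    : up-projection of shape (out_dim, rank), zero-initialized so the
           quadratic term contributes nothing before adaptation.
    """

    def __init__(self, weight, bias, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight
        self.bias = bias
        self.proj_a = rng.normal(scale=0.1, size=(rank, in_dim))
        self.proj_b = rng.normal(scale=0.1, size=(rank, in_dim))
        self.up = np.zeros((out_dim, rank))

    def linear(self, x):
        # Primary (first-order) term taken from the pre-trained layer.
        return x @ self.weight.T + self.bias

    def quadratic(self, x):
        # Second-order term: element-wise product of two low-rank projections.
        return ((x @ self.proj_a.T) * (x @ self.proj_b.T)) @ self.up.T

    def __call__(self, x):
        return self.linear(x) + self.quadratic(x)


# Stand-in "pre-trained" weights; in practice these would be loaded from a checkpoint.
rng = np.random.default_rng(42)
weight = rng.normal(size=(3, 8))
bias = rng.normal(size=3)
x = rng.normal(size=(5, 8))

layer = QuadraticAdapterLinear(weight, bias)
same_at_init = np.allclose(layer(x), x @ weight.T + bias)
print(same_at_init)  # True, because `up` is all zeros at initialization
```

Under these assumptions, only proj_a, proj_b, and up would need to be trained during adaptation, which is one plausible reading of why the approach avoids the cost of training a high-order model from scratch.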

Paper Structure

This paper contains 22 sections, 7 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Linear vs. Nonlinear Data Adaptation
  • Figure 2: Model performance on ImageNet-1K and GPU time required for different scales of pre-training.
  • Figure 3: High-order Adaptation Capacity.
  • Figure 4: High-order models have a different architecture from traditional neural networks.
  • Figure 5: Stage Training of QDNNs.
  • ...and 2 more figures