QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation

Chenhui Xu; Xinyao Wang; Fuxun Yu; Jinjun Xiong; Xiang Chen

TL;DR

QuadraNet V2 introduces a quadratic-adaptation framework that attaches a low-rank, atrous quadratic adapter to pre-trained first-order networks, enabling high-order interactions without full re-training. By initializing the linear term from existing weights and setting the quadratic term to zero before adaptation, the method models nonlinear distribution shifts with significantly reduced training cost, achieving up to 98.4% GPU-hour savings in large-scale settings. Kernel methods and memory-efficient backpropagation further enhance capacity and practicality, while library-accelerated inference (QuadraLib) enables efficient deployment. The approach yields competitive or superior accuracy on ImageNet-1K/21K and MS COCO across model scales, highlighting a scalable path for sustainable growth of high-order neural networks.

Abstract

Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
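The initialization idea described above can be illustrated with a minimal sketch. It assumes one particular low-rank form for the quadratic term, y = Wx + b + P((Ax) ⊙ (Bx)), which is chosen here for illustration and is not taken from the paper. The class name `QuadraticAdapter`, the helper `matvec`, and the factor names `A`, `B`, `P` are all hypothetical. The linear term is copied from pre-trained weights, and the projection `P` starts at zero so the adapted layer initially reproduces the pre-trained output exactly.

```python
import random


def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class QuadraticAdapter:
    """Hypothetical layer: y = W x + b + P ((A x) * (B x)).

    W and b come from a pre-trained first-order layer. A, B and P form an
    assumed low-rank quadratic term; P is zero-initialized so the quadratic
    contribution vanishes before adaptation.
    """

    def __init__(self, W_pre, b_pre, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W_pre), len(W_pre[0])
        # Linear (primary) term: copied from the pre-trained weights.
        self.W = [row[:] for row in W_pre]
        self.b = b_pre[:]
        # Low-rank factors of the quadratic term (small random values).
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        # Zero projection: the quadratic term contributes nothing at first.
        self.P = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        linear = [l + b for l, b in zip(matvec(self.W, x), self.b)]
        # Element-wise product of two linear maps gives second-order terms.
        hadamard = [a * b for a, b in zip(matvec(self.A, x), matvec(self.B, x))]
        quad = matvec(self.P, hadamard)
        return [l + q for l, q in zip(linear, quad)]


# Toy "pre-trained" layer with made-up weights.
W_pre = [[1.0, 2.0], [0.5, -1.0]]
b_pre = [0.1, -0.2]
x = [3.0, 4.0]

layer = QuadraticAdapter(W_pre, b_pre, rank=1)
pretrained_out = [l + b for l, b in zip(matvec(W_pre, x), b_pre)]

# Before any adaptation, the adapted layer matches the pre-trained layer.
assert layer.forward(x) == pretrained_out
```

Because the quadratic branch starts at zero, adaptation begins from the pre-trained model's behavior, and only `A`, `B` and `P` (and optionally the linear term) need to move to capture non-linear shifts in the data.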

Paper Structure (22 sections, 7 equations, 7 figures, 5 tables)

Introduction
Theoretical Analysis
Pre-Training: Where Are We Today?
Difficulty of Modeling Non-linear Shift
Quadratic Net: Architecture-Agnostic High-Order Neural Interaction
Training QDNNs in Stages: Where We Are Going toward!
Design Methodology
QuadraNet V2 Overview
Model Initialization
Tuning Conventional Neural Networks with Quadratic Adapter
Inference with Library Optimized Quadratic Neural Networks
Optimization
Efficient Atrous Quadratic Connection
Memory-Efficient Back-Propagation
Experiments
...and 7 more sections

Figures (7)

Figure 1: Linear vs. Nonlinear Data Adaptation
Figure 2: Model performance on ImageNet-1K and GPU time required for different scales of pre-training.
Figure 3: High-order Adaptation Capacity.
Figure 4: High-order models have a different architecture from traditional neural networks.
Figure 5: Stage Training of QDNNs.
...and 2 more figures
