Weights Augmentation: it has never ever ever ever let her model down

Junbin Zhuang, Guiguang Din, Yunyi Yan

TL;DR

The paper addresses the reliance of deep networks on a single set of weights by introducing the Weight Augmentation Strategy (WAS), which learns a distribution of weights: during training, the Plain Weight $PW$ is randomly transformed into a Shadow Weight $SW$, the loss $J(\theta^{sw})$ is computed with $SW$, and the resulting update is applied to $PW$. It introduces two inference modes, an accuracy-oriented mode (AOM) that uses $PW$ and a desire-oriented mode (DOM) that uses $SW$, enabling flexible, task-specific behavior without changing the data pipeline. Across six CNN architectures on CIFAR-10/100, the authors report substantial accuracy gains (up to ~18–19 percentage points in some cases) and FLOPs reductions of up to ~36–47%, depending on the strategy. This dual-weight-space approach offers a practical route to more robust and efficient models, with broad applicability to convolutional architectures and potential for resource-constrained deployments.

Abstract

Weights play an essential role in deep learning network models. In contrast to network structure design, this article proposes the concept of weight augmentation, which focuses on exploring the weights. The core of the Weight Augmentation Strategy (WAS) is to train the network with randomly transformed weight coefficients, named Shadow Weights (SW), which are used to calculate the loss function and thereby affect parameter updates. Stochastic gradient descent, however, is applied to the Plain Weights (PW), the original weights of the network before the random transformation. During training, the numerous SW collectively form a high-dimensional space, and PW is learned directly from the distribution of SW instead of from the data. The accuracy-oriented mode (AOM) relies on PW, which guarantees that the network is highly robust and accurate. The desire-oriented mode (DOM) uses SW, which is determined by the specific function required of the network model according to WAS's performance desires, such as lower computational complexity or lower sensitivity to particular data. The two modes can be switched at any time if needed. WAS extends augmentation from data to weights; it is easy to understand and implement, yet it can substantially improve almost all networks. Our experimental results show that convolutional neural networks such as VGG-16, ResNet-18, ResNet-34, GoogLeNet, MobileNetV2, and EfficientNet-Lite benefit considerably at little or no cost. On the CIFAR100 and CIFAR10 datasets, model accuracy increases by 7.32% and 9.28%, respectively, with the highest gains being 13.42% and 18.93%, respectively. In addition, DOM can reduce floating point operations (FLOPs) by up to 36.33%. The code is available at https://github.com/zlearh/Weight-Augmentation-Technology.
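
The training loop described in the abstract can be illustrated with a minimal sketch on a toy linear model. This is not the authors' implementation: the random transform used here (per-weight scaling by a factor drawn from a uniform range), the function names, and all hyperparameters are assumptions chosen for illustration. The sketch only shows the stated principle that the loss is computed with SW while the gradient step is applied to PW.

```python
import random

def shadow_weights(pw, scale_range, rng):
    """Hypothetical random transform (an assumption, not from the paper):
    scale each plain weight by a factor in [1 - scale_range, 1 + scale_range]."""
    factors = [rng.uniform(1.0 - scale_range, 1.0 + scale_range) for _ in pw]
    sw = [f * w for f, w in zip(factors, pw)]
    return sw, factors

def predict(weights, x):
    """Linear model: dot product of weights and inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def was_step(pw, batch, lr, scale_range, rng):
    """One SGD step: the loss is evaluated with SW, the update is applied to PW."""
    sw, factors = shadow_weights(pw, scale_range, rng)
    grad_sw = [0.0] * len(pw)
    loss = 0.0
    for x, y in batch:
        err = predict(sw, x) - y          # forward pass uses the shadow weights
        loss += 0.5 * err * err
        for i, xi in enumerate(x):
            grad_sw[i] += err * xi        # dJ/dsw_i for squared error
    n = len(batch)
    # Chain rule for this transform: sw_i = factor_i * pw_i,
    # so dJ/dpw_i = factor_i * dJ/dsw_i.
    new_pw = [w - lr * f * g / n for w, f, g in zip(pw, factors, grad_sw)]
    return new_pw, loss / n

def make_data(n=64, seed=1):
    """Noise-free toy regression data generated from the weights [2.0, -1.0]."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append((x, predict(true_w, x)))
    return data

def train(data, dim, steps=2000, lr=0.05, scale_range=0.2, seed=0):
    """Train plain weights; a fresh shadow weight is sampled at every step."""
    rng = random.Random(seed)
    pw = [0.0] * dim
    for _ in range(steps):
        batch = rng.sample(data, 8)
        pw, _ = was_step(pw, batch, lr, scale_range, rng)
    return pw

if __name__ == "__main__":
    print("plain weights after training:", train(make_data(), dim=2))
```

In this toy setting, the plain weights end up close to the generating weights even though no forward pass ever uses them directly; every loss value comes from a freshly sampled shadow weight.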

Paper Structure

This paper contains 10 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: We introduce a model training strategy that does not learn the weights directly from the data but instead uses the data to obtain a distribution of weights. The weights used for inference are then learned from this distribution. Red circles denote the data points, blue indicates the weights developed during training, and yellow signifies the final weights obtained. (a) is the traditional training method, in which data is used to obtain specific parameters. In contrast, (b) illustrates our training method. Here, arrows of different colors indicate that a weight processes the data of the corresponding shape more effectively. The final weight is obtained by learning the weight distribution.
  • Figure 2: Sketch of the WAS architecture. There are two modes of inference: the accuracy-oriented mode uses Plain Weight (PW), and the desire-oriented mode uses Shadow Weight (SW). Only part of the network is shown. As with data augmentation, which inspired it, WAS is applied only during training.
  • Figure 3: WAS is used to generate variant weights. Shown here are the 3x3 convolutional kernels of m layers. WAS enables precise tuning of kernel parameters, allowing adjustments to single or multiple kernels as required. The original weights are shown in red, the variant weights after WAS processing in blue, and the finally determined weights in yellow.
  • Figure 4: WAS forms a "process triangle" in which the interplay between its elements is evident: SW influences the loss function, which subsequently impacts PW, and PW in turn affects SW, creating a cyclical relationship.
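
The per-kernel adjustment in Figure 3 and the two inference modes in Figure 2 can be sketched as follows. This is an illustrative sketch only: the transform (a 90-degree rotation of a 3x3 kernel), the function names, and the way kernels are selected are assumptions, since the captions do not specify which transformations WAS applies.

```python
from typing import Callable, List, Sequence

Kernel = List[List[float]]  # one square convolution kernel, e.g. 3x3

def rotate90(kernel: Kernel) -> Kernel:
    """Rotate a square kernel 90 degrees clockwise (hypothetical transform)."""
    return [list(row) for row in zip(*kernel[::-1])]

def copy_kernel(kernel: Kernel) -> Kernel:
    """Copy a kernel so callers never share rows with the plain weights."""
    return [row[:] for row in kernel]

def make_shadow(plain: Sequence[Kernel],
                indices: Sequence[int],
                transform: Callable[[Kernel], Kernel]) -> List[Kernel]:
    """Build shadow kernels: only the kernels at `indices` are transformed,
    all other kernels are copied unchanged from the plain weights."""
    chosen = set(indices)
    return [transform(k) if i in chosen else copy_kernel(k)
            for i, k in enumerate(plain)]

def weights_for_mode(plain: Sequence[Kernel],
                     mode: str,
                     indices: Sequence[int] = (),
                     transform: Callable[[Kernel], Kernel] = rotate90) -> List[Kernel]:
    """Select inference weights: "AOM" returns PW, "DOM" returns an SW."""
    if mode == "AOM":
        return [copy_kernel(k) for k in plain]
    if mode == "DOM":
        return make_shadow(plain, indices, transform)
    raise ValueError("mode must be 'AOM' or 'DOM'")

if __name__ == "__main__":
    layer = [
        [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
    ]
    print(weights_for_mode(layer, "DOM", indices=[0]))
```

Because both modes derive their weights from the same stored plain kernels, switching between them requires no retraining in this sketch; only the choice of transform and kernel indices changes.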