A Mathematics Framework of Artificial Shifted Population Risk and Its Further Understanding Related to Consistency Regularization

Xiliang Yang, Shenyang Deng, Shicong Liu, Yuanchi Suo, Wing W. Y. Ng, Jianjun Zhang

TL;DR

This work develops a rigorous mathematical framework for understanding data augmentation through the lens of a shifted population distribution $p^*(x')$ and its associated risk. It proves that the shifted population risk decomposes into the original risk plus a gap term, $R_f(p^*) = R_f(p) + \mathrm{GAP}$, where $\mathrm{GAP}$ acts as a consistency regularization term and can harm the early stages of training. To address this, the authors introduce a tunable coefficient $\lambda$ that down-weights $\mathrm{GAP}$ during learning, aiming to preserve emphasis on major features and improve generalization. The framework unifies the traditional sample-size-increase and regularization interpretations of augmentation, provides theoretical insight into feature-weight updates under augmentation, and reports improved generalization and convergence stability on CIFAR-10/100, Food-101, ImageNet, out-of-distribution (PACS), and long-tail scenarios. Code is available at the repository linked in the abstract.

Abstract

Data augmentation is an important technique in training deep neural networks, as it enhances their ability to generalize and remain robust. Although data augmentation is commonly understood both as a way to expand the sample size and as a consistency regularization term, little research has examined the relationship between these two views. To address this, this paper introduces a more comprehensive mathematical framework for data augmentation. Through this framework, we establish that the expected risk of the shifted population is the sum of the original population risk and a gap term, which can be interpreted as a consistency regularization term. The paper also provides a theoretical understanding of this gap term, highlighting its negative effects on the early stages of training, and proposes a method to mitigate these effects. To validate our approach, we conducted experiments using the same data augmentation techniques and computing resources under several scenarios, including standard training, out-of-distribution, and imbalanced classification. The results demonstrate that our methods outperform the compared methods under all scenarios in terms of generalization ability and convergence stability. Our code implementation is available at https://github.com/ydlsfhll/ASPR.
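
The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the gap is estimated as the plain difference between the empirical loss on augmented inputs and the empirical loss on the original inputs, and that the coefficient $\lambda$ scales this difference. The function names `mean_loss` and `weighted_shifted_risk`, the cross-entropy loss, and the toy probabilities are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(probs[label])


def mean_loss(batch_probs, labels):
    """Average cross-entropy over a batch of predicted distributions."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


def weighted_shifted_risk(clean_probs, aug_probs, labels, lam):
    """Return (objective, clean risk, gap) for a given weight lam.

    Assumption: the gap is estimated as the empirical loss on augmented
    inputs minus the empirical loss on the original inputs.
    """
    risk_clean = mean_loss(clean_probs, labels)  # stand-in for R_f(p)
    risk_aug = mean_loss(aug_probs, labels)      # stand-in for R_f(p*)
    gap = risk_aug - risk_clean                  # stand-in for GAP
    return risk_clean + lam * gap, risk_clean, gap


# Toy predictions of a hypothetical two-class model on two samples,
# once on the original inputs and once on their augmented versions.
clean = [[0.9, 0.1], [0.2, 0.8]]
aug = [[0.6, 0.4], [0.5, 0.5]]
labels = [0, 1]

for lam in (0.0, 0.5, 1.0):
    objective, risk_clean, gap = weighted_shifted_risk(clean, aug, labels, lam)
    print(f"lambda={lam}: objective={objective:.4f}, "
          f"clean risk={risk_clean:.4f}, gap={gap:.4f}")
```

Under these assumptions, setting `lam` to 1 recovers the loss on the augmented data alone, and setting it to 0 recovers the loss on the original data alone. Intermediate values interpolate between the two, which is one way to read "down-weighting the gap term".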

Paper Structure

This paper contains 6 sections, 2 equations.