Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

Koshiro Nagano, Ryo Fujii, Ryo Hachiuma, Fumiaki Sato, Taiki Sekii, Hideo Saito

Abstract

Learning from synthetic data has attracted attention as an effective approach for increasing the diversity of training data while reducing collection costs, thereby improving the robustness of learned models. However, many existing methods improve robustness only indirectly, through the diversification of training samples, and do not explicitly teach the model which regions of the input space truly contribute to discrimination; consequently, the model may learn spurious correlations caused by synthesis biases and artifacts. Motivated by this limitation, this paper proposes a learning framework that uses provenance information obtained during training-data synthesis, indicating whether each region of the input originates from the target object, as an auxiliary supervisory signal to promote representations focused on target regions. Specifically, input gradients are decomposed according to the target and non-target regions defined during synthesis, and input gradient guidance is introduced to suppress gradients over non-target regions. This discourages the model from relying on non-target regions and directly promotes the learning of discriminative representations for target regions. Experiments demonstrate the effectiveness and generality of the proposed method across multiple tasks and modalities, including weakly supervised object localization, spatio-temporal action localization, and image classification.
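The guidance term described above can be viewed as a penalty on input gradients restricted to non-target regions, identified by the provenance mask. Below is a minimal NumPy sketch of that idea; the function name `provenance_gradient_penalty` and the coefficient `alpha` are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def provenance_gradient_penalty(input_grad, provenance_mask, alpha=0.1):
    """Hypothetical sketch: penalize input gradients over non-target regions.

    input_grad      -- d(task loss)/d(input), same spatial shape as the input.
    provenance_mask -- 1 where the region originates from the target object,
                       0 elsewhere (obtained for free during synthesis).
    alpha           -- weight of the provenance loss in the total loss.
    """
    non_target = 1.0 - provenance_mask
    # Squared L2 norm of the gradient restricted to non-target regions;
    # gradients on target regions incur no penalty.
    return alpha * float(np.sum((non_target * input_grad) ** 2))
```

In a real training loop this term would be added to the task loss, so that minimizing it drives the model's sensitivity to non-target regions toward zero while leaving target-region gradients free.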

Paper Structure

This paper contains 45 sections, 10 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Overview of the proposed method. We adopt CutMix as the synthesis function $S(\cdot)$ and suppress input gradients using the provenance mask $M$, which is automatically obtained during input data synthesis, for auxiliary supervision. See the main text for details.
  • Figure 2: Examples of provenance information $\bm{I}$ obtained during synthesis. $\bm{I}$ corresponds to each supervisory label.
  • Figure 3: Visualization of Guided Grad-CAM class activation maps for each method on CutMix-synthesized images.
  • Figure 4: Visualization of ground-truth BBoxes (red) and predictions of each method (green) on the CUB dataset.
  • Figure 5: Weakly supervised object localization accuracy as the coefficient $\alpha$ of the provenance loss in the total loss is varied on the CUB dataset.
  • ...and 3 more figures
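Figure 1 adopts CutMix as the synthesis function $S(\cdot)$, with the provenance mask $M$ obtained for free during mixing. A minimal NumPy sketch of that synthesis step, under the standard CutMix box-sampling scheme (the function name and Beta(1, 1) choice are illustrative assumptions):

```python
import numpy as np

def cutmix_with_provenance(img_a, img_b, rng=None):
    """Hypothetical sketch of CutMix as the synthesis function S(.),
    returning the provenance mask M alongside the mixed image.

    img_a, img_b -- (H, W, C) arrays; a box from img_b is pasted onto img_a.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)  # mixing ratio, as in standard CutMix
    # Box size chosen so its area ratio is approximately (1 - lam).
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)  # box centre, sampled uniformly
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Provenance mask M: 1 on the pasted (img_b) region, 0 elsewhere.
    mask = np.zeros((h, w), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0
    return mixed, mask
```

The key point is that the mask requires no annotation: it is a byproduct of the synthesis itself, which is what makes provenance-based supervision essentially free.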