
Self Distillation via Iterative Constructive Perturbations

Maheak Dave, Aniket Kumar Singh, Aryan Pareek, Harshita Jha, Debasis Chaudhuri, Manish Pratap Singh

TL;DR

The paper tackles the persistent generalization gap in deep networks under heterogeneous inputs by introducing Iterative Constructive Perturbation (ICP) combined with self-distillation. ICP gradually refines inputs through gradient-based iterations and feeds the refined inputs back into the model. The intermediate features they produce serve as targets for the features of the original inputs, and this alignment is enforced through a layer-wise distillation objective with a cosine-decay schedule. The approach is instantiated in the variants SGD-ICP, Adam-ICP, and AdEMAMix-ICP. It is validated on CIFAR-100 with ResNet20 for classification and with a VAE on CUB for image generation, showing improvements over baselines in accuracy, F1, SSIM, and FID. The results indicate that proactive input refinement coupled with self-supervised feature alignment can reduce overfitting and improve generalization, with potential applicability to larger models and more complex datasets.
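The pieces named above (gradient-based input refinement, layer-wise feature alignment, cosine decay) can be illustrated with a small NumPy sketch. Everything in it is an assumption rather than the authors' code: a hypothetical two-layer tanh MLP stands in for ResNet20, ICP is taken to be plain gradient descent on the input with a fixed step size, the layer-wise objective is an MSE over the hidden and output features, and the cosine decay is applied to the distillation weight. The names `icp`, `layerwise_distill_loss`, and `cosine_weight` are illustrative only.

```python
import numpy as np

# Hypothetical stand-in model: a two-layer tanh MLP. The hidden activations
# play the role of "intermediate features"; the logits form a second layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.3, size=(8, 4)), np.zeros(8)   # hidden x input
W2, b2 = rng.normal(scale=0.3, size=(3, 8)), np.zeros(3)   # classes x hidden


def forward(x):
    """Return (hidden features, logits) for one input vector."""
    h = np.tanh(W1 @ x + b1)
    return h, W2 @ h + b2


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def ce_loss(x, y):
    return -np.log(softmax(forward(x)[1])[y])


def input_grad(x, y):
    """Analytic gradient of the cross-entropy loss w.r.t. the input x."""
    h, z = forward(x)
    dz = softmax(z)
    dz[y] -= 1.0                           # dL/dz = p - onehot(y)
    dpre = (W2.T @ dz) * (1.0 - h ** 2)    # back through output layer and tanh
    return W1.T @ dpre                     # dL/dx


def icp(x, y, steps=10, step_size=0.05):
    """Assumed ICP update: gradient *descent* on the input, model frozen."""
    x_icp = np.array(x, dtype=float)       # copy, so the original input is kept
    for _ in range(steps):
        x_icp -= step_size * input_grad(x_icp, y)
    return x_icp


def layerwise_distill_loss(x, x_icp):
    """Sum over layers of the MSE between features of x and of x_icp.
    The x_icp features act as targets (detached in an autodiff framework)."""
    student = forward(x)
    teacher = forward(x_icp)
    return sum(np.mean((s - t) ** 2) for s, t in zip(student, teacher))


def cosine_weight(step, total_steps, w0=1.0):
    """Cosine decay from w0 at step 0 to 0 at total_steps (assumed usage)."""
    return w0 * 0.5 * (1.0 + np.cos(np.pi * step / total_steps))


x, y = np.array([0.5, -1.0, 0.3, 0.8]), 2
x_icp = icp(x, y)
print(ce_loss(x, y), ce_loss(x_icp, y))    # the loss is lower after ICP
print(cosine_weight(50, 100) * layerwise_distill_loss(x, x_icp))
```

Because the model's weights are frozen during ICP, only the input moves; the small step size keeps each update a descent step for this smooth toy network. In an autodiff framework the input gradient would come from backpropagation instead of the hand-derived `input_grad`.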

Abstract

Deep Neural Networks have achieved remarkable results across various domains; however, balancing performance and generalization remains a challenge when training these networks. In this paper, we propose a novel framework that rethinks the traditional training paradigm by using a cyclic optimization strategy to concurrently optimize the model and its input data. Central to our approach is Iterative Constructive Perturbation (ICP), which leverages the model's loss to iteratively perturb the input, progressively constructing an enhanced representation over a series of refinement steps. The ICP input is then fed back into the model to produce improved intermediate features, which serve as targets for the original features in a self-distillation framework. By alternately adapting the model's parameters to the data and the data to the model, our method narrows the gap between fitting and generalization, leading to enhanced performance. Extensive experiments demonstrate that our approach not only mitigates common performance bottlenecks in neural networks but also yields significant improvements across training variations.
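The cyclic idea (adapt the data to the model, then the model to the data) can be illustrated with a toy NumPy training loop. This is a sketch under stated assumptions, not the authors' implementation: the model is a hypothetical linear softmax classifier, the only "features" distilled are its logits, the logits of the ICP inputs are treated as a fixed target, the cosine decay is applied to the distillation weight, and all hyperparameters (`lr`, `lam0`, step sizes) are made up for the example.

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def cross_entropy(W, b, X, y):
    P = softmax(X @ W.T + b)
    return -np.mean(np.log(P[np.arange(len(y)), y]))


def icp_batch(W, b, X, y, steps=5, step_size=0.05):
    """Data step (model frozen): per-sample gradient descent on the inputs."""
    X_icp = np.array(X, dtype=float)
    onehot = np.eye(W.shape[0])[y]
    for _ in range(steps):
        # Row i is dL_i/dx_i = W^T (p_i - onehot_i) for a linear softmax model.
        X_icp -= step_size * (softmax(X_icp @ W.T + b) - onehot) @ W
    return X_icp


def train_cyclic(X, y, n_classes, epochs=200, lr=0.1, lam0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    n = len(y)
    for t in range(epochs):
        # 1) Adapt the data to the model.
        X_icp = icp_batch(W, b, X, y)
        # 2) Adapt the model to the data: mean CE on the original inputs plus
        #    (lam / 2n) * ||Z - Z_target||^2, with the ICP logits held fixed.
        Z = X @ W.T + b
        Z_target = X_icp @ W.T + b
        lam = lam0 * 0.5 * (1.0 + np.cos(np.pi * t / epochs))  # cosine decay
        dZ = (softmax(Z) - onehot) / n + lam * (Z - Z_target) / n
        W -= lr * dZ.T @ X
        b -= lr * dZ.sum(axis=0)
    return W, b


# Toy usage: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + 0.5 * rng.normal(size=(90, 2))
W, b = train_cyclic(X, y, n_classes=3)
print(cross_entropy(W, b, X, y))
```

Treating `Z_target` as a constant mirrors the usual self-distillation choice of not backpropagating through the teacher branch; in a deep network the same term would be summed over several intermediate layers rather than applied to the logits alone.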

Paper Structure

This paper contains 20 sections, 11 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Plot depicting effects of ICP and i-FGSM in a simple multi-class classification scenario
  • Figure 2: Overview of the proposed ICP based self-distillation framework
  • Figure 3: Left to right: Input image from CUB dataset, deterministic output of VAE (with no variance), outputs of VAE with 4 different noised latents with different seeds; Top to bottom: Baseline control method, SGD-ICP, Adam-ICP, and AdEMAMix-ICP.
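Figure 1 contrasts ICP with i-FGSM. On a hypothetical three-class linear classifier the difference comes down to the direction of the input update: ICP (assumed here to be plain gradient descent on the loss) makes the input easier to classify, while i-FGSM takes signed gradient-ascent steps clipped to an eps-ball and makes it harder. The weights, `eps`, and step sizes below are illustrative only and are not taken from the paper.

```python
import numpy as np

# Hypothetical 3-class linear classifier on 2-D inputs (illustrative weights).
W = np.array([[1.0, 0.5], [-0.5, 1.0], [-0.5, -1.0]])
b = np.zeros(3)


def loss_and_grad(x, y):
    """Cross-entropy loss of input x for label y, and its gradient w.r.t. x."""
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    grad = W.T @ (p - np.eye(len(b))[y])
    return -np.log(p[y]), grad


def icp(x, y, steps=10, alpha=0.1):
    """Constructive: step *down* the loss gradient (assumed form of ICP)."""
    x = np.array(x, dtype=float)
    for _ in range(steps):
        x -= alpha * loss_and_grad(x, y)[1]
    return x


def i_fgsm(x, y, eps=0.3, steps=10, alpha=0.05):
    """Adversarial: signed steps *up* the gradient, kept inside an eps-ball."""
    x = np.array(x, dtype=float)
    x_adv = x.copy()
    for _ in range(steps):
        x_adv += alpha * np.sign(loss_and_grad(x_adv, y)[1])
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


x0, label = np.array([0.2, 0.1]), 0
for name, xi in [("original", x0), ("ICP", icp(x0, label)), ("i-FGSM", i_fgsm(x0, label))]:
    print(name, loss_and_grad(xi, label)[0])
```

Since the cross-entropy of a linear model is convex in its input, every signed ascent step here cannot lower the loss, and small descent steps strictly lower it, so the printed losses order as ICP < original < i-FGSM. A full image pipeline would also clip i-FGSM outputs to the valid pixel range; that is omitted here.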