On the Use of Anchoring for Training Vision Models

Vivek Narayanaswamy, Kowshik Thopalli, Rushil Anirudh, Yamen Mubarka, Wesam Sakla, Jayaraman J. Thiagarajan

TL;DR

This work systematically evaluates anchored training as a general protocol for vision models, uncovering a key limitation: simply increasing reference diversity does not automatically improve generalization. It addresses this by introducing Reference Masking Regularization, which randomly masks the reference with probability $\alpha$ and trains masked cases to predict a uniform distribution, promoting reliance on the joint reference-residual structure rather than shortcuts. Across CIFAR-10/100 and ImageNet using CNNs and vision transformers, the approach yields substantial gains in generalization, calibration, and anomaly rejection, particularly for high-capacity models and large reference sets, while remaining inference-efficient. The results suggest anchored training, when combined with the proposed regularizer, offers a robust, architecture-agnostic pathway to safer, more reliable vision systems and invites integration with broader model-sourcing and fine-tuning strategies.
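The regularizer described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: it assumes an anchored input is the concatenation of a reference and the residual `x - r`, that "masking" means zeroing the reference, and that inputs are flattened feature vectors. The function names `make_anchored_inputs` and `reference_masking_loss` are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def make_anchored_inputs(x, refs, alpha, rng):
    """Build [reference, residual] inputs and mask the reference with probability alpha.

    x:    (n, d) batch of flattened inputs
    refs: (m, d) reference set
    """
    n = x.shape[0]
    r = refs[rng.integers(0, refs.shape[0], size=n)]  # one random reference per sample
    residual = x - r                                  # residual uses the unmasked reference
    masked = rng.random(n) < alpha                    # which samples get a masked reference
    r = np.where(masked[:, None], 0.0, r)             # assumption: masking = zeroing
    return np.concatenate([r, residual], axis=1), masked

def reference_masking_loss(logits, labels, masked):
    """Cross-entropy to the true label, or to a uniform target for masked samples."""
    logp = log_softmax(logits)
    n = logits.shape[0]
    ce_true = -logp[np.arange(n), labels]  # standard cross-entropy
    ce_uniform = -logp.mean(axis=1)        # cross-entropy against the uniform distribution
    return np.where(masked, ce_uniform, ce_true).mean()
```

With the reference hidden, the residual alone no longer identifies the input, so pushing those predictions toward uniform discourages the model from relying on the residual as a shortcut.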

Abstract

Anchoring is a recent, architecture-agnostic principle for training deep neural networks that has been shown to significantly improve uncertainty estimation, calibration, and extrapolation capabilities. In this paper, we systematically explore anchoring as a general protocol for training vision models, providing fundamental insights into its training and inference processes and their implications for generalization and safety. Despite its promise, we identify a critical problem in anchored training that can lead to an increased risk of learning undesirable shortcuts, thereby limiting its generalization capabilities. To address this, we introduce a new anchored training protocol that employs a simple regularizer to mitigate this issue and significantly enhances generalization. We empirically evaluate our proposed approach across datasets and architectures of varying scales and complexities, demonstrating substantial performance gains in generalization and safety metrics compared to the standard training protocol.

Paper Structure (17 sections, 1 equation, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Impact of reference set size on anchored training performance. As the reference set grows, anchoring explores more diverse combinations of reference-residual pairs, which one would expect to improve generalization. Surprisingly, the existing anchored training protocol does not effectively leverage this diversity even with larger reference sets, although it still improves accuracy over standard training. We propose reference masking, a simple regularization strategy for training anchored models that recovers the lost performance.
  • Figure 2: Impact of the choice of inference protocol on the performance of anchored models (thiagarajan2022single; netanyahu2023learning). (Left) A single random reference is chosen for sample prediction; (Middle) Obtaining predictions using K random references followed by averaging; (Right) Bilinear Transduction that identifies the optimal reference for each sample. We find that, while these protocols have varying computational complexities (time in seconds per 1000 samples), there are no apparent gaps in performance, indicating that the limitation of anchored training cannot be fixed through sophisticated inference protocols.
  • Figure 3: PyTorch style pseudo code for our proposed approach.
  • Figure 4: Impact of the proposed regularizer on anchored training. Using the CIFAR100C accuracy landscape, i.e., 2D heatmaps of the parameter space, we find that our approach identifies flatter and wider optima, thus leading to improved generalization (garipov2018loss).
  • Figure 5: Analysis of Anchored Models. Using CIFAR100C OOD generalization of ResNet18 models trained on CIFAR100, we study the behavior of the proposed approach when combined with data augmentation protocols (left) and in the presence of training label noise (right).
  • ...and 1 more figure
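
The first two inference protocols in the Figure 2 caption (a single random reference, and averaging over K random references) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: `model` is any callable mapping a concatenated `[reference, residual]` batch to logits, inputs are flattened vectors, and `predict_with_references` is a hypothetical name. Bilinear Transduction is not covered here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def predict_with_references(model, x, refs, k, rng):
    """Average class probabilities over k randomly drawn references.

    k=1 corresponds to the single-random-reference protocol;
    k>1 corresponds to K-reference averaging.
    """
    probs = []
    for _ in range(k):
        # Draw one reference per test sample (assumed sampling scheme).
        r = refs[rng.integers(0, refs.shape[0], size=x.shape[0])]
        anchored = np.concatenate([r, x - r], axis=1)  # [reference, residual]
        probs.append(softmax(model(anchored)))
    return np.mean(probs, axis=0)
```

The cost grows linearly with `k` because each reference requires one forward pass, which matches the caption's point that the protocols differ in compute while performing similarly.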