Robust Canonicalization through Bootstrapped Data Re-Alignment

Johann Schmidt, Sebastian Stober

TL;DR

This work tackles pose-induced bias in fine-grained visual classification by showing that canonicalizers trained on misaligned data are brittle. It introduces G-bootstrapping, a principled, iterative procedure that re-aligns training samples toward a unimodal canonical pose and proves variance contraction guarantees on compact groups, yielding exponential convergence. Empirically, the approach improves rotoscale robustness on FGVC benchmarks (e.g., EU-Moths and NABirds) and can match augmentation performance without requiring heavy test-time computation or highly constrained architectures. The method offers a practical route to robust geometric invariance in biodiversity monitoring and related domains, balancing flexibility and efficiency.

Abstract

Fine-grained visual classification (FGVC) tasks, such as insect and bird identification, demand sensitivity to subtle visual cues while remaining robust to spatial transformations. A key challenge is handling geometric biases and noise, such as different orientations and scales of objects. Existing remedies rely on heavy data augmentation, which demands powerful models, or on equivariant architectures, which constrain expressivity and add cost. Canonicalization offers an alternative by shielding such biases from the downstream model. In practice, such functions are often obtained using canonicalization priors, which assume aligned training data. Unfortunately, real-world datasets never fulfill this assumption, causing the obtained canonicalizer to be brittle. We propose a bootstrapping algorithm that iteratively re-aligns training samples by progressively reducing variance and recovering the alignment assumption. We establish convergence guarantees under mild conditions for arbitrary compact groups, and show on four FGVC benchmarks that our method consistently outperforms equivariant and canonicalization baselines while performing on par with augmentation.
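To make the idea of iterative re-alignment concrete, the following is a minimal toy sketch on the rotation group $SO(2)$, not the authors' algorithm. It assumes each sample's pose is a single angle, models the learned canonicalizer as a noisy pose estimate, and tracks circular variance as a simple proxy for the variance the abstract refers to. The names `circular_variance`, `bootstrap_realign`, `step_size`, and `noise` are hypothetical.

```python
import numpy as np


def circular_variance(angles):
    """Return 1 - |mean resultant vector|; 0 means perfectly aligned angles."""
    return 1.0 - np.abs(np.mean(np.exp(1j * angles)))


def bootstrap_realign(angles, steps=10, step_size=0.5, noise=0.05, seed=0):
    """Toy SO(2) re-alignment: move each pose part of the way to the identity.

    `angles` stands in for the unknown pose of each training sample.
    The canonicalizer is an assumption here: a noisy estimate of that pose.
    """
    rng = np.random.default_rng(seed)
    angles = np.asarray(angles, dtype=float).copy()
    history = [circular_variance(angles)]
    for _ in range(steps):
        # Noisy pose estimate (stand-in for a learned canonicalizer).
        estimate = angles + rng.normal(0.0, noise, size=angles.shape)
        # Partially undo the estimated pose, then wrap back to (-pi, pi].
        angles = angles - step_size * estimate
        angles = np.angle(np.exp(1j * angles))
        history.append(circular_variance(angles))
    return angles, history


# Misaligned toy dataset: poses spread uniformly over a half circle.
rng = np.random.default_rng(1)
poses = rng.uniform(-np.pi / 2, np.pi / 2, size=1000)
aligned, history = bootstrap_realign(poses)
print([round(v, 4) for v in history])
```

In this toy setting the spread of poses shrinks roughly geometrically until it reaches the floor set by the estimator noise, which mirrors, in a very simplified form, the contraction behavior the paper establishes for compact groups.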

Paper Structure

This paper contains 18 sections, 4 theorems, 21 equations, 4 figures, 1 algorithm.

Key Result

Lemma B.1

Let $G$ be a compact Lie group with a bi-invariant Riemannian metric. Then the Fréchet variance of the mixture distribution in the bootstrapping update (eq:bootstrapping) evolves as [equation omitted in the source], where $\mathcal{C}$ is a curvature correction term.
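
For reference, the Fréchet variance named in the lemma can be written out using its standard definition. The notation below is an assumption for illustration and may differ from the paper's: $p$ is a distribution on $G$, $d$ is the geodesic distance induced by the bi-invariant metric, and $\mu$ is a Fréchet mean.

$$
\mu \in \arg\min_{m \in G} \; \mathbb{E}_{g \sim p}\!\left[ d(g, m)^2 \right],
\qquad
\operatorname{Var}(p) = \min_{m \in G} \; \mathbb{E}_{g \sim p}\!\left[ d(g, m)^2 \right].
$$

For $G = SO(2)$ with elements identified by angles, $d(\theta, m)$ is the absolute value of $\theta - m$ wrapped to $(-\pi, \pi]$, so the Fréchet variance measures how tightly the poses concentrate around a single canonical angle.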

Figures (4)

  • Figure 1: Our proposed $G$-bootstrapping aligns dataset $\mathcal{D}$ gradually (over $T$ time steps) during training by minimizing the variance over a specified compact group $G$. In the above example, $G$ is the group of rotations and $p(G \mid \mathcal{D})$ represents the distribution of angles in $\mathcal{D}$. The process runs jointly with the training of a canonicalizer, which aligns samples during inference, shielding geometric noise from the downstream model.
  • Figure 2: During training and inference, the distance between the prior $q(G)$ and the posterior $p_{\phi}(G \mid \mathbf{x})$ is leveraged to compute $\hat{g}$, which is used to correct the orientation of $\mathbf{x}$. This is used to bootstrap the training data over time and gradually align it to the prior.
  • Figure 3: (left) Average top-1 test accuracy progression over 100 training epochs on NABirds using a $C_4$-GCNN and an $SO(2)$-equivariant Steerable backbone. (right) Average top-1 accuracy per angle on the rotation-augmented test set using $C_4$-GCNN (with / without) and $SO(2)$-Steerable (with / without).
  • Figure 4: Average top-1 test accuracy over three rotoscale cosets (constant scale per subplot, $\{1, 1.125, 1.25\} \times \operatorname{C}_{17}$) of EU-Moths (goal: maximize area). A Swin-Base is used as classifier and an RSESF as a rotoscale-equivariant backbone; a canonicalizer is trained using the canonicalization prior (Mondal et al., 2023) with and without our bootstrapping.
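
The rotoscale evaluation set in the Figure 4 caption, $\{1, 1.125, 1.25\} \times \operatorname{C}_{17}$, can be enumerated as in the short sketch below. It assumes that $\operatorname{C}_{17}$ denotes rotations by integer multiples of $2\pi/17$; the function name `rotoscale_grid` and its parameters are hypothetical and not taken from the paper.

```python
import math
from itertools import product


def rotoscale_grid(scales=(1.0, 1.125, 1.25), n_rotations=17):
    """List every (scale, angle) pair in scales x C_n, with angles in radians."""
    # C_n as rotations by multiples of 2*pi/n (assumption stated in the lead-in).
    angles = [2.0 * math.pi * k / n_rotations for k in range(n_rotations)]
    return list(product(scales, angles))


grid = rotoscale_grid()
print(len(grid))  # 3 scales x 17 rotations
```

Each pair would be applied as a test-time transformation of the input image, with accuracy averaged per scale to obtain one curve per subplot.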

Theorems & Definitions (9)

  • Definition 2.1: Orbit Canonicalizer
  • Definition 2.2: Canonicalized Classifier
  • Lemma B.1
  • Lemma B.2: Mean Drift Under Symmetry
  • Lemma B.3
  • Theorem B.4: Variance contraction and exponential convergence
  • proof
  • proof
  • proof