Prior2Posterior: Model Prior Correction for Long-Tailed Learning

S Divakar Bhat, Amit More, Mudit Soni, Surbhi Agrawal

TL;DR

The paper tackles long-tailed recognition by addressing the mismatch between the imbalanced training prior and the balanced test distribution. It introduces Prior2Posterior (P2P), a post-hoc correction that estimates the model’s effective prior from its own predictions and adjusts the posterior probabilities to match the test prior; the authors show theoretically that this correction is optimal for models trained with plain cross-entropy (CE) and with logit-adjusted losses. Empirically, P2P achieves state-of-the-art results among logit-adjustment methods on CIFAR-LT, ImageNet-LT, and iNaturalist18. It can also improve existing methods without retraining and lets their residual bias be inspected post hoc. The approach is compatible with two-stage decoupled training and can be applied to pre-trained models, which makes it a practical tool for mitigating residual bias in long-tailed learning.

Abstract

Learning-based solutions for long-tailed recognition have difficulty generalizing to balanced test datasets. Because the data prior is imbalanced, the learned a posteriori distribution is biased toward the most frequent (head) classes, which leads to inferior performance on the least frequent (tail) classes. In general, performance can be improved by removing this bias, that is, by eliminating the effect of the imbalanced prior, which is usually modeled using the number of samples per class (class frequencies). We first observe that the effective prior on the classes, learned by the model by the end of training, can differ from the empirical prior obtained from class frequencies. We therefore propose a novel approach that accurately models the effective prior of a trained model using its a posteriori probabilities. We propose to correct the imbalanced prior by adjusting the predicted a posteriori probabilities (Prior2Posterior: P2P) with the calculated prior, post hoc, after training, and show that this can improve model performance. We present a theoretical analysis showing the optimality of our approach for models trained with naive cross-entropy loss as well as with logit-adjusted loss. Our experiments show that the proposed approach achieves a new state of the art (SOTA) on several benchmark datasets from the long-tail literature in the category of logit adjustment methods. Further, the proposed approach can be used to inspect any existing method, capture its effective prior, and remove any residual bias to improve its performance, post hoc, without model retraining. We also show that the proposed post-hoc approach further improves the performance of many existing methods.
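
The correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the effective prior is estimated by averaging the model's predicted a posteriori probabilities over a set of samples (the abstract does not say which samples are used), and that the test prior is uniform, so that the corrected posterior is proportional to $p(y \mid x) / \hat{\pi}(y)$. The function names `estimate_effective_prior` and `correct_posterior` and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_effective_prior(logit_rows):
    """Assumed estimator: mean predicted posterior over the given samples."""
    num_classes = len(logit_rows[0])
    sums = [0.0] * num_classes
    for row in logit_rows:
        for k, p in enumerate(softmax(row)):
            sums[k] += p
    return [s / len(logit_rows) for s in sums]


def correct_posterior(logits, prior, eps=1e-12):
    """Divide the posterior by the prior and renormalise (done in log space).

    softmax(z - log(prior)) is proportional to softmax(z) / prior, which is
    the Bayes correction toward a uniform target prior.
    """
    adjusted = [z - math.log(max(p, eps)) for z, p in zip(logits, prior)]
    return softmax(adjusted)


# Toy logits for three samples and three classes (made-up numbers);
# class 0 plays the role of a head class that the model favours.
logit_rows = [
    [2.0, 0.5, 0.1],
    [1.5, 1.4, 0.2],
    [1.8, 0.3, 1.7],
]

prior = estimate_effective_prior(logit_rows)
raw = softmax(logit_rows[1])
corrected = correct_posterior(logit_rows[1], prior)

raw_label = raw.index(max(raw))
corrected_label = corrected.index(max(corrected))
print("effective prior:", prior)
print("raw label:", raw_label, "corrected label:", corrected_label)
```

In this toy case the second sample is a near tie between classes 0 and 1, and dividing out the head-heavy estimated prior is enough to flip the decision. Replacing `prior` with normalized class frequencies gives the frequency-based correction that the paper compares against.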

Paper Structure

This paper contains 25 sections, 23 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Results on a toy dataset with an imbalance factor of $100$. The regions assigned by the trained classifier are shown in purple and yellow, and the ideal (Bayes classifier) boundary is shown as a dashed black line in each panel. (a) Classifier boundaries are naturally biased when using a naive cross-entropy (CE) loss. (b) Using class frequencies for post-hoc correction removes the classifier bias to some extent. (c) Using the proposed post-hoc correction with the learned prior, the boundary moves very close to that of the optimal Bayes classifier.
  • Figure 2: Marginal class probabilities, $P(y)$, for classes of the CIFAR100-LT dataset with an imbalance factor of $200$. The effective prior calculated using the proposed approach and the prior from class frequencies are shown for head classes (Many, first column), the remaining classes (Medium, middle column), and tail classes (Few, third column). The model shows a bias towards the head classes. Further, class frequencies underrepresent this bias for head classes and overrepresent it for the other classes.
  • Figure 3:
  • Figure 4:
  • Figure 6: Model biases for different cases. Biases estimated using class frequencies and using the proposed method are shown for the ImageNet-LT and iNaturalist18 datasets, for logit-adjusted feature tuning in Stage 2.
  • ...and 2 more figures