Neural Prior Estimation: Learning Class Priors from Latent Representations

Masoud Yavari; Payman Moallem

TL;DR

This work tackles class-imbalance bias in deep recognition by learning explicit class priors directly from latent representations. It introduces the Neural Prior Estimator (NPE), comprising one or more Prior Estimation Modules (PEMs) trained with a one-way logistic loss to produce a feature-conditioned prior signal, which is then used in logit adjustment (NPE-LA) at inference via $\tilde{\mathbf{z}}(\mathbf{x}) = \mathbf{z}(\mathbf{x}) - \boldsymbol{\eta}(\mathbf{x})$. The authors provide theoretical justification under the Neural Collapse regime showing that PEMs estimate a monotone transformation of the empirical counts, i.e., the log-prior up to an additive constant, and demonstrate strong empirical gains on long-tailed CIFAR and segmentation benchmarks (STARE, ADE20K), with careful handling of scaling and BN-free design for stable dense predictions. The approach is lightweight, inference-efficient, and compatible with existing augmentation and representation-learning pipelines, offering a principled path to online, adaptive imbalance correction without requiring explicit priors or re-sampling. Overall, NPE-LA delivers robust, feature-aware bias mitigation with clear theoretical and practical benefits for both instance-level and dense prediction tasks.
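
The inference-time adjustment $\tilde{\mathbf{z}}(\mathbf{x}) = \mathbf{z}(\mathbf{x}) - \boldsymbol{\eta}(\mathbf{x})$ can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical; in particular, `eta` is a hand-picked stand-in for the learned prior signal, not the output of a trained PEM.

```python
def adjust_logits(z, eta):
    """Return z_tilde = z - eta, computed per class."""
    if len(z) != len(eta):
        raise ValueError("z and eta must have one entry per class")
    return [zi - ei for zi, ei in zip(z, eta)]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical 3-class example: class 0 plays the head class, class 2 the tail class.
z = [2.0, 1.0, 1.8]     # raw classifier logits (made-up values)
eta = [1.5, 0.5, -0.5]  # stand-in for the learned prior signal (made-up values)

z_tilde = adjust_logits(z, eta)
raw_pred = argmax(z)             # prediction before adjustment
adjusted_pred = argmax(z_tilde)  # prediction after subtracting eta
```

With these made-up numbers, the raw logits favor the head class, while subtracting the larger head-class prior signal shifts the prediction to the tail class.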

Abstract

Class imbalance induces systematic bias in deep neural networks by imposing a skewed effective class prior. This work introduces the Neural Prior Estimator (NPE), a framework that learns feature-conditioned log-prior estimates from latent representations. NPE employs one or more Prior Estimation Modules trained jointly with the backbone via a one-way logistic loss. Under the Neural Collapse regime, NPE is analytically shown to recover the class log-prior up to an additive constant, providing a theoretically grounded adaptive signal without requiring explicit class counts or distribution-specific hyperparameters. The learned estimate is incorporated into logit adjustment, forming NPE-LA, a principled mechanism for bias-aware prediction. Experiments on long-tailed CIFAR and imbalanced semantic segmentation benchmarks (STARE, ADE20K) demonstrate consistent improvements, particularly for underrepresented classes. NPE thus offers a lightweight and theoretically justified approach to learned prior estimation and imbalance-aware prediction.
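
Recovering the log-prior only "up to an additive constant" is sufficient for logit adjustment because softmax is invariant to a shift applied to every class. The sketch below checks this numerically; the class counts and logits are hypothetical, and the true log-counts stand in for a learned estimate.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


counts = [5000, 500, 50]  # hypothetical long-tailed class counts N_c
n_total = sum(counts)
log_counts = [math.log(n) for n in counts]            # log N_c
log_priors = [math.log(n / n_total) for n in counts]  # log p_c

# log N_c and log p_c differ by the same constant, log(n_total), for every class.
offsets = [lc - lp for lc, lp in zip(log_counts, log_priors)]

z = [2.0, 1.0, 1.8]  # made-up classifier logits
probs_from_counts = softmax([zi - c for zi, c in zip(z, log_counts)])
probs_from_priors = softmax([zi - p for zi, p in zip(z, log_priors)])
```

Both adjusted distributions coincide, so an estimator that tracks $\log N_c$ yields the same adjusted predictions as one that tracks $\log p_c$.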

Paper Structure (28 sections, 1 theorem, 20 equations, 3 figures, 7 tables)

This paper contains 28 sections, 1 theorem, 20 equations, 3 figures, 7 tables.

Introduction
Methodology
Problem Setup
Neural Prior Estimator (NPE)
Prior Estimation Modules (PEMs).
Training objective.
Emergent frequency-dependent structure.
NPE estimate
Equivalence of estimating $\log N_c$ and $\log p_c$.
Number of PEMs.
Number of PEMs.
Choice of sign convention.
NPE for Imbalance-Aware Prediction
Inference-Time Efficiency.
Experiments
...and 13 more sections

Key Result

Proposition 1

The unique minimizer of $J_c$ (the optimal logit) has a closed form expressed through $W(\cdot)$, the principal branch of the Lambert $W$ function, and admits an asymptotic expansion in the saturation regime $N_c/\lambda\to\infty$.
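
For readers unfamiliar with the Lambert $W$ function, the sketch below evaluates its principal branch for non-negative arguments using Newton's method on $w e^{w} = x$, and compares it with the standard large-argument approximation $W(x) \approx \ln x - \ln\ln x$. This illustrates the function only: the helper name `lambert_w0` is hypothetical, and the way the minimizer of $J_c$ depends on $N_c/\lambda$ is not reproduced here.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W(x) for x >= 0, solving w * exp(w) = x by Newton's method."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess that lies at or above the root for x >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


x_large = 1e6
w_large = lambert_w0(x_large)
# Leading terms of the standard asymptotic expansion for large x.
w_approx = math.log(x_large) - math.log(math.log(x_large))
```

The logarithmic growth of $W$ for large arguments is a general property of the function; it is shown here only as background for the saturation regime mentioned in the proposition.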

Figures (3)

Figure 1: Class-wise accuracy on CIFAR-100 ($\rho = 100$) under HP-2, illustrating the impact of different numbers of PEMs ($N_{\mathrm{PEM}}$) across Head, Medium, and Tail classes. NPE is utilized during training only; no logit adjustment is performed at inference time.
Figure 2: Class-wise accuracy on CIFAR-100 ($\rho=100$) under HP-2, comparing different methods across head, medium, and tail classes.
Figure 3: Class-wise IoU and accuracy on ADE20K for baseline DeepLab-V3 and NPE-LA (single FCN PEM, $\alpha=0.1$). Classes are ordered by ascending baseline performance for clarity.

Theorems & Definitions (2)

Proposition 1: Closed-Form Optimal Logit and Asymptotics
proof 1
