Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment

Naoya Hasegawa, Issei Sato

TL;DR

This work tackles long-tailed recognition by grounding a simple post-hoc method, Multiplicative Logit Adjustment (MLA), in neural-collapse (NC) theory. It develops an NC-based framework to derive near-optimal decision-boundary adjustments via class-wise feature spreads and shows MLA closely approximates the NC-driven 1-vs-1 boundary adjuster. The authors provide theoretical guarantees, discuss when the approximation holds, and validate the approach across CIFAR-LT, ImageNet-LT, and Helena, demonstrating MLA’s robustness even when NC is not fully realized and offering practical hyperparameter guidance. The result is a principled post-hoc method that improves tail-class accuracy without retraining.

Abstract

Real-world data distributions are often highly skewed. This has spurred a growing body of research on long-tailed recognition, aimed at addressing the imbalance in training classification models. Among the methods studied, multiplicative logit adjustment (MLA) stands out as a simple and effective method. What theoretical foundation explains the effectiveness of this heuristic method? We provide a justification for the effectiveness of MLA with the following two-step process. First, we develop a theory that adjusts optimal decision boundaries by estimating feature spread on the basis of neural collapse. Second, we demonstrate that MLA approximates this optimal method. Additionally, through experiments on long-tailed datasets, we illustrate the practical usefulness of MLA under more realistic conditions. We also offer experimental insights to guide the tuning of MLA hyperparameters.
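
As a rough illustration of what a post-hoc multiplicative adjustment looks like, the sketch below rescales each class logit by a power of that class's training-sample count before taking the argmax. The scaling form `n_k ** (-gamma)`, the function name `mla_predict`, and the toy numbers are assumptions made for this example and are not taken from the text above. The sketch also assumes non-negative logits (for example, cosine similarities), since multiplying a negative logit by a larger factor would lower it further.

```python
import numpy as np

def mla_predict(logits, class_counts, gamma=0.5):
    """Hypothetical sketch of multiplicative logit adjustment.

    Assumption: class k's logit is multiplied by n_k ** (-gamma), so
    classes with fewer training samples get a larger multiplier.
    gamma = 0 leaves the original predictions unchanged.
    """
    logits = np.asarray(logits, dtype=float)        # shape (batch, classes)
    counts = np.asarray(class_counts, dtype=float)  # shape (classes,)
    scale = counts ** (-gamma)                      # per-class multiplier
    return np.argmax(logits * scale, axis=-1)

# Toy example: class 0 is a head class (1000 samples), class 1 a tail class (10).
toy_logits = [[0.9, 0.8]]
toy_counts = [1000, 10]
unadjusted = mla_predict(toy_logits, toy_counts, gamma=0.0)
adjusted = mla_predict(toy_logits, toy_counts, gamma=0.5)
```

With these toy values the unadjusted prediction is the head class, while `gamma=0.5` moves the prediction to the tail class. No retraining is involved; only the stored logits and class counts are used.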

Paper Structure

This paper contains 39 sections, 14 theorems, 67 equations, 7 figures, 13 tables, and 1 algorithm.

Key Result

Proposition 1

Suppose $n_k > 2$. For $\bm{f} \in \mathcal{F}$, assume that $\bm{f}(\bm{x}) = \bm{\mu}(\mathcal{S}_k)$ holds for all $\bm{x} \in \mathcal{S}_k$. Consider any $\theta$ that satisfies the following condition: [...] where $\|\cdot\|_2$ denotes the spectral norm. For such $\theta$, the following holds: [...]

Figures (7)

  • Figure 1: Overview of Propositions \ref{prop:feature_prob} and \ref{prop:max_deg_2}. The angular bound probability $\Pi(\theta; k)$ represents the lower bound of the probability that the feature vector of $\bm{x}$ sampled from $P_k$ lies within the shaded region. Proposition \ref{prop:feature_prob} indicates that $\Pi(\theta; k) = 1 - \tilde{\mathcal{O}}\left(1/\sqrt{n_k}\right)$. Proposition \ref{prop:max_deg_2} offers the optimal decision boundary by maximizing $\Pi(\theta_{k, k'}; k) + \Pi(\theta_{k', k}; k')$ with respect to $\theta_{k, k'}$ and $\theta_{k', k}$.
  • Figure 2: Heatmaps showing the difference in decision-boundary angles between each method and 1vs1adjuster. The left figure shows the result for CIFAR100-LT, and the right figure shows the result for CIFAR10-LT. Within each figure, the left side displays $\theta_{k, k'}^+ - \theta_{k, k'}^*$, while the right side displays $\theta_{k, k'}^{\times} - \theta_{k, k'}^*$. NaN values are shown in gray. For CIFAR100-LT, the angle differences between MLA and 1vs1adjuster are generally small.
  • Figure 3: Average accuracy of each model trained on each dataset and adjusted by different methods. The error bars represent the mean and standard deviation across five trials with different seed values. 1v1 is short for 1vs1adjuster. MLA and 1vs1adjuster consistently achieve comparable accuracy.
  • Figure 4: Heatmaps showing the difference in angles between the decision boundaries adjusted by each method and those adjusted by 1vs1adjuster of ResNeXt50 trained on CIFAR100-LT. The angle differences between MLA and 1vs1adjuster are generally small compared to the difference between ALA and 1vs1adjuster.
  • Figure 5: Heatmaps showing the difference in angles between the decision boundaries adjusted by each method and those adjusted by 1vs1adjuster of ResNeXt50 trained on ImageNet-LT. The angle differences between MLA and 1vs1adjuster are generally small compared to the difference between ALA and 1vs1adjuster.
  • ...and 2 more figures

Theorems & Definitions (21)

  • Definition 1: Angular bound probability
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proof
  • Lemma 4
  • Proof
  • ...and 11 more