Functional Properties of the Focal-Entropy

Jaimin Shah, Martina Cardone, Alex Dytso

TL;DR

This work rigorously shows that the focal-loss amplifies mid-range probabilities, suppresses high-probability outcomes, and, under extreme class imbalance, induces an over-suppression regime in which very small probabilities are further diminished.

Abstract

The focal-loss has become a widely used alternative to cross-entropy in class-imbalanced classification problems, particularly in computer vision. Despite its empirical success, a systematic information-theoretic study of the focal-loss remains incomplete. In this work, we adopt a distributional viewpoint and study the focal-entropy, a focal-loss analogue of the cross-entropy. Our analysis establishes conditions for finiteness, convexity, and continuity of the focal-entropy, and provides various asymptotic characterizations. We prove the existence and uniqueness of the focal-entropy minimizer, describe its structure, and show that it can depart significantly from the data distribution. In particular, we rigorously show that the focal-loss amplifies mid-range probabilities, suppresses high-probability outcomes, and, under extreme class imbalance, induces an over-suppression regime in which very small probabilities are further diminished. These results, which are also experimentally validated, offer a theoretical foundation for understanding the focal-loss and clarify the trade-offs that it introduces when applied to imbalanced learning tasks.
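The qualitative behaviour described in the abstract can be explored numerically. The sketch below is illustrative only and is not the authors' code. It assumes the standard focal-loss $-(1-q)^\gamma \log q$ and takes the focal-entropy to be its expectation under the data distribution, $\sum_x P(x)\,(1-Q(x))^\gamma\,(-\log Q(x))$, which is the natural cross-entropy analogue but should be checked against Definition 2 of the paper. The minimizer over the simplex is found from the Lagrangian stationarity condition with nested bisection. All function names (`focal_loss`, `focal_entropy`, `neg_derivative`, `solve_q`, `minimizer`) are hypothetical helpers, and the example distribution is the four-point mass function read from the Figure 3 caption.

```python
import math


def focal_loss(q, gamma):
    """Focal-loss of a predicted probability q in (0, 1): -(1-q)^gamma * log(q)."""
    return -((1.0 - q) ** gamma) * math.log(q)


def focal_entropy(p, q, gamma):
    """Assumed form of the focal-entropy: expectation under p of focal_loss(q)."""
    return sum(px * focal_loss(qx, gamma) for px, qx in zip(p, q) if px > 0)


def neg_derivative(q, gamma):
    """-d/dq focal_loss(q, gamma); positive and decreasing in q on (0, 1)."""
    return (1.0 - q) ** gamma / q - gamma * (1.0 - q) ** (gamma - 1.0) * math.log(q)


def solve_q(target, gamma, lo=1e-12, hi=1.0 - 1e-12, iters=200):
    """Bisection for the q in [lo, hi] with neg_derivative(q, gamma) == target."""
    if neg_derivative(hi, gamma) >= target:
        return hi  # target too small: clamp at the upper end
    if neg_derivative(lo, gamma) <= target:
        return lo  # target too large: clamp at the lower end
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if neg_derivative(mid, gamma) > target:
            lo = mid  # derivative still too steep, move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minimizer(p, gamma, iters=200):
    """Minimize the assumed focal-entropy over the simplex (all p[x] > 0).

    Stationarity requires p[x] * neg_derivative(q[x]) == lam for every x;
    the multiplier lam is tuned by bisection until the q[x] sum to one.
    """
    lam_lo, lam_hi = 1e-12, 1e12
    for _ in range(iters):
        lam = math.sqrt(lam_lo * lam_hi)  # bisect on a log scale
        total = sum(solve_q(lam / px, gamma) for px in p)
        if total > 1.0:
            lam_lo = lam  # masses too large: increase the multiplier
        else:
            lam_hi = lam
    lam = math.sqrt(lam_lo * lam_hi)
    return [solve_q(lam / px, gamma) for px in p]


p = [0.485, 0.243, 0.226, 0.046]
q_star = minimizer(p, gamma=2.0)
print("data distribution:", p)
print("focal minimizer  :", [round(q, 4) for q in q_star])
```

With $\gamma = 0$ the objective reduces to the cross-entropy and the routine returns the data distribution itself, which is a useful sanity check. For $\gamma = 2$ the largest mass is pulled down and the two mid-range masses are pushed up, in line with the amplification and suppression effects stated in the abstract.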

Paper Structure (44 sections, 26 theorems, 144 equations, 12 figures)

Key Result

Proposition 1

For $p \in (0,1)$ and $\gamma \ge 0$, we have the following properties:

Figures (12)

  • Figure 1: Inverse of $\mathsf{L}^{\prime}_\gamma$ in \ref{eq:firstDer}.
  • Figure 2: Example of how a $P_X$ with $|\mathcal{S}| = 3$ is transformed into $P^\star_\gamma$ in \ref{eq:optimizer} for different values of $\gamma$.
  • Figure 3: Initial probability mass function: $P_X = (0.485, 0.243, 0.226, 0.046)$.
  • Figure 4: True Probability: $0.05$
  • Figure 5: Comparison of $\alpha^\star_\gamma$ in \ref{eq:optimizer} with its approximation in \ref{eq: taylorExpansio of Alpha}.
  • ...and 7 more figures

Theorems & Definitions (40)

  • Definition 1: Focal-loss
  • Definition 2: Focal-Entropy
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Proposition 7
  • Remark 1
  • ...and 30 more