Unimodal Distributions for Ordinal Regression

Jaime S. Cardoso, Ricardo Cruz, Tomé Albuquerque

TL;DR

This work tackles ordinal regression by enforcing unimodal output distributions, addressing the limitations of cross-entropy for ordered labels. It analyses the set of unimodal distributions on the probability simplex, proves connectedness properties, and introduces two independent approaches: UnimodalNet, a hard-constraint architecture that guarantees unimodal outputs, and Wasserstein Regularization, a soft-constraint loss term built on a Wasserstein projection framework. Across ten datasets, both approaches show strong unimodality and competitive ordinal performance, indicating a favorable trade-off between unimodality and predictive accuracy. The work also contributes open-source tooling for reproducibility.

Abstract

In many real-world prediction tasks, class labels contain information about the relative order between labels that is not captured by commonly used loss functions such as multicategory cross-entropy. Recently, the preference for unimodal distributions in the output space has been incorporated into models and loss functions to account for such ordering information. However, current approaches rely on heuristics that lack a theoretical foundation. Here, we propose two new approaches to incorporate the preference for unimodal distributions into the predictive model. We analyse the set of unimodal distributions in the probability simplex and establish fundamental properties. We then propose a new architecture that imposes unimodal distributions and a new loss term that relies on the notion of projection onto a set to promote unimodality. Experiments show that the new architecture achieves top-2 performance, while the proposed loss term is very competitive and maintains high unimodality.
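
The claim that cross-entropy ignores label order can be illustrated with a minimal sketch. For a single sample, cross-entropy depends only on the probability assigned to the true class, so two predictions that spread the remaining mass very differently over the ordered classes receive the same loss. The probability vectors and the helper name `cross_entropy` are made up for illustration and are not taken from the paper.

```python
import math

def cross_entropy(probs, true_class):
    """Multiclass cross-entropy for one sample: -log p[true_class]."""
    return -math.log(probs[true_class])

# Hypothetical 5-class ordinal problem; the true class is index 1.
unimodal = [0.2, 0.5, 0.2, 0.1, 0.0]  # remaining mass sits next to the true class
bimodal = [0.0, 0.5, 0.0, 0.1, 0.4]   # second peak far away, at class 4

loss_unimodal = cross_entropy(unimodal, 1)
loss_bimodal = cross_entropy(bimodal, 1)

# Both losses equal -log(0.5): the loss cannot tell the two predictions apart.
print(loss_unimodal, loss_bimodal)
```

Any loss that should prefer the first prediction therefore needs an extra ingredient, such as a constraint or penalty that favours unimodal outputs.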

Paper Structure

This paper contains 19 sections, 3 theorems, 19 equations, 7 figures, 5 tables.

Key Result

Theorem 1

In the $(K-1)$-dimensional probability simplex, the set of unimodal distributions with a fixed mode is a connected set.
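
As a rough illustration of the objects in this statement, the sketch below assumes the usual definition of a unimodal discrete distribution: the probabilities are non-decreasing up to some class and non-increasing afterwards. Under that assumption, fixing the mode turns the definition into a set of linear inequalities, so blending two such distributions with the same mode stays unimodal, whereas blending distributions with different modes need not. The function names `is_unimodal` and `mix` and all example vectors are hypothetical. This is a numerical sanity check, not the paper's proof.

```python
def is_unimodal(probs, tol=1e-12):
    """True if probs rises (weakly) to a peak and then falls (weakly)."""
    k, n = 0, len(probs)
    # Climb while the sequence is non-decreasing.
    while k + 1 < n and probs[k + 1] >= probs[k] - tol:
        k += 1
    # Then descend while the sequence is non-increasing.
    while k + 1 < n and probs[k + 1] <= probs[k] + tol:
        k += 1
    # Unimodal only if the climb-then-descend walk reached the last class.
    return k == n - 1

def mix(p, q, t):
    """Convex combination (1 - t) * p + t * q of two distributions."""
    return [(1 - t) * a + t * b for a, b in zip(p, q)]

# Two unimodal distributions sharing the same mode (class 1).
same_a = [0.1, 0.6, 0.2, 0.1]
same_b = [0.3, 0.4, 0.2, 0.1]
same_mode_mix = mix(same_a, same_b, 0.5)

# Two unimodal distributions with different modes (class 0 and class 3).
left = [0.6, 0.3, 0.1, 0.0]
right = [0.0, 0.1, 0.3, 0.6]
diff_mode_mix = mix(left, right, 0.5)

print(is_unimodal(same_mode_mix), is_unimodal(diff_mode_mix))
```

The second blend dips in the middle and rises again at the last class, which is why the fixed-mode condition matters.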

Figures (7)

  • Figure 1: Example of possible output probability distributions. Even if both outputs agree on the majority class, only the unimodal distribution is consistent with an ordinal regression task.
  • Figure 2: Summary of the current unimodal approaches, where the axes represent the soft/hard constraint and parametric/nonparametric priorities. The proposed contributions are also mentioned in the right families.
  • Figure 3:
  • Figure 4: Exemplification of UnimodalNet.
  • Figure 5: Illustration for three classes of the proposed activation function, which forces the model output to be a unimodal distribution.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof