
Calibration of Ordinal Regression Networks

Daehwan Kim, Haejun Chung, Ikbeom Jang

TL;DR

This work addresses miscalibration and non-ordinal confidence in ordinal regression by introducing ORCU, a unified loss that combines soft ordinal encoding with an ordinal-aware regularizer to enforce both calibration and unimodality. The objective, defined as $\mathcal{L}_{\text{ORCU}} = \mathcal{L}_{\text{SCE}} + \mathcal{L}_{\text{REG}}$, promotes reliable confidence estimates while preserving the inherent label order. Across four public ordinal benchmarks with ResNet backbones, ORCU achieves state-of-the-art calibration (low SCE/ACE) without sacrificing accuracy, and qualitative analyses illustrate improved reliability and clearer ordinal structure. This approach advances trustworthy ordinal classification, enabling safer and more interpretable predictions in high-stakes domains such as medical diagnosis and rating systems.
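The TL;DR reports calibration with SCE and ACE (evaluation metrics, distinct from the loss term $\mathcal{L}_{\text{SCE}}$ in the objective). Assuming the paper uses the standard class-wise definitions, static calibration error with equal-width bins and adaptive calibration error with equal-mass bins, both can be sketched in plain Python as below. The function names and the default of 15 bins are illustrative choices, not taken from the paper's code.

```python
from typing import List, Sequence


def static_calibration_error(probs: Sequence[Sequence[float]],
                             labels: Sequence[int],
                             n_bins: int = 15) -> float:
    """Class-wise calibration error with equal-width probability bins (assumed definition)."""
    n, num_classes = len(probs), len(probs[0])
    total = 0.0
    for k in range(num_classes):
        # Bin every sample by its predicted probability for class k.
        bins: List[List[int]] = [[] for _ in range(n_bins)]
        for i, p in enumerate(probs):
            bins[min(int(p[k] * n_bins), n_bins - 1)].append(i)
        for idx in bins:
            if not idx:
                continue
            conf = sum(probs[i][k] for i in idx) / len(idx)    # mean p_k in the bin
            acc = sum(labels[i] == k for i in idx) / len(idx)  # observed frequency of class k
            total += len(idx) / n * abs(acc - conf)            # weight by bin mass
    return total / num_classes


def adaptive_calibration_error(probs: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               n_ranges: int = 15) -> float:
    """Class-wise calibration error with equal-mass (adaptive) bins (assumed definition)."""
    n, num_classes = len(probs), len(probs[0])
    total, used = 0.0, 0
    for k in range(num_classes):
        # Sort samples by p_k and cut them into n_ranges chunks of nearly equal size.
        order = sorted(range(n), key=lambda i: probs[i][k])
        for r in range(n_ranges):
            idx = order[r * n // n_ranges:(r + 1) * n // n_ranges]
            if not idx:  # only happens when there are fewer samples than ranges
                continue
            conf = sum(probs[i][k] for i in idx) / len(idx)
            acc = sum(labels[i] == k for i in idx) / len(idx)
            total += abs(acc - conf)
            used += 1
    return total / used
```

For example, a model that always outputs `[0.5, 0.5]` on samples whose label is always class 0 scores 0.5 on both metrics, while perfectly confident correct predictions score 0.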

Abstract

Recent studies have shown that deep neural networks are not well-calibrated and often produce over-confident predictions. The miscalibration issue primarily stems from using cross-entropy in classification, which aims to align predicted softmax probabilities with one-hot labels. In ordinal regression tasks, this problem is compounded by an additional challenge: the expectation that softmax probabilities should exhibit a unimodal distribution is not met by cross-entropy. The ordinal regression literature has focused on learning orders and overlooked calibration. To address both issues, we propose a novel loss function that introduces ordinal-aware calibration, ensuring that prediction confidence adheres to the ordinal relationships between classes. It incorporates soft ordinal encoding and ordinal-aware regularization to enforce both calibration and unimodality. Extensive experiments across four popular ordinal regression benchmarks demonstrate that our approach achieves state-of-the-art calibration without compromising classification accuracy.
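The abstract names soft ordinal encoding without giving its exact form. The sketch below assumes a SORD-style encoding, a softmax over negative squared distances between class indices, which is a common choice in the ordinal literature but may differ from the paper's; the `scale` parameter and all function names are hypothetical. It pairs the soft targets with a soft-label cross-entropy and adds a simple unimodality test of the kind Figure 1 illustrates. The ordinal-aware regularizer $\mathcal{L}_{\text{REG}}$ is not reproduced, since its form is not given in this section.

```python
import math
from typing import List, Sequence


def soft_ordinal_targets(y: int, num_classes: int, scale: float = 1.0) -> List[float]:
    """Hypothetical SORD-style soft label: softmax of -scale * (k - y)^2 over classes k."""
    logits = [-scale * (k - y) ** 2 for k in range(num_classes)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def soft_cross_entropy(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Cross-entropy between soft targets q and softmax(logits)."""
    return -sum(q * lp for q, lp in zip(targets, log_softmax(logits)))


def is_unimodal(probs: Sequence[float]) -> bool:
    """True if probs are non-decreasing up to the argmax and non-increasing after it."""
    peak = max(range(len(probs)), key=probs.__getitem__)
    rising = all(probs[i] <= probs[i + 1] for i in range(peak))
    falling = all(probs[i] >= probs[i + 1] for i in range(peak, len(probs) - 1))
    return rising and falling
```

For a true class of 2 out of 5, `soft_ordinal_targets(2, 5)` peaks at class 2 and decays symmetrically, so neighboring classes receive more target mass than distant ones, unlike a one-hot label. A bimodal prediction such as `[0.4, 0.1, 0.4, 0.1]` fails `is_unimodal`.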

Paper Structure

This paper contains 34 sections, 10 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Comparison of predicted probability distributions from models trained with loss functions targeting different objectives in ordinal classification. Top left: Model trained without any calibration or unimodality constraints. Top right: Ordinal-focused loss, producing unimodal but uncalibrated predictions. Bottom left: Calibration-focused loss, yielding calibrated but non-unimodal predictions. Bottom right: Proposed $\mathcal{L}_{\text{ORCU}}$, achieving both calibrated and unimodal predictions.
  • Figure 1: Effect of the parameter $t$ in $\mathcal{L}_{\text{ORCU}}$ on ECE. (a) Validation results on the balanced Diabetic Retinopathy dataset, used to select $t$. (b) Validation results on the imbalanced Image Aesthetics dataset, showing that the selected $t$ generalizes.
  • Figure 2: Illustration of ORCU's impact on calibration and unimodality in ordinal classification using the Adience dataset (8 ordinal age classes). The figure demonstrates how $\mathcal{L}_{\text{REG}}$ adjusts logits based on unimodality and uncertainty for (i) a low-uncertainty sample and (ii) a high-uncertainty sample, both for the case $k \geq y_n$. Each case includes: (a) computation of $r$, the logit difference quantifying uncertainty and unimodality; (b) updates without the ordinal-aware regularization, leading to inadequate calibration; and (c) updates with regularization, incorporating input-specific characteristics for improved calibration. See the gradient analysis section and the accompanying gradient derivations for details.
  • Figure 2: Reliability diagrams comparing ordinal loss functions on the test splits of Image Aesthetics, Adience, LIMUC, and Diabetic Retinopathy. The diagrams show the calibration gap between model confidence and accuracy. Bars above the diagonal (perfect-calibration) line indicate underconfidence ($P(Y = y \mid \hat{P} = p) > p$), while those below it indicate overconfidence ($P(Y = y \mid \hat{P} = p) < p$). ECE is computed using 15 bins.
  • Figure 3: Accuracy, QWK, and MAE vs. SCE on the LIMUC dataset. The plot illustrates the trade-off between prediction and calibration, with models near the top-left (lower SCE, higher Acc/QWK) or bottom-left (lower SCE, lower MAE) indicating optimal performance. ORCU (red triangle) achieves a strong balance between calibration and classification.
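The reliability-diagram caption computes ECE with 15 bins, and the LIMUC trade-off plot reports accuracy, QWK, and MAE. The sketch below uses the common textbook definitions: top-label ECE with equal-width bins, quadratic weighted Cohen's kappa for QWK, and MAE over class indices. Bin-edge conventions and other implementation details of the paper are not given here and may differ; the function names are illustrative.

```python
import math
from typing import Sequence


def expected_calibration_error(probs: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               n_bins: int = 15) -> float:
    """Top-label ECE with equal-width bins ((b-1)/B, b/B]."""
    n = len(probs)
    conf_sum = [0.0] * n_bins
    correct = [0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # predicted class
        conf = p[pred]                                # its confidence
        b = max(math.ceil(conf * n_bins) - 1, 0)      # bin index in [0, n_bins - 1]
        conf_sum[b] += conf
        correct[b] += int(pred == y)
    # sum_b (n_b / N) |acc_b - conf_b| == sum_b |correct_b - conf_sum_b| / N
    return sum(abs(correct[b] - conf_sum[b]) for b in range(n_bins)) / n


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean distance between predicted and true class indices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             num_classes: int) -> float:
    """Cohen's kappa with quadratic weights; assumes num_classes >= 2."""
    n = len(y_true)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / (num_classes - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n  # expected under independence
    return 1.0 - num / den
```

Unlike accuracy, QWK and MAE penalize a prediction more the farther it lands from the true class, which is why the trade-off plot reports them alongside accuracy for ordinal tasks.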