Remarks on Loss Function of Threshold Method for Ordinal Regression Problem
Ryoya Yamasaki, Toshiyuki Tanaka
TL;DR
This work investigates why threshold-based ordinal regression methods succeed or fail under varying data distributions and learning procedures. It analyzes all-threshold (AT), immediate-threshold (IT), and piecewise-linear (PL) losses. It derives conditions under which surrogate-risk minimization yields Bayes-optimal classifiers (notably under the CL and ACL models) and identifies failure modes when the data are highly heteroscedastic or non-unimodal. Through extensive simulations on synthetic data and experiments on real-world age-estimation tasks, the authors show that non-PL losses often achieve better approximation performance on unimodal data. In contrast, PL- and IT-based approaches can cause the learned values of the one-dimensional transformation (1DT) to concentrate at a few points, which degrades performance in larger-scale or multimodal settings. The findings highlight how the choice of loss, bias structure, and optimization strategy shapes the approximation error and thus practical performance, offering guidance for designing more robust threshold-based ordinal regression methods.
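To make the AT and IT losses concrete, here is a minimal sketch for a single example. It assumes the standard formulation from the threshold-method literature, which may differ in detail from the exact losses analyzed in this work. The symbols are as follows: z is the 1DT value of an observation, y in {1, ..., K} is its label, the thresholds are θ_1 ≤ ... ≤ θ_{K-1}, and phi is a margin loss. The hinge loss appears only as one example of a piecewise-linear margin loss. Whether it matches the paper's PL loss is an assumption. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence

MarginLoss = Callable[[float], float]


def logistic(m: float) -> float:
    """Logistic margin loss log(1 + exp(-m)), computed without overflow."""
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def hinge(m: float) -> float:
    """Hinge margin loss max(0, 1 - m), one piecewise-linear choice (assumed)."""
    return max(0.0, 1.0 - m)


def all_threshold_loss(z: float, y: int, thresholds: Sequence[float],
                       phi: MarginLoss) -> float:
    """AT loss: every threshold theta_k contributes a penalty.

    For k < y the 1DT value z should lie above theta_k;
    for k >= y it should lie below theta_k. Labels run from 1 to K.
    """
    total = 0.0
    for k, theta in enumerate(thresholds, start=1):
        total += phi(z - theta) if k < y else phi(theta - z)
    return total


def immediate_threshold_loss(z: float, y: int, thresholds: Sequence[float],
                             phi: MarginLoss) -> float:
    """IT loss: only the two thresholds bounding class y are penalized.

    Class 1 has no lower threshold and class K has no upper threshold.
    """
    num_classes = len(thresholds) + 1
    total = 0.0
    if y > 1:
        total += phi(z - thresholds[y - 2])  # lower bound theta_{y-1}
    if y < num_classes:
        total += phi(thresholds[y - 1] - z)  # upper bound theta_y
    return total
```

Consider thresholds [-1.0, 0.0, 1.0], label 3, and z = -2.0. The hinge-based AT loss is 5.0 while the IT loss is 3.0. The difference comes from the violated threshold θ_1: AT charges for it, but IT ignores it.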
Abstract
Threshold methods are popular for ordinal regression problems, which are classification problems for data with a natural ordinal relation. They learn a one-dimensional transformation (1DT) of observations of the explanatory variable and then assign label predictions to the observations by thresholding their 1DT values. In this paper, we use theoretical considerations and numerical experiments to study how the underlying data distribution and the learning procedure of the 1DT influence the classification performance of the threshold method. For example, we found that threshold methods based on typical learning procedures may perform poorly when the probability distribution of the target variable conditioned on an observation of the explanatory variable tends to be non-unimodal. Another finding is that, under the learning procedure based on a piecewise-linear loss function, the learned 1DT values concentrate at a few points, which can make it difficult to classify the data well.
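The following minimal sketch shows the thresholding step described above. It assumes K classes labeled 1 to K and K − 1 non-decreasing thresholds θ_1 ≤ ... ≤ θ_{K-1}. The predicted label is 1 plus the number of thresholds strictly below the 1DT value. The tie convention, which assigns a value equal to a threshold to the lower class, and the name `threshold_predict` are illustrative assumptions rather than details taken from the paper.

```python
from bisect import bisect_left
from typing import Sequence


def threshold_predict(z: float, thresholds: Sequence[float]) -> int:
    """Map a 1DT value z to a label in {1, ..., K} by thresholding.

    thresholds holds theta_1 <= ... <= theta_{K-1}. The label is
    1 + #{k : theta_k < z}, so a value equal to a threshold goes
    to the lower class (an assumed tie convention).
    """
    if any(a > b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be non-decreasing")
    # bisect_left returns how many thresholds are strictly less than z.
    return 1 + bisect_left(thresholds, z)
```

With thresholds [-1.0, 0.0, 1.5] (K = 4), a 1DT value of 0.7 is assigned label 3, since exactly two thresholds lie below it. Because prediction depends only on which interval z falls into, 1DT values that concentrate at a few points leave the classifier with correspondingly few distinct regions to exploit.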
