Approximately Unimodal Likelihood Models for Ordinal Regression

Ryoya Yamasaki

TL;DR

The paper tackles ordinal regression under the unimodality framework, noting that for many real-world ordinal data the CPD $\Pr(Y=y|{\mathbf X}={\mathbf x})$ is unimodal at most, but not all, values of ${\mathbf x}$, which can bias strictly unimodal models. It introduces approximately unimodal likelihood models (MAUL) that mix a unimodal CPD $P_{\rm ul}$ with an unconstrained CPD $P_{\rm sl}$ through a mixture rate $r$, and provides theoretical bounds on representation ability and unimodality preservation. In Experiments I and II on real-world SB and CV datasets, it shows that MAUL can reduce CPD deviation and NLL, particularly in small-sample regimes where variance dominates the prediction error, with an intermediate $r$ often yielding the best trade-off. The work also contrasts MAUL with related approaches (OD, OT, UPRL) and demonstrates that combining MAUL with regularization can further improve conditional-probability estimation, offering a practical tool for ordinal regression with limited data.

Abstract

Ordinal regression (OR, also called ordinal classification) is the classification of ordinal data, in which the underlying target variable is categorical and is considered to have a natural ordinal relation with respect to the underlying explanatory variable. A key to successful OR models is to identify a data structure, the "natural ordinal relation", that is common to many ordinal data and to reflect that structure in the design of those models. A recent OR study found that, in many real-world ordinal data, the conditional probability distribution (CPD) of the target variable given a value of the explanatory variable tends to be unimodal. Several previous studies thus developed unimodal likelihood models, in which a predicted CPD is guaranteed to be unimodal. However, it was also observed experimentally that many real-world ordinal data have some values of the explanatory variable at which the underlying CPD is non-unimodal, and hence unimodal likelihood models may suffer from a bias for such CPDs. Motivated to mitigate this bias, we propose approximately unimodal likelihood models, which can represent unimodal CPDs as well as CPDs that are close to being unimodal. We also verify experimentally that a proposed model can be effective for statistical modeling of ordinal data and for OR tasks.

Paper Structure

This paper contains 20 sections, 4 theorems, 24 equations, 9 figures, 2 tables.

Key Result

Theorem 1

It holds that where $S^\emptyset\coloneq \{(p_k)_{k\in[K]}\in S\mid p_k\neq0\text{ for all }k\in[K]\}$ for a set $S\subseteq\Delta_{K-1}$.

Figures (9)

  • Figure 1: Illustration of the idea of this study: Under the supposition that many ordinal data are not only high-UR but also low-UD, we develop approximately unimodal likelihood models, which can represent low-UD data to mitigate a bias of unimodal likelihood models that can represent only unimodal data, and which have a lower representation ability than unconstrained likelihood models to decrease a variance that becomes relatively large within the prediction error for small-size training data.
  • Figure 2: Illustration of models: The upper row shows the output flow of a unimodal VSL model, the left side of the lower row shows that of an unconstrained SL model, and a proposed MAUL model mixes their outputs to represent an approximately unimodal CPD.
  • Figure 3: Instance of $D_{\mathrm{H}}({\bm{p}},\hat{\Delta}_{K-1})$ with $K=10$ and ${\bm{p}}=\,$(0, 0, .05, .1, .15, .3, .2, .15, .05, 0$)^\top$ (left), (0, 0, .05, .15, .1, .3, .2, .15, .05, 0$)^\top$ (center), (.15, 0, .2, 0, .1, .3, .05, 0, .05, .15$)^\top$ (right), where we show ${\bm{p}}$ by a black solid polyline and ${\bm{q}}=\mathop{\mathrm{arg\,min}}\nolimits_{{\bm{v}}\in\hat{\Delta}_{K-1}}\|{\bm{p}}-{\bm{v}}\|$ by a red dotted polyline.
  • Figure 4: Log-scaled histogram (ratio of $D_{{\mathrm{H}},i}$'s included in each of 100 bins) of 100-trial (for SB datasets) or 5-trial (for CV datasets) aggregation of estimates $D_{{\mathrm{H}},i}$ of $D_{\mathrm{H}}((\Pr(Y=y|{\bm{X}}={\bm{x}}_i))_{y\in[K]},\hat{\Delta}_{K-1})$ for real-world ordinal data, and 100-trial aggregation for uniform random data on $\Delta_{K-1}$.
  • Figure 5: Illustration of the POCL and POVSL models: $P_{\rm cl}(y;(\acute{b}_k-u)_{k\in[K-1]})$ and $P_{\rm sl}(y;\tau((\acute{b}_k-u)_{k\in[K]}))$ with $\tau(u)=u^2$ for $y=1,\ldots,K$ (red, green, blue, cyan, magenta) with $K=5$.
  • ...and 4 more figures
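
The three probability vectors quoted in the Figure 3 caption can be checked for unimodality with a short Python sketch. The function name `is_unimodal` and the tolerance `tol` are illustrative, not taken from the paper. The sketch assumes the usual definition of a unimodal probability vector (non-decreasing up to a mode, non-increasing after it). It does not compute the distance $D_{\mathrm{H}}$ itself.

```python
def is_unimodal(p, tol=1e-12):
    """Return True if p is non-decreasing up to a mode and non-increasing after it.

    Assumption: this is the standard notion of unimodality for a probability
    vector; `tol` is a hypothetical tolerance for floating-point ties.
    """
    # Index of the first maximal entry, used as the candidate mode.
    mode = max(range(len(p)), key=lambda k: p[k])
    rising = all(p[k] <= p[k + 1] + tol for k in range(mode))
    falling = all(p[k] >= p[k + 1] - tol for k in range(mode, len(p) - 1))
    return rising and falling


# Example vectors copied from the Figure 3 caption (K = 10).
p_left = [0, 0, .05, .1, .15, .3, .2, .15, .05, 0]
p_center = [0, 0, .05, .15, .1, .3, .2, .15, .05, 0]
p_right = [.15, 0, .2, 0, .1, .3, .05, 0, .05, .15]

for name, p in [("left", p_left), ("center", p_center), ("right", p_right)]:
    print(name, is_unimodal(p))
```

Under this definition only the left vector is unimodal. The center vector dips from .15 to .1 before its peak, and the right vector has several separate peaks.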

Theorems & Definitions (10)

  • Definition 1: Unimodal/high-UR data
  • Definition 2: Representation ability
  • Definition 3: Unimodal likelihood model
  • Theorem 1: Representation ability of the SL model; yamasaki2022unimodal
  • Theorem 2: Representation ability of the VSL model; yamasaki2022unimodal
  • Definition 4: Low-UD data
  • Definition 5: $\epsilon$-approximately unimodal likelihood model
  • Theorem 3: Representation ability of the MAUL model
  • proof
  • Corollary 1: Corollary of Theorem 3
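
The mixing idea behind the MAUL model (Figure 2, Theorem 3) can be illustrated with a minimal Python sketch. The exact mixture formula is not reproduced on this page, so the convex combination $(1-r)\,P_{\rm ul} + r\,P_{\rm sl}$ used below is an assumption, and the function name `mix_cpd` and the example vectors are hypothetical.

```python
def mix_cpd(p_ul, p_sl, r):
    """Mix a unimodal CPD p_ul with an unconstrained CPD p_sl.

    Assumption: the mixture is the convex combination
    (1 - r) * p_ul + r * p_sl with mixture rate r in [0, 1];
    the paper's exact parameterization may differ.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("mixture rate r must lie in [0, 1]")
    if len(p_ul) != len(p_sl):
        raise ValueError("both CPDs must have the same number of classes")
    return [(1.0 - r) * u + r * s for u, s in zip(p_ul, p_sl)]


# Hypothetical example with K = 5 classes.
p_ul = [0.1, 0.2, 0.4, 0.2, 0.1]   # unimodal component
p_sl = [0.3, 0.1, 0.2, 0.1, 0.3]   # unconstrained (here bimodal) component
r = 0.2

mixed = mix_cpd(p_ul, p_sl, r)
# Largest per-class deviation of the mixture from its unimodal component.
max_dev = max(abs(m - u) for m, u in zip(mixed, p_ul))
print(mixed, max_dev)
```

Under this assumed form, setting `r = 0` recovers the unimodal component exactly, and each class probability of the mixture differs from the unimodal component by at most `r`. This is one simple sense in which a small mixture rate keeps the predicted CPD close to unimodal.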