An interpretable neural network-based non-proportional odds model for ordinal regression

Akifumi Okuno, Kazuharu Harada

TL;DR

This study establishes a sufficient condition under which the predicted conditional cumulative probability satisfies the monotonicity constraint locally, over a user-specified region of the covariate space, and provides a monotonicity-preserving stochastic algorithm for training the neural network effectively.

Abstract

This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM differs from conventional non-proportional approaches to ordinal regression in two ways: (1) N$^3$POM is defined for both continuous and discrete responses, whereas standard methods typically treat ordered continuous variables as if they were discrete; and (2) instead of estimating response-dependent, finite-dimensional coefficients of linear models from discrete responses, as conventional approaches do, we train a non-linear neural network to serve as a coefficient function. Owing to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region of the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for training the neural network effectively. We apply N$^3$POM to several real-world datasets.
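To make the model concrete, the following is a minimal sketch of how a non-proportional odds model evaluates the conditional cumulative probability (CCP) $\mathbb{P}(H \le u \mid X=\boldsymbol x)=\sigma(a(u)+\langle \boldsymbol b(u),\boldsymbol x\rangle)$ when a neural network serves as the coefficient function. In a proportional odds model $\boldsymbol b$ would be a constant vector; here both $a(u)$ and $\boldsymbol b(u)$ vary with the response level $u$. The network (`CoefficientNet`, one tanh hidden layer with random, untrained weights) and the helper `ccp` are hypothetical illustrations: the sketch does not reproduce the paper's parameterization of $a(u)$, its re-parameterization, or the MPS training algorithm, so the CCP it produces is not guaranteed to be monotone in $u$.

```python
import numpy as np


def sigmoid(z):
    # Logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


class CoefficientNet:
    """Hypothetical toy network mapping a scalar response level u to (a(u), b(u)).

    One hidden tanh layer with random weights; an illustrative assumption,
    not the architecture or parameterization used in the paper.
    """

    def __init__(self, d, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(hidden, 1))           # input layer: u -> hidden
        self.c1 = rng.normal(size=hidden)
        self.W2 = rng.normal(scale=0.1, size=(1 + d, hidden))  # hidden -> (a, b)
        self.c2 = np.zeros(1 + d)

    def __call__(self, u):
        h = np.tanh(self.W1 @ np.atleast_1d(u) + self.c1)  # hidden features of u
        out = self.W2 @ h + self.c2
        return out[0], out[1:]  # a(u) is a scalar, b(u) is a vector in R^d


def ccp(net, u, x):
    """CCP P(H <= u | X = x) = sigma(a(u) + <b(u), x>)."""
    a, b = net(u)
    return sigmoid(a + b @ x)
```

For example, `ccp(CoefficientNet(d=3), 0.3, np.array([0.2, -1.0, 0.5]))` returns a number in $(0, 1)$. Because $\boldsymbol b$ is a function of $u$ rather than a fixed vector, the effect of each covariate on the cumulative log-odds can change across response levels, which is what makes the model non-proportional.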

Paper Structure

This paper contains 44 sections, 2 theorems, 33 equations, 18 figures, and 3 tables.

Key Result

Proposition 1

Let $a:\mathcal{U} \to \mathbb{R}$ be the function defined in Eq. (eq:au), equipped with the re-parameterization in Eq. (eq:re-parameterization). Let $\boldsymbol b:\mathcal{U} \to \mathbb{R}^d$ be a continuous function. If $f_u(\boldsymbol x):=a(u)+\langle \boldsymbol b(u),\boldsymbol x\rangle$ is non-decreasing wi…
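The proposition is phrased in terms of whether $f_u(\boldsymbol x)=a(u)+\langle \boldsymbol b(u),\boldsymbol x\rangle$ is non-decreasing. Assuming $\sigma$ is the logistic function, as is standard for odds models, $\sigma$ is strictly increasing, so the CCP $\sigma(f_u(\boldsymbol x))$ is non-decreasing in $u$ exactly when $f_u(\boldsymbol x)$ is. The sketch below is a simple grid-based diagnostic of that property for one covariate vector. It is an illustration under assumptions, not the paper's sufficient condition and not a proof, since it can only detect violations at grid points; the names `f_u` and `is_monotone_on_grid` are hypothetical.

```python
import numpy as np


def f_u(a, b, u, x):
    """Linear predictor f_u(x) = a(u) + <b(u), x> for callables a: R -> R, b: R -> R^d."""
    return a(u) + np.dot(b(u), x)


def is_monotone_on_grid(a, b, x, u_grid, tol=0.0):
    """Return True if f_u(x) is non-decreasing over the sorted grid u_grid.

    Since sigma is strictly increasing, this is equivalent to the CCP
    sigma(f_u(x)) being non-decreasing on the same grid. A grid check can
    only detect violations at grid points; it does not prove monotonicity.
    """
    values = np.array([f_u(a, b, u, x) for u in np.sort(u_grid)])
    # Allow decreases of at most `tol` between consecutive grid points
    return bool(np.all(np.diff(values) >= -tol))
```

In the toy example $a(u)=2u$ and $\boldsymbol b(u)=u$ with $d=1$ (hypothetical choices), $f_u(x)=u(2+x)$, so the CCP is non-decreasing in $u$ exactly when $x \ge -2$: `is_monotone_on_grid(a, b, np.array([1.0]), np.linspace(0, 1, 101))` returns `True`, while the same call with `np.array([-3.0])` returns `False`. This mirrors the idea that monotonicity may hold only on part of the covariate space.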

Figures (18)

  • Figure 1: Illustration of Proposition (prop:monotonicity). The estimated CCP $\hat{\mathbb{P}}_{\mathrm{N}^3\mathrm{POM}}(H \le u \mid X=\boldsymbol x)=\sigma(\hat{f}_u(\boldsymbol x))$ is non-decreasing with respect to $u \in \mathcal{U}$ (i.e., valid) if $\boldsymbol x \in \mathcal{X}_2(\eta)$, whereas monotonicity is not guaranteed (i.e., possibly invalid) if $\boldsymbol x \notin \mathcal{X}_2(\eta)$.
  • Figure 2: autoMPG6 dataset experiment. The dashed line represents the coefficient function given by the untrained (initial) NN output. Ten separate curves represent the coefficient functions obtained with different random seeds for stochastic optimization. Because these curves are nearly identical in all experiments, the functions appear to be estimated robustly with respect to the randomness of the stochastic optimization procedure.
  • Figure 3: Experimental results.
  • Figure 4: Scatter plots of the covariates (house age, distance to station, and number of stores; $x$-axis) against the response (house price per unit area; $y$-axis). Houses with higher, middle, and lower prices are shown in different colors.
  • Figure 6: autoMPG8 dataset experiment.
  • ...and 13 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Remark 1: Relation to the likelihood for interval-censored data
  • Proof of Proposition (prop:constant_f)
  • Proof of Proposition (prop:monotonicity)
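Remark 1 relates the model to the likelihood for interval-censored data. The sketch below assumes the standard cumulative-model form of that likelihood rather than reproducing the paper's remark: an observation known only to lie in $(l, r]$ contributes $\mathbb{P}(l < H \le r \mid X=\boldsymbol x)=\sigma(f_r(\boldsymbol x))-\sigma(f_l(\boldsymbol x))$, and discrete ordinal categories fit the same form with cut points as interval endpoints. This difference is a valid probability only when the CCP is non-decreasing between $l$ and $r$, which is one reason the monotonicity condition matters. The function `interval_nll` and the predictor argument `f` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    # Logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))


def interval_nll(f, x, lower, upper, eps=1e-12):
    """Negative log-likelihood of observations censored to intervals (lower, upper].

    f(u, x) returns the linear predictor f_u(x). lower = -inf and upper = +inf
    are mapped to CCP values 0 and 1, respectively.
    """
    total = 0.0
    for xi, l, r in zip(x, lower, upper):
        p_r = 1.0 if np.isposinf(r) else sigmoid(f(r, xi))
        p_l = 0.0 if np.isneginf(l) else sigmoid(f(l, xi))
        # p_r - p_l is a valid probability only if the CCP is non-decreasing in u;
        # clip at eps so the log stays finite when monotonicity is violated.
        total -= np.log(max(p_r - p_l, eps))
    return total
```

For instance, with the hypothetical predictor `f = lambda u, x: u`, a single observation censored to $(-\infty, 0]$ has likelihood $\sigma(0)=0.5$, so `interval_nll(f, [np.zeros(1)], [-np.inf], [0.0])` equals $\log 2$.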