Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test

Akinori F. Ebihara, Taiki Miyagawa, Kazuyuki Sakurai, Hitoshi Imaoka

TL;DR

This work tackles finite-horizon early classification of time series (ECTS) by reframing backward induction as two estimation tasks. It introduces FIRMBOUND, an SPRT-based framework that uses density-ratio estimation to obtain the sufficient statistic (log-likelihood ratios, LLRs) and either convex function learning (CFL) or Gaussian process (GP) regression to estimate the continuation risk, targeting Bayes-optimal decisions under a finite horizon. The authors prove statistical consistency for the CFL-based estimators and show that FIRMBOUND approaches Bayes optimality in terms of averaged a posteriori risk (AAPR), reaches Pareto-optimal speed-accuracy tradeoffs on synthetic and real datasets, and reduces decision-time variance. In practice, this enables reliable real-time early classification in domains such as vision, action recognition, and biosignals, with a faster GP-based training option and publicly available code.

Abstract

Time-sensitive machine learning benefits from Sequential Probability Ratio Test (SPRT), which provides an optimal stopping time for early classification of time series. However, in finite horizon scenarios, where input lengths are finite, determining the optimal stopping rule becomes computationally intensive due to the need for backward induction, limiting practical applicability. We thus introduce FIRMBOUND, an SPRT-based framework that efficiently estimates the solution to backward induction from training data, bridging the gap between optimal stopping theory and real-world deployment. It employs density ratio estimation and convex function learning to provide statistically consistent estimators for sufficient statistic and conditional expectation, both essential for solving backward induction; consequently, FIRMBOUND minimizes Bayes risk to reach optimality. Additionally, we present a faster alternative using Gaussian process regression, which significantly reduces training time while retaining low deployment overhead, albeit with potential compromise in statistical consistency. Experiments across independent and identically distributed (i.i.d.), non-i.i.d., binary, multiclass, synthetic, and real-world datasets show that FIRMBOUND achieves optimalities in the sense of Bayes risk and speed-accuracy tradeoff. Furthermore, it advances the tradeoff boundary toward optimality when possible and reduces decision-time variance, ensuring reliable decision-making. Code is publicly available at https://github.com/Akinori-F-Ebihara/FIRMBOUND
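
To make the cost of backward induction concrete, the following is a minimal sketch of the classical grid-based computation that FIRMBOUND aims to replace with learned estimators. It is not the authors' method: it assumes a toy two-class problem with known i.i.d. Gaussian class densities, a 0-1 misclassification loss, and a constant per-sample cost. The names `backward_induction`, `upper_threshold`, `cost`, `mu`, and `n_grid` are hypothetical and chosen only for illustration.

```python
import numpy as np

def backward_induction(horizon=10, cost=0.01, mu=0.5, n_grid=201):
    """Toy backward induction on a posterior grid (assumed setting, not FIRMBOUND).

    Class 0 emits i.i.d. samples from N(-mu, 1), class 1 from N(+mu, 1).
    The state is pi = P(class 1 | samples so far). Row t-1 of the returned
    g_cont holds the continuation risk at time t; the last row stays +inf
    because stopping is forced at the horizon.
    """
    pi = np.linspace(0.0, 1.0, n_grid)
    x = np.linspace(-8.0, 8.0, 1601)          # quadrature grid for the next sample
    dx = x[1] - x[0]
    f0 = np.exp(-0.5 * (x + mu) ** 2) / np.sqrt(2 * np.pi)
    f1 = np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

    # Density of the next sample given pi, and the posterior after observing it.
    marginal = pi[:, None] * f1[None, :] + (1.0 - pi[:, None]) * f0[None, :]
    post_next = pi[:, None] * f1[None, :] / marginal

    g_stop = np.minimum(pi, 1.0 - pi)         # stopping risk under 0-1 loss
    g_min = g_stop.copy()                     # minimum risk at the horizon
    g_cont = np.full((horizon, n_grid), np.inf)
    for t in range(horizon - 1, 0, -1):
        # Conditional expectation of next-step minimum risk, by quadrature.
        g_next = np.interp(post_next.ravel(), pi, g_min).reshape(post_next.shape)
        g_cont[t - 1] = cost + np.sum(g_next * marginal, axis=1) * dx
        g_min = np.minimum(g_stop, g_cont[t - 1])
    return pi, g_stop, g_cont

def upper_threshold(pi, g_stop, g_cont_t):
    """Smallest posterior >= 0.5 at which stopping is no worse than continuing."""
    stop_here = (g_stop <= g_cont_t) & (pi >= 0.5)
    return pi[np.argmax(stop_here)]

pi, g_stop, g_cont = backward_induction()
thresholds = [upper_threshold(pi, g_stop, row) for row in g_cont]
print(thresholds)  # time-dependent decision boundary for class 1
```

Even in this toy case, every time step needs a conditional expectation over the next observation at every grid point. With more classes the grid grows with the dimension of the sufficient statistic, which is the burden the abstract refers to.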

Paper Structure

This paper contains 121 sections, 11 theorems, 45 equations, 17 figures, 28 tables, 2 algorithms.

Key Result

Theorem 2.1

Let $\mathscr{S}_t$ be $(\pi_1(X^{(1,t)}), \ldots, \pi_K(X^{(1,t)}))$ w.l.o.g. The SPRT $\delta^*$ is Bayes optimal if the time-dependent thresholds $a_k^{(t)}$ in Eqs. eq:d* & eq:tau* are given by the intersections of the continuation risk function $\tilde{G}_t(\mathscr{S}_{t})$ and the stopping risk function $G^{\mathrm{st}}_t(\mathscr{S}_t)$ ... where $G^{\mathrm{min}}_t (\mathscr{S}_t)$ is referred to as the minimum risk function ...
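
For orientation, the recursion that backward induction solves is commonly written as below in the optimal-stopping literature. This is a sketch under stated assumptions, not the paper's exact statement: it assumes a 0-1 misclassification loss, a constant per-sample cost $c$, and a horizon $T$. The paper's own equations may use different loss weights or notation.

```latex
\begin{align}
G^{\mathrm{st}}_t(\mathscr{S}_t) &= \min_{k \in \{1,\ldots,K\}} \bigl\{ 1 - \pi_k(X^{(1,t)}) \bigr\}, \\
\tilde{G}_t(\mathscr{S}_t) &= c + \mathbb{E}\bigl[ G^{\mathrm{min}}_{t+1}(\mathscr{S}_{t+1}) \,\big|\, \mathscr{S}_t \bigr], \qquad 1 \le t < T, \\
G^{\mathrm{min}}_t(\mathscr{S}_t) &=
\begin{cases}
G^{\mathrm{st}}_T(\mathscr{S}_T), & t = T, \\
\min\bigl\{ G^{\mathrm{st}}_t(\mathscr{S}_t),\, \tilde{G}_t(\mathscr{S}_t) \bigr\}, & 1 \le t < T.
\end{cases}
\end{align}
```

Under these assumptions, stopping at time $t$ is optimal whenever $G^{\mathrm{st}}_t(\mathscr{S}_t) \le \tilde{G}_t(\mathscr{S}_t)$, so the decision thresholds sit where the two functions intersect.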

Figures (17)

  • Figure 1: Visual guide to optimal stopping under a finite horizon. (a) Finite-horizon SPRT. Prematurely set decision boundaries lead to suboptimal results. Starbursts mark the stopping times of three decision boundaries for class 1: (Right) a static boundary (upper gray line) leads to delayed decision making; (Center) an optimal decision boundary within a finite horizon (yellow curve) achieves a faster stopping time; and (Left) a lower static boundary (lower gray line) can achieve the same hitting time (center starburst) but increases the risk of classifying another sequence (blue trajectory) into the wrong class. (b) FIRMBOUND & Pareto front. FIRMBOUND's goal is to delineate the Pareto-optimal point (meaning "optimal in the speed-accuracy multi-objective optimization problem") on the speed-accuracy tradeoff (SAT) curve. It achieves the Pareto-optimal point within the existing front (blue star) or discovers a new Pareto-optimal point (red star) if possible.
  • Figure 2: Learning Decision Boundaries. (a) Estimation of the continuation risk function $\tilde{G}$. Convex Function Learning (CFL) and Gaussian Process (GP) regression on a two-class sequential Gaussian dataset are used. The decision boundary at the current time step ($=48$) is defined by the intersection of $\tilde{G}$ and the stopping risk function $G^{\mathrm{st}}$ (Thm. 2.1). (b, c) Decision boundaries (thresholds) derived from a two-class (b) and a three-class (c) sequential Gaussian dataset.
  • Figure 3: Conceptual figure of FIRMBOUND. (a) The intersections of $\tilde{G}$ and $G^{\mathrm{st}}$ delineate the decision boundary. (b) FIRMBOUND estimates conditional expectations using either GP or CFL, based on the available sufficient statistic, such as (estimated) posterior probabilities $\pi$ or LLRs $\lambda$.
  • Figure 4: Training and Testing. (Top) In the training phase, the sequential DRE algorithm SPRT-TANDEM is trained, followed by the training of CFL or GP models using backward induction. (Bottom) In the testing phase, the trained DRE model is loaded to sequentially update the LLRs, with which the trained CFL/GP model calculates $\tilde{G}_t$ and compares it with $G^{\mathrm{st}}_t$ to make decisions at time $t$.
  • Figure 5: Averaged a posteriori risk (AAPR) curves. AAPRs of FIRMBOUND are compared with those of static-threshold SPRTs. Horizontal and vertical axes are mean hitting time and AAPR, respectively. Only models with a well-calibrated sufficient statistic are shown here, because an ill-calibrated statistic does not necessarily correlate with ECTS performance by definition, so discussing the minima of its AAPR is not meaningful (but see the appendix for the AAPR of other baseline models). Error bars represent the standard error of the mean.
  • ...and 12 more figures
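
The testing phase described for Figure 4 can be sketched as a short loop. This is a hedged illustration, not the released implementation: it assumes posterior estimates are already available at each time step, uses a 0-1 stopping risk, and treats the trained CFL/GP models as plain callables. The names `stopping_risk`, `run_early_classifier`, and `continuation_risk_fns` are hypothetical.

```python
import numpy as np

def stopping_risk(posterior):
    """Risk of stopping now under an assumed 0-1 loss: 1 - max_k pi_k."""
    return 1.0 - float(np.max(posterior))

def run_early_classifier(posteriors, continuation_risk_fns):
    """Stop at the first step where stopping is no riskier than continuing.

    posteriors: array-like of shape (T, K), posterior estimates per time step.
    continuation_risk_fns: list of T-1 callables (stand-ins for trained
        CFL/GP models); entry t maps the posterior at step t to an estimated
        continuation risk.
    Returns (predicted_class, stopping_time), with stopping_time 1-indexed.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    horizon = len(posteriors)
    for t in range(horizon):
        pi_t = posteriors[t]
        at_horizon = t == horizon - 1          # a decision is forced at the horizon
        if at_horizon or stopping_risk(pi_t) <= continuation_risk_fns[t](pi_t):
            return int(np.argmax(pi_t)), t + 1

# Hypothetical stand-ins: a constant continuation risk at every step.
fns = [lambda pi: 0.1] * 3
sequence = [[0.50, 0.30, 0.20],
            [0.70, 0.20, 0.10],
            [0.95, 0.03, 0.02],
            [0.99, 0.005, 0.005]]
print(run_early_classifier(sequence, fns))
```

In this example the classifier keeps sampling while the stopping risk exceeds the stand-in continuation risk and commits to class 0 at the third step, one step before the horizon.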

Theorems & Definitions (15)

  • Definition 2.1: SPRT
  • Theorem 2.1: Backward induction equation
  • Theorem 3.1
  • Theorem 3.2: Informal
  • Theorem B.1
  • Theorem B.2: Asymptotic optimality of SPRT under a multiclass, non-i.i.d. case
  • Definition C.1: Sufficient Statistic
  • Theorem I.1: TANDEM formula
  • Definition I.1: MCE
  • Lemma J.1: CFL is statistically consistent (Prop. 1 in Siahkamari2022CFL)
  • ...and 5 more