A Unified Study on Sequentiality in Universal Classification with Empirically Observed Statistics

Ching-Fang Li; I-Hsiang Wang

TL;DR

This work develops a unified, universal framework for sequential binary classification with empirically observed statistics, casting all setups as instances of a general sequential composite hypothesis testing problem. It derives the optimal type-II error exponents under distribution-dependent type-I constraints for Fully-Sequential, Semi-Sequential-1, and Semi-Sequential-2 as well as Fixed-Length configurations, leveraging Rényi and generalized Jensen-Shannon divergences to express the exponents. The results show that sequentiality can eliminate trade-offs between error exponents under certain universality regimes, with explicit conditions under which training-data sequencing provides additional gains. Through a combination of converse and achievability proofs and a two-phase testing strategy, the paper quantifies when sequential approaches outperform fixed-length tests and offers insights into efficient testing and design choices for universal sequential classification with empirical statistics.

Abstract

In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between two error exponents, yet implementing the optimal test requires the knowledge of the underlying distributions, say $P_0$ and $P_1$. In the scenario where the knowledge of distributions is replaced by empirically observed statistics from the respective distributions, the gain of sequentiality is less understood when subject to universality constraints over all possible $P_0,P_1$. In this work, the gap is mended by a unified study on sequentiality in the universal binary classification problem, where the universality constraints are set on the expected stopping time as well as the type-I error exponent. The type-I error exponent is required to achieve a pre-set distribution-dependent constraint $λ(P_0,P_1)$ for all $P_0,P_1$. Under the proposed framework, different sequential setups are investigated so that fair comparisons can be made with the fixed-length counterpart. By viewing these sequential classification problems as special cases of a general sequential composite hypothesis testing problem, the optimal type-II error exponents are characterized. Specifically, in the general sequential composite hypothesis testing problem subject to universality constraints, upper and lower bounds on the type-II error exponent are proved, and a sufficient condition for which the bounds coincide is given. The results for sequential classification problems are then obtained accordingly. With the characterization of the optimal error exponents, the benefit of sequentiality is shown both analytically and numerically by comparing the sequential and the fixed-length cases in representative examples of type-I exponent constraint $λ$.
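For reference, the classical known-distribution fact that the first sentence of the abstract alludes to can be written out as below. This is a sketch of the standard textbook statement, not a formula taken from this paper. The symbols $e_0$, $e_1$, $\tau$, and $n$ are introduced here only for illustration, and attaching the type-I error to $P_0$ is an assumed convention.

```latex
% Fixed-length tests on n samples: the two exponents trade off.
% For a type-I exponent e_0 with 0 < e_0 < D(P_1 \| P_0),
% the best type-II exponent is
\[
  e_1^{\mathrm{fix}}(e_0) \;=\; \min_{Q \,:\, D(Q \| P_0) \le e_0} D(Q \| P_1).
\]
% Sequential tests with expected stopping time E[\tau] \le n under both
% hypotheses: the two extreme exponents are achievable simultaneously,
\[
  (e_0,\, e_1) \;=\; \bigl( D(P_1 \| P_0),\; D(P_0 \| P_1) \bigr).
\]
```

Here $D(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence. The paper's question is how much of this gain survives when $P_0$ and $P_1$ are only available through training samples.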

Paper Structure (46 sections, 23 theorems, 145 equations, 4 figures)


Introduction
Related works
Organization
Notation
Problem Formulation
Binary Classification
Composite Hypothesis Testing
Results for Composite Hypothesis Testing
Main Results on Universal Classification with Empirically Observed Statistics
Proof of Theorems \ref{thm:complete_fully-seq} and \ref{thm:complete_semi-seq}
Proof of Theorem \ref{thm:complete_semi-seq-2}
Comparison
Constant Constraint
Efficient Tests
Proof of Proposition \ref{prop:composite_converse}
...and 31 more sections

Key Result

Proposition 1

Let $\lambda:\mathscr P_0 \to (0,\infty)$, and let $\{\Phi_n\}$ be a sequence of tests such that [...]. Then for any $\bm P_1\in\mathscr P_1$, [...], where $\Gamma_0 = \left\{\bm Q\in\mathscr P_1\,\middle\vert\, g_1(\bm Q) < 0\right\}$ and [...].

Figures (4)

Figure 1: The optimal type-II error exponents under type-I error exponent constraint $\lambda(P_0,P_1)=\xi(\mathrm{D}_{\frac{\beta}{1+\beta}}\mkern-1.5mu\left( P_1 \middle\Vert P_0 \right)+0.003)$, where $\beta$ is the expected ratio of the number of the $P_1$-training samples to that of testing samples. Here $\mathcal{X} = \{0,1\}$, $\alpha=0.38$, $\beta=0.6$, and $P_0^*=[0.6, 0.4]$, $P_1^*=[0.1, 0.9]$. Let $\xi$ increase from $0.001$ to $1$ to obtain a curve. The optimal type-II error exponents in different setups can be identified by taking the minimum of the corresponding terms, as demonstrated in plot (b). This particular example is chosen judiciously so that all terms are active in the considered regime.
Figure 2: The setup of the generalized problem.
Figure 3: The optimal type-II error exponents under constant type-I error exponent constraint $\lambda(P_0,P_1)\equiv\lambda_0$. Fix $\mathcal{X} = \{0,1\}$, $\varepsilon = 0.01$, $\alpha=2$, and choose $P_0^*=[0.6, 0.4]$, $P_1^*=[0.1, 0.9]$. Let $\lambda_0$ increase from $0.001$ to $\mathrm{GJS}\mkern-1.5mu\left( P_0^*, P_1^*, \alpha \right)$ to obtain a curve. Note that in (a), $\kappa(P_0^*,P_1^*)$ is plotted under two different values of $\beta$, whereas $e_{1,\mathsf{fix}}^*(P_0^*,P_1^*)$ and $\mathrm{D}_{\frac{\alpha}{1+\alpha}}\mkern-1.5mu\left( P_0^* \middle\Vert P_1^* \right)$ are independent of $\beta$.
Figure 4: The optimal type-II error exponents when $\lambda(P_0,P_1)=\xi\mathrm{D}_{\frac{\beta}{1+\beta}}\mkern-1.5mu\left( P_1 \middle\Vert P_0 \right)$. Fix $\mathcal{X} = \{0,1\}$, $\varepsilon = 0.01$, $\alpha=\beta=0.7$, and choose $P_0^*=[0.6, 0.4]$, $P_1^*=[0.1, 0.9]$. Let $\xi$ increase from $0.001$ to $0.999$ to obtain a curve.
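The captions above express the constraints and curve endpoints through Rényi divergences of order $\frac{\beta}{1+\beta}$ or $\frac{\alpha}{1+\alpha}$ and through $\mathrm{GJS}(P_0^*, P_1^*, \alpha)$. The sketch below evaluates these quantities for the binary distributions used in the figures. The divergence definitions are not given in this summary, so the code assumes the standard ones: the Rényi divergence of order $a\in(0,1)$ as $\frac{1}{a-1}\log\sum_x P(x)^a Q(x)^{1-a}$, and the generalized Jensen-Shannon divergence as $\alpha D(P\|M) + D(Q\|M)$ with $M = \frac{\alpha P + Q}{1+\alpha}$. The function names are hypothetical, and the code does not reproduce the paper's plots.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, order):
    """Renyi divergence D_order(p || q) in nats, for 0 < order < 1."""
    s = sum(pi ** order * qi ** (1 - order) for pi, qi in zip(p, q))
    return math.log(s) / (order - 1)


def gjs_divergence(p, q, alpha):
    """Assumed GJS form: alpha * D(p || m) + D(q || m),
    where m = (alpha * p + q) / (1 + alpha)."""
    m = [(alpha * pi + qi) / (1 + alpha) for pi, qi in zip(p, q)]
    return alpha * kl_divergence(p, m) + kl_divergence(q, m)


# Distributions shared by Figures 1, 3 and 4.
p0 = [0.6, 0.4]
p1 = [0.1, 0.9]

# Parameters from the Figure 4 caption.
alpha = beta = 0.7

print("D_{beta/(1+beta)}(P1 || P0)   =", renyi_divergence(p1, p0, beta / (1 + beta)))
print("D_{alpha/(1+alpha)}(P0 || P1) =", renyi_divergence(p0, p1, alpha / (1 + alpha)))
print("GJS(P0, P1, alpha)            =", gjs_divergence(p0, p1, alpha))
```

Changing `alpha` to `2` gives the endpoint $\mathrm{GJS}(P_0^*, P_1^*, \alpha)$ of the $\lambda_0$ sweep described for Figure 3, under the same assumed definition.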

Theorems & Definitions (29)

Remark 1
Proposition 1: Converse
Proposition 2: Achievability
Lemma 1
Proposition 3: Characterization of the optimal type-II error exponent
Corollary 1: Fully-sequential composite hypothesis testing
proof
Remark 2
Theorem 1: $\mathsf{Fully\text{-}Sequential}$
Theorem 2: $\mathsf{Semi\text{-}Sequential\text{-}1}$
...and 19 more
