Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty

Lina Zhu, Lin Zhou

Abstract

In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most existing studies, which focus on discrete-valued sequences with perfect distribution match, we study multiple classification for continuous sequences with distribution uncertainty, where the generating distributions of the testing and training sequences deviate even under the true hypothesis. In particular, we propose distribution-free tests and prove that the error probabilities of our tests decay exponentially fast for three different test designs: fixed-length, sequential, and two-phase tests. We first consider the simple case without the null hypothesis, where the testing sequence is known to be generated from a distribution close to the generating distribution of one of the training sequences. Subsequently, we extend our results to the more general case with the null hypothesis by allowing the testing sequence to be generated from a distribution that is vastly different from the generating distributions of all training sequences.

Paper Structure

This paper contains 33 sections, 7 theorems, 56 equations, 5 figures.

Key Result

Theorem 1

Under any tuple of unknown distributions $\mathbf{P}=\{P_1,P_2,\ldots,P_M\}$, the fixed-length test in FLTest ensures that for each $i\in[M]$, the misclassification exponent satisfies

Figures (5)

  • Figure 1: Plot of simulated misclassification probabilities when $M=10$ and hypothesis $\mathrm{H}_1$ is true for our fixed-length test in Section \ref{S-FLMT}, our sequential test in Section \ref{S-ST}, and our two-phase test in Section \ref{S-AFLMT}. As observed, both our two-phase and sequential tests achieve better performance than the fixed-length test.
  • Figure 2: Plot of simulated average running times of our fixed-length, sequential, and two-phase tests in Section \ref{Main} as a function of the expected stopping time for the same setting as Fig. \ref{detection_error_known}. As observed, our fixed-length and two-phase tests have much smaller running times than the sequential test.
  • Figure 3: Plot of simulated misclassification probabilities of our fixed-length, sequential, and two-phase tests in Section \ref{Main} as a function of the average running time for the same setting as Fig. \ref{detection_error_known}.
  • Figure 4: Plot of simulated misclassification probabilities for our fixed-length test in Section \ref{S-FLMT-un}, sequential test in Section \ref{S-ST-un}, and two-phase test in Section \ref{S-AFLMT_un}, when (a) hypothesis $\mathrm{H}_1$ is true and (b) the null hypothesis is true. As observed, the sequential and two-phase tests in Sections \ref{S-ST-un}-\ref{S-AFLMT_un} outperform the fixed-length test in \ref{FLMT_un} as the expected stopping time $\mathsf{E}[\tau]$ tends to infinity.
  • Figure 5: Plot of simulated misclassification probabilities for our fixed-length and two-phase tests in Sections \ref{Main} and \ref{Main_un} with $M=10$ training sequences when hypothesis $\mathrm{H}_1$ is true. As observed, there is a performance penalty for not knowing whether the null hypothesis is true.

Theorems & Definitions (7)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma 1