The Kolmogorov-Smirnov Statistic Revisited

Elvis Han Cui, Yihao Li, Zhuang Liu

TL;DR

This work revisits the Kolmogorov-Smirnov statistic across one-sample and two-sample settings, addressing both one-sided and two-sided variants and emphasizing finite-sample behavior. It develops exact finite-sample probabilities for the supremum of the empirical process, characterizes discrete hitting times, and discusses the DKWM inequality for constructing confidence bands, all within a unified empirical-process framework. The paper also provides asymptotic results for both one-sided and two-sided KS statistics, including the Durbin-Babu-Rao expansion when the target distribution depends on unknown parameters, enabling valid inference via bootstrap or Monte Carlo methods. Collectively, the results yield improved methodologies for KS-based hypothesis testing and model validation under finite samples and parameter uncertainty, with concrete formulas and practical guidance for p-value computation.

Abstract

The Kolmogorov-Smirnov (KS) statistic is a classical nonparametric test statistic widely used for comparing an empirical distribution function with a reference distribution or for comparing two empirical distributions. Despite its broad applicability in statistical hypothesis testing and model validation, certain aspects of the KS statistic remain unfamiliar to many newer researchers, particularly its behavior under finite sample conditions. This paper revisits the KS statistic in both one-sample and two-sample scenarios, considering one-sided and two-sided variants. We derive exact probabilities for the supremum of the empirical process and present a unified treatment of the KS statistic under diverse settings. Additionally, we explore the discrete nature of the hitting times of the normalized empirical process, providing practical insights into the computation of KS test p-values. The study also discusses the Dvoretzky-Kiefer-Wolfowitz-Massart (DKWM) inequality, highlighting its role in constructing confidence bands for distribution functions. Using empirical process theory, we establish the limit distribution of the KS statistic when the true distribution includes unknown parameters. Our findings extend existing results, offering improved methodologies for statistical analysis and hypothesis testing using the KS statistic, particularly in finite sample scenarios.
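As a rough illustration of the confidence-band use of the DKWM inequality mentioned in the abstract, the sketch below uses the standard two-sided form $P(\sup_t |F_n(t)-F(t)| > \epsilon) \le 2e^{-2n\epsilon^2}$, which gives a band of half-width $\epsilon=\sqrt{\ln(2/\alpha)/(2n)}$ around the empirical distribution function. The function name `dkw_band` and its return layout are illustrative assumptions, not taken from the paper, and the sample is assumed to have no ties.

```python
import math


def dkw_band(sample, alpha=0.05):
    """Sketch of a (1 - alpha) DKWM confidence band for a continuous CDF.

    Returns the half-width eps and a list of (x, lower, upper) triples
    evaluated at the sorted sample points. Assumes no ties in `sample`.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # Half-width obtained by solving 2 * exp(-2 * n * eps**2) = alpha for eps.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    band = []
    for i, x in enumerate(sorted(sample), start=1):
        ecdf = i / n  # empirical CDF at the i-th order statistic
        # Clip to [0, 1], since a CDF cannot leave that range.
        band.append((x, max(0.0, ecdf - eps), min(1.0, ecdf + eps)))
    return eps, band


if __name__ == "__main__":
    eps, band = dkw_band([0.12, 0.35, 0.41, 0.58, 0.77, 0.93], alpha=0.05)
    print(f"half-width: {eps:.4f}")
    for x, lo, hi in band:
        print(f"x={x:.2f}  band=[{lo:.3f}, {hi:.3f}]")
```

The half-width depends only on $n$ and $\alpha$, not on the data, so for small samples the band is wide and is often clipped at 0 or 1.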

Paper Structure

This paper contains 6 sections, 5 theorems, 42 equations, 1 figure.

Key Result

Theorem 2.1

Let $\lambda\in[0,\sqrt{n}]$ and $\epsilon=\frac{\lambda}{\sqrt{n}}$. Set [...] to be the first time that $Z_n(t)$ hits $-\lambda$. We have [...]. If we further assume that $F(t)=t$, i.e., $X$ is uniform on $[0,1]$, then [...] for all $j=0,1,\ldots,\lfloor n-\lambda\sqrt{n}\rfloor$.
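For the uniform case, the classical Birnbaum-Tingey closed form for the one-sided statistic is usually written as $P\big(\sup_t (t-F_n(t))\ge\epsilon\big)=\epsilon\sum_{j=0}^{\lfloor n(1-\epsilon)\rfloor}\binom{n}{j}\big(\epsilon+\tfrac{j}{n}\big)^{j-1}\big(1-\epsilon-\tfrac{j}{n}\big)^{n-j}$. Its index range agrees with the one in the statement above, because $\lfloor n(1-\epsilon)\rfloor=\lfloor n-\lambda\sqrt{n}\rfloor$ and, when $F(t)=t$, the level $-\lambda$ can only be hit at points $t=\epsilon+j/n$ where $F_n(t)=j/n$. The Python sketch below evaluates this textbook sum. It assumes, without verification against the paper, that this is the form Theorem 2.1 refers to; the function names are hypothetical.

```python
import math


def one_sided_ks_sf(eps: float, n: int) -> float:
    """P(sup_t (t - F_n(t)) >= eps) for n i.i.d. Uniform(0, 1) draws.

    Evaluates the classical Birnbaum-Tingey sum directly. Intended for
    moderate n only: math.comb(n, j) can exceed the float range for large n.
    """
    if eps <= 0.0:
        return 1.0
    if eps >= 1.0:
        return 0.0
    total = 0.0
    for j in range(math.floor(n * (1.0 - eps)) + 1):
        a = eps + j / n        # candidate hitting location t = eps + j/n
        b = max(0.0, 1.0 - a)  # guard against tiny negative rounding error
        total += math.comb(n, j) * a ** (j - 1) * b ** (n - j)
    return eps * total


def hit_probability(lam: float, n: int) -> float:
    """Same probability, parametrized by lam = eps * sqrt(n) as in the theorem."""
    return one_sided_ks_sf(lam / math.sqrt(n), n)


if __name__ == "__main__":
    n = 20
    for lam in (0.4, 0.8, 1.2):
        exact = hit_probability(lam, n)
        limit = math.exp(-2.0 * lam * lam)  # classical large-n limit
        print(f"lambda={lam:.1f}  exact={exact:.4f}  exp(-2*lambda^2)={limit:.4f}")
```

A quick hand check: for $n=2$ and $\epsilon=0.5$ the event reduces to both observations being at least $0.5$, which has probability $0.25$, and the sum returns the same value.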

Figures (1)

  • Figure 1: Normalized Empirical Process $Z_n$ with $\lambda=0.8$

Theorems & Definitions (8)

  • Theorem 2.1: Smirnov-Birnbaum-Tingey
  • proof
  • Theorem 3.1: Gnedenko-Rvateva-Feller
  • proof
  • Corollary 1: Feller (1991)
  • Corollary 2
  • proof
  • Theorem 6.1: Durbin-Babu-Rao