Visual tests using several safe confidence intervals

Timothée Mathieu

TL;DR

This paper develops a principled visual framework for two-sample mean comparison by constructing confidence intervals from e-variables and testing overlap between the intervals. It provides both fixed-time and anytime (sequential) tests with nonparametric, finite-sample type I–III error guarantees under bounded-support assumptions, leveraging the betting/e-value framework and Ville-type martingale bounds. The key contributions include (i) a concrete construction of $C_n(\alpha;X,W)$ via $E_n$, (ii) fixed-time and sequential overlap tests with explicit weight calibration and error bounds, (iii) analysis of interval length and non-intersection probabilities, and (iv) practical demonstrations in simulated data and in comparing sequential learning algorithms. The results offer a safe, interpretable visual alternative for practitioners to assess whether two population means differ, with rigorous nonparametric guarantees and applicability to sequential data streams in ML contexts.

Abstract

We propose a new statistical hypothesis testing framework that decides visually, using confidence intervals, whether the means of two samples are equal or whether one is larger than the other. With our method, the user can simultaneously visualize the confidence regions of the means and test whether the means of the two populations are significantly different by checking whether the two confidence intervals overlap. To design this test, we use confidence intervals constructed from e-variables, which provide a measure of evidence in hypothesis testing. We propose both a sequential and a non-sequential test based on the overlap of confidence intervals, and for each test we give finite-time bounds on the probabilities of error. We also illustrate the practicality of our method by applying it to the comparison of sequential learning algorithms.
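
To make the overlap idea concrete, here is a minimal fixed-time sketch in Python. It is not the paper's construction of $C_n(\alpha;X,W)$ or its weight calibration. It assumes observations in [0, 1], uses a generic two-sided betting e-variable with a fixed bet size `lam` (a heuristic chosen here), inverts it over a grid of candidate means, and splits the level between the two intervals with a simple union bound. All function names are hypothetical.

```python
import math
import random


def log_e_value(xs, m, lam):
    """Log of a two-sided betting e-variable for H0: mean == m, data in [0, 1].

    Assumes 0 < lam < 1 so that every factor 1 +/- lam * (x - m) stays positive.
    """
    up = sum(math.log1p(lam * (x - m)) for x in xs)     # bets that the mean exceeds m
    down = sum(math.log1p(-lam * (x - m)) for x in xs)  # bets that the mean is below m
    top = max(up, down)
    # log(0.5 * exp(up) + 0.5 * exp(down)), computed stably
    return math.log(0.5) + top + math.log(math.exp(up - top) + math.exp(down - top))


def e_confidence_interval(xs, alpha, grid_size=1001):
    """Hull of the grid points m whose e-value stays below 1 / alpha.

    By Markov's inequality the e-value at the true mean exceeds 1 / alpha with
    probability at most alpha, so the hull covers the true mean with
    probability at least 1 - alpha (up to the grid resolution).
    """
    n = len(xs)
    # Bet size depends only on n and alpha, never on the data (heuristic choice).
    lam = min(0.5, math.sqrt(8 * math.log(2 / alpha) / n))
    threshold = math.log(1 / alpha)
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    kept = [m for m in grid if log_e_value(xs, m, lam) < threshold]
    return min(kept), max(kept)


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def visual_test(xs, ys, alpha=0.05):
    """Declare the means different when the two level-alpha/2 intervals are disjoint."""
    ci_x = e_confidence_interval(xs, alpha / 2)
    ci_y = e_confidence_interval(ys, alpha / 2)
    return ci_x, ci_y, not intervals_overlap(ci_x, ci_y)


# Illustrative synthetic data: Beta samples with means 2/7 and 5/7.
rng = random.Random(0)
xs = [rng.betavariate(2, 5) for _ in range(300)]
ys = [rng.betavariate(5, 2) for _ in range(300)]
ci_x, ci_y, reject = visual_test(xs, ys)
print("CI for X:", ci_x)
print("CI for Y:", ci_y)
print("means declared different:", reject)
```

If the two means are equal, both intervals contain the common mean with probability at least 1 - alpha, so they overlap and the sketch wrongly declares a difference with probability at most alpha. The union bound is a conservative stand-in used here for simplicity.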
