Data Models With Two Manifestations of Imprecision

Christian Fröhlich; Robert C. Williamson

TL;DR

The paper addresses the limitation of assuming i.i.d. data by introducing data models in which data are generated from a set of probability measures, thereby capturing two forms of imprecision: aggregate (ir)regularity and local (ir)regularity. It develops non-stationary, locally precise (NSLP) and stationary, locally imprecise (SLI) models, derives a main theorem linking cluster points of relative frequencies to the convex hull of the measure set $\mathcal{M}$, and situates these models within the imprecise-probability and generalized-LLN literature. It provides detailed comparisons to existing frameworks (notably Walley–Fine) and discusses estimation challenges, including a negative result for aggregate irregularity and practical strategies for estimating local irregularity via selection rules. The work lays a foundation for principled imprecise scoring rules and calibration tailored to these data models, with applications to dataset shift, multi-source learning, and fairness contexts where subpopulation heterogeneity matters.

Abstract

Motivated by recently emerging problems in machine learning and statistics, we propose data models which relax the familiar i.i.d. assumption. In essence, we seek to understand what it means for data to come from a set of probability measures. We show that our frequentist data models, parameterized by such sets, manifest two aspects of imprecision. We characterize the intricate interplay of these manifestations, aggregate (ir)regularity and local (ir)regularity, where a much richer set of behaviours is possible than under an i.i.d. model. In doing so, we shed new light on the relationship between non-stationary, locally precise and stationary, locally imprecise data models. We discuss possible applications of these data models in machine learning and how the set of probabilities can be estimated. For the estimation of aggregate irregularity, we provide a negative result but argue that it does not warrant pessimism. Understanding these frequentist aspects of imprecise probabilities paves the way for deriving generalizations of proper scoring rules and calibration to the imprecise case, which can then contribute to tackling practical problems.
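To make aggregate irregularity concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a hypothetical two-element set of Bernoulli measures with success probabilities 0.2 and 0.8, and a hypothetical schedule that switches between them in blocks of doubling length. Each outcome is drawn from a single precise measure, yet the running relative frequency of ones keeps swinging between roughly 0.4 and 0.6 instead of settling on one value. The function name, parameters, and seed are all illustrative assumptions.

```python
import random


def block_end_frequencies(p_even=0.2, p_odd=0.8, num_blocks=16, seed=0):
    """Relative frequency of ones, recorded at the end of each block.

    Block k has length 2**k. Its outcomes are independent Bernoulli(p)
    draws, with p = p_even for even k and p = p_odd for odd k.
    (Illustrative schedule; not taken from the paper.)
    """
    rng = random.Random(seed)  # seeded for reproducibility
    ones = 0
    total = 0
    freqs = []
    for k in range(num_blocks):
        p = p_even if k % 2 == 0 else p_odd
        for _ in range(2 ** k):
            ones += 1 if rng.random() < p else 0
            total += 1
        freqs.append(ones / total)
    return freqs


if __name__ == "__main__":
    for k, f in enumerate(block_end_frequencies()):
        print(f"after block {k:2d}: relative frequency = {f:.3f}")
```

Because each block is about twice as long as everything before it, the most recent block always carries about two thirds of the weight, so the frequency never stabilises. In this toy schedule both approximate limit values lie between 0.2 and 0.8, the two probabilities in the assumed set.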

Paper Structure (19 sections, 36 theorems, 97 equations)

Introduction
Data Models
Background on Imprecise Probabilities
Non-Stationary, Locally Precise Data Models
A Comparison to the Fierens-Fine Model
Stationary, Locally Imprecise Data Models
A Comparison to Walley and Fine's (1982) and Fine et al.'s Line of Work
A Comparison to Generalized Laws of Large Numbers and the Subjectivist Perspective
Applications and Estimability
Applications
Estimation of Aggregate (Ir)regularity
Estimation of Local (Ir)regularity
Discussion
Acknowledgements
Appendix
...and 4 more sections

Key Result

Proposition 3.3

Let $\omega^\infty=(\omega_1,\omega_2,\ldots)$ be a data sequence and $\ell \in L^\infty$ any gamble. Then it holds:

Theorems & Definitions (73)

Definition 2.1: The i.i.d. model
Definition 2.2
Definition 2.3
Example 3.1
Example 3.2
Proposition 3.3 [ivanenko2017expected, frohlich2024strictly]
Theorem 3.4
Definition 3.5
Definition 3.6
Corollary 3.7
...and 63 more
