Identifying Information from Observations with Uncertainty and Novelty

Derek S. Prijatelj; Timothy J. Ireland; Walter J. Scheirer

TL;DR

This paper formalizes the notion of identifiable information that arises from the language used to express the relationship between distinct states.

Abstract

A machine learning task that learns from observations must encounter and process uncertainty and novelty, especially when it has to maintain performance while observing new information and to choose the hypothesis that best fits the current observations. In this context, some key questions arise: what and how much information did the observations provide, how much information is required to identify the data-generating process, how many observations remain to get that information, and how does a predictor determine that it has observed novel information? This paper strengthens existing answers to these questions by formalizing the notion of identifiable information that arises from the language used to express the relationship between distinct states. Model identifiability and sample complexity are defined via computation of an indicator function over a set of hypotheses, bridging algorithmic and probabilistic information. Their properties and asymptotic statistics are described for data-generating processes ranging from deterministic processes to ergodic stationary stochastic processes. This connects the notion of identifying information in finite steps with asymptotic statistics and PAC-learning. The indicator function's computation naturally formalizes novel information and its identification from observations with respect to a hypothesis set. We also prove that the sample complexity distribution of a computable PAC-Bayes learner is determined by its moments in terms of the prior probability distribution over a fixed finite hypothesis set.

Paper Structure (36 sections, 20 theorems, 29 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Contributions
Background, Notation, and Foundation
Probability: Uncertainty in Variables, Functions, & Processes
Information Theory
Computability and Algorithmic Complexity
Statistics, Information, and Identification
Statistical and Computational Learning
What is Known, Unknown, and Novel?
On the Indicator Function and Determining Novelty
Identifying Information and the Sample Complexity
Definitions
Remark
Remark
Identifying Information from Direct Observations
...and 21 more sections

Key Result

Theorem 4.2.1

(Verify a String's Set Membership by Exhaustive Symbol Comparisons) Pairwise equality of a string $\vec{\theta}$ to another $\vec{\psi} \in \mathbf{\Theta}$, and thus verification that $\vec{\theta} \in \mathbf{\Theta}$, can only occur once al…
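The theorem statement is cut off here, so the following is only a minimal sketch of the idea its title names: checking set membership by comparing symbols one at a time. The function names `equal_with_count` and `is_member`, the use of Python strings as symbol sequences, and the choice to count comparisons are all illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def equal_with_count(theta: Sequence[str], psi: Sequence[str]) -> Tuple[bool, int]:
    """Compare two strings symbol by symbol.

    Returns (equal, number of symbol comparisons made).
    Assumption: strings of different length are unequal with 0 comparisons.
    """
    if len(theta) != len(psi):
        return False, 0
    comparisons = 0
    for a, b in zip(theta, psi):
        comparisons += 1
        if a != b:
            # A single mismatch is enough to refute equality early.
            return False, comparisons
    # Equality is confirmed only after every symbol has been compared.
    return True, comparisons


def is_member(theta: Sequence[str], hypotheses: Iterable[Sequence[str]]) -> Tuple[bool, int]:
    """Check whether theta equals some string in the hypothesis set.

    Returns (member, total symbol comparisons across all candidates tried).
    """
    total = 0
    for psi in hypotheses:
        equal, used = equal_with_count(theta, psi)
        total += used
        if equal:
            return True, total
    return False, total


if __name__ == "__main__":
    print(is_member("0110", ["0111", "0110"]))
    print(is_member("0110", ["1111"]))
```

In this sketch a mismatch can rule out a candidate after a single comparison, whereas a positive match against a candidate is only reported after all of that candidate's symbols have been checked.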

Figures (3)

Figure 1: Example probability vectors in their barycentric coordinates of the probability simplex $\mathbb{P}^k$ of $k$ mutually exclusive symbols. The traditional empirical process's $\mathbb{P}^k$ of $k$ known symbols is a subspace of the empirical process's $\mathbb{P}^{k+1}$ with a symbol '$?$' to represent unknown observations.
Figure 2: A single time-step of a predictor updating its estimated hypothesis that best describes the observations witnessed up to this point in time.
Figure 3: The sample complexity typical set thresholds for a single hypothesis of a fair coin with $p=0.7$, $q=0.6$ over $P(\vec{X}^t_1 = \vec{x}^t_1)$. One panel (label fig:typical_sets:neq) depicts the typical and atypical sets at $t=1$ bounding the block entropy in probability space. Another panel (label fig:typical_sets:plot) shows the bounds over 10 observations with log-scaled probability. Observation sequences within the typical set support the hypothesis. The thresholds $p$ and $q$ respectively determine the accepted probability of verifying a hypothesis and of rejecting all hypotheses. If $p > q$ or $|\Theta| > 1$, then there exists an undetermined set of observations where more samples are required to determine set membership.

Theorems & Definitions (28)

Definition 2.1.1
Definition 4.1.1
Definition 4.1.2
Definition 4.1.3
Definition 4.1.4
Definition 4.1.5
Definition 4.1.6
Definition 4.1.7
Theorem 4.2.1
Theorem 4.2.2
...and 18 more
