On Experiments

Brendan van Rooyen

TL;DR

This paper presents a category-theoretic, information-theoretic framework for experiments, modeling them as morphisms (Markov transitions) and studying how information transforms through processing. It develops a general data-processing inequality, a bias-variance decomposition for canonical losses, and streamlined proofs of foundational theorems (Blackwell-Sherman-Stein, Randomization), and it shows how deficiency can be computed via linear programming. The approach unifies decision theory, loss representations, and information measures (e.g., mutual information, phi-divergences) under a single algebraic lens, with concrete consequences for comparing experiments and designing informative studies. The results provide practical tools for evaluating and optimizing experiments, particularly the linear-programming computation of deficiency and a canonical-loss perspective on losses and entropy.

Abstract

The scientific process is a means to turn the results of experiments into knowledge about the world in which we live. Much research effort has been directed toward automating this process. To do this, one needs to formulate the scientific process in a precise mathematical language. This paper outlines one such language. What is presented here is hardly new. The material is based on great thinkers from times past as well as more modern contributions. The novel contributions of this paper are: a new general data processing inequality, a bias-variance decomposition for canonical losses, streamlined proofs of the Blackwell-Sherman-Stein and Randomization theorems, and a means of calculating deficiency through linear programming.

Paper Structure

This paper contains 28 sections, 19 theorems, 92 equations.

Key Result

Theorem 5

Let $f : \mathbb{P}^+(\Theta) \rightarrow \mathbb{R}$ be a concave 1-homogeneous function. Then for all $x$ and for all $v \in \partial f(x)$, $f(x) = \langle v, x \rangle$. Furthermore, $v \in \partial f (x) \implies v \in \partial f (\lambda x)$ for all $\lambda > 0$.
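As a sanity check on what Theorem 5 asserts, here is a minimal numerical sketch. It rests on assumptions that are not taken from the paper: $\Theta$ is finite, so elements of $\mathbb{P}^+(\Theta)$ are treated as vectors of positive numbers; $f$ is chosen to be the unnormalised Shannon entropy $f(x) = -\sum_i x_i \log(x_i / \sum_j x_j)$, which is concave and 1-homogeneous; and the helper names `entropy`, `supergradient`, and `inner` are hypothetical. For this differentiable $f$ the super-gradient at $x$ is the ordinary gradient $v_i = -\log(x_i / \sum_j x_j)$, so the Euler identity $f(x) = \langle v, x \rangle$ and the scale invariance of $v$ can both be checked directly.

```python
import math

def entropy(x):
    """Unnormalised Shannon entropy: concave and 1-homogeneous on positive vectors."""
    total = sum(x)
    return -sum(xi * math.log(xi / total) for xi in x)

def supergradient(x):
    """Gradient of `entropy` at x; f is differentiable here, so this is the only super-gradient."""
    total = sum(x)
    return [-math.log(xi / total) for xi in x]

def inner(v, x):
    """Euclidean pairing <v, x> (assumed stand-in for the pairing in the theorem)."""
    return sum(vi * xi for vi, xi in zip(v, x))

x = [0.5, 1.5, 2.0]          # an arbitrary positive vector (illustrative data)
v = supergradient(x)

# Euler identity: f(x) should equal <v, x>.
euler_gap = abs(entropy(x) - inner(v, x))

# Scale invariance: the super-gradient at 3x should coincide with the one at x.
scaled = [3.0 * xi for xi in x]
scale_gap = max(abs(a - b) for a, b in zip(v, supergradient(scaled)))

# Super-gradient inequality at another point y: f(y) <= f(x) + <v, y - x>.
y = [2.0, 0.1, 0.7]
diff = [yi - xi for yi, xi in zip(y, x)]
supergradient_holds = entropy(y) <= entropy(x) + inner(v, diff) + 1e-12

print(euler_gap, scale_gap, supergradient_holds)
```

Both gaps should vanish up to floating-point error. The general argument is short: applying the super-gradient inequality at $y = \lambda x$ gives $(\lambda - 1) f(x) \le (\lambda - 1) \langle v, x \rangle$, and taking $\lambda > 1$ and then $\lambda < 1$ forces equality.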

Theorems & Definitions (30)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4: Super-gradient of a concave function
  • Theorem 5: Generalized Euler's Homogeneous Function Theorem
  • Corollary 6
  • Theorem 7
  • Definition 8: Canonical Loss (Preliminary)
  • Lemma 9
  • Corollary 10: Loss from Entropy
  • ...and 20 more