Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning

Alejandro Rodriguez Dominguez

TL;DR

A unified theory is developed showing that whenever the human supervision channel is not sufficient for a latent evaluation target, it acts as an information-reducing channel that induces a strictly positive excess-risk floor for any learner dominated by it.

Abstract

Large language models are trained primarily on human-generated data and feedback, yet they exhibit persistent errors arising from annotation noise, subjective preferences, and the limited expressive bandwidth of natural language. We argue that these limitations reflect structural properties of the supervision channel rather than model scale or optimization. We develop a unified theory showing that whenever the human supervision channel is not sufficient for a latent evaluation target, it acts as an information-reducing channel that induces a strictly positive excess-risk floor for any learner dominated by it. We formalize this Human-Bounded Intelligence limit and show that across six complementary frameworks (operator theory, PAC-Bayes, information theory, causal inference, category theory, and game-theoretic analyses of reinforcement learning from human feedback), non-sufficiency yields strictly positive lower bounds arising from the same structural decomposition into annotation noise, preference distortion, and semantic compression. The theory explains why scaling alone cannot eliminate persistent human-aligned errors and characterizes conditions under which auxiliary non-human signals (e.g., retrieval, program execution, tools) increase effective supervision capacity and collapse the floor by restoring information about the latent target. Experiments on real preference data, synthetic known-target tasks, and externally verifiable benchmarks confirm the predicted structural signatures: human-only supervision exhibits a persistent floor, while sufficiently informative auxiliary channels strictly reduce or eliminate excess error.
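The central claim, that a non-sufficient supervision channel induces a strictly positive error floor which an informative auxiliary channel can remove, can be illustrated with a toy calculation. The sketch below is not the paper's construction: the four-valued target, the channels `human_only` and `with_aux`, and the helper `bayes_error` are all hypothetical names chosen for illustration. It assumes a latent target uniform on {0, 1, 2, 3}, a human channel that reports only the coarse half (a stand-in for semantic compression), and an auxiliary channel that reports the remaining bit.

```python
from collections import defaultdict
from fractions import Fraction


def bayes_error(prior, channel):
    """Minimum 0-1 error of any predictor that sees only channel(y).

    prior   : dict mapping each target value y to its probability
    channel : deterministic function from y to the observed signal
    """
    # joint[signal][y] accumulates P(signal, y)
    joint = defaultdict(lambda: defaultdict(Fraction))
    for y, p in prior.items():
        joint[channel(y)][y] += p
    # For each signal, the best guess is the most probable y given that signal.
    correct = sum(max(by_y.values()) for by_y in joint.values())
    return 1 - correct


# Hypothetical latent target: uniform over four values.
prior = {y: Fraction(1, 4) for y in range(4)}

# Hypothetical human channel: keeps only the coarse half of the target.
human_only = lambda y: y // 2

# Hypothetical hybrid channel: human signal plus an auxiliary bit.
with_aux = lambda y: (y // 2, y % 2)

floor_human = bayes_error(prior, human_only)
floor_hybrid = bayes_error(prior, with_aux)

print("human-only floor:", floor_human)
print("with auxiliary channel:", floor_hybrid)
```

In this toy setting the human-only floor is 1/2 regardless of how much data or model capacity is available, because the signal never distinguishes the two targets within each half. Adding the auxiliary bit makes the combined signal sufficient for the target, and the floor drops to 0.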

Paper Structure

This paper contains 23 sections, 8 theorems, 58 equations, 3 figures, and 6 tables.

Key Result

Theorem 1

Under Assumptions ass:human-only--ass:min-sep and the regularity conditions,

Figures (3)

  • Figure 1: Conceptual information flow under human-only (H), hybrid human+model (H+M), and hybrid with auxiliary channels (H+M+A). Auxiliary channels introduce additional information about $Y^\ast$, increasing effective supervision capacity and reducing or eliminating the structural excess-risk floor.
  • Figure 2: Real-data scaling behavior. Pairwise accuracy versus training size for human-only supervision ($\alpha=1$, blue) and hybrid supervision ($\alpha=0.5$, orange). Hybrid supervision matches or exceeds human-only performance across scales, while scaling alone does not eliminate the structural supervision gap.
  • Figure 3: Synthetic distortion trajectory. Objective accuracy as a function of the human-weight parameter $\alpha$ in the known-target synthetic task. Distortion increases monotonically toward human-only supervision ($\alpha = 1$), confirming the predicted structural alignment gap.

Theorems & Definitions (16)

  • Theorem 1: Human-Bounded Intelligence (HBI)
  • proof
  • Theorem 2: Operator-Theoretic HBI
  • proof
  • Theorem 3: PAC-Bayes HBI
  • proof
  • Theorem 4: Information-Theoretic HBI
  • proof
  • Theorem 5: Causal HBI
  • proof
  • ...and 6 more