Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning

Suman Sanyal

TL;DR

End-to-end learning often entangles perception with decision tasks, hindering evaluation and transfer of perceptual representations. Perception Learning (PeL) proposes a perception-first paradigm that trains $f_\phi: \mathcal{X} \to \mathcal{Z}$ using task-agnostic objectives to produce reusable codes, while the downstream decision map $g_\theta: \mathcal{Z} \to \mathcal{Y}$ is learned separately; an orthogonality theorem shows that invariance-improving updates that preserve the invariant content do not change the Bayes risk under task-true invariances. It formalizes perceptual properties as functionals $\Phi_P(f_\phi; P_X, G)$ with target sets, and provides a suite of task-agnostic metrics (invariance, information preservation, geometry, etc.) to certify perceptual quality independent of task heads. The framework yields modular perception backbones for AGI stacks, with safety guarantees against overfitting to nuisance transformations and clearer diagnostics for perceptual health. Together, these contributions enable robust, transferable perception modules that decouple sensing from decision.

Abstract

We introduce Perception Learning (PeL), a paradigm that optimizes an agent's sensory interface $f_\phi:\mathcal{X}\to\mathcal{Z}$ using task-agnostic signals, decoupled from downstream decision learning $g_\theta:\mathcal{Z}\to\mathcal{Y}$. PeL directly targets label-free perceptual properties, such as stability to nuisances, informativeness without collapse, and controlled geometry, assessed via objective representation-invariant metrics. We formalize the separation of perception and decision, define perceptual properties independent of objectives or reparameterizations, and prove that PeL updates preserving sufficient invariants are orthogonal to Bayes task-risk gradients. Additionally, we provide a suite of task-agnostic evaluation metrics to certify perceptual quality.
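The label-free properties named in the abstract (stability to nuisances, informativeness without collapse) can be made concrete with a small sketch. The two metrics below are illustrative assumptions, not the paper's definitions: `invariance_gap` averages the squared code distance between an input and its transformed views, and `code_variance` is a crude collapse check. The encoder, the rotation nuisance, and all names are hypothetical.

```python
import math
import random

def encoder(x):
    # Hypothetical stand-in for f_phi: maps a 2-D point to its radius,
    # so the code is invariant to rotations about the origin.
    return (math.hypot(x[0], x[1]),)

def rotate(x, angle):
    # Nuisance transformation: rotation about the origin.
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def invariance_gap(f, xs, transform, angles):
    # Mean squared code distance between each input and its transformed views.
    total = 0.0
    for x in xs:
        for a in angles:
            total += sq_dist(f(x), f(transform(x, a)))
    return total / (len(xs) * len(angles))

def code_variance(f, xs):
    # Total variance of the codes; a value near zero signals collapse.
    zs = [f(x) for x in xs]
    dim = len(zs[0])
    means = [sum(z[d] for z in zs) / len(zs) for d in range(dim)]
    return sum(sq_dist(z, means) for z in zs) / len(zs)

rng = random.Random(0)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
angles = [0.3, 1.1, 2.5]

gap_radius = invariance_gap(encoder, xs, rotate, angles)          # rotation-invariant code
gap_identity = invariance_gap(lambda x: x, xs, rotate, angles)    # raw input as code
var_radius = code_variance(encoder, xs)                           # nonzero: no collapse
```

Neither metric uses labels or a task head: the radius code has an invariance gap at floating-point noise level while keeping nonzero variance, whereas the identity code is informative but not rotation-stable.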

Paper Structure

This paper contains 11 sections, 2 theorems, 13 equations, 2 figures, 3 tables.

Key Result

theorem 1

Assume (A1)--(A5) and let $\phi_0$ satisfy $f_{\phi_0}=h_0\circ T$ with $h_0$ injective on $\mathrm{range}(T)$ (so $\sigma(f_{\phi_0}(X))=\sigma(T(X))$). Let $v$ be any tangent direction at $\phi_0$ such that, for all sufficiently small $t$, $f_{\phi_0+t v}=h_t\circ T$ for some measurable $h_t$ that [...] In particular, if $\nabla_\phi L_{\mathrm{inv}}(\phi_0)$ is such a direction (i.e., it only improve [...]
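The parenthetical $\sigma(f_{\phi_0}(X))=\sigma(T(X))$ can be illustrated numerically. The sketch below assumes a hypothetical discrete joint distribution in which the label depends on $x$ only through $T(x)=|x|$, and uses 0-1 loss; it is a toy check of the fact that injective recodings of $T$ carry the same information, not a reproduction of the theorem or its assumptions (A1)--(A5).

```python
from collections import defaultdict

# Hypothetical toy joint distribution P(x, y); y depends on x only through |x|.
joint = {
    (-2, 1): 0.20, (-2, 0): 0.05,
    (2, 1): 0.20, (2, 0): 0.05,
    (-1, 1): 0.10, (-1, 0): 0.15,
    (1, 1): 0.10, (1, 0): 0.15,
}

def T(x):
    # Invariant statistic: discards the sign nuisance.
    return abs(x)

def bayes_risk(code):
    # 0-1 Bayes risk of predicting y from z = code(x): 1 - sum_z max_y P(z, y).
    mass = defaultdict(float)
    for (x, y), p in joint.items():
        mass[(code(x), y)] += p
    best = defaultdict(float)
    for (z, y), p in mass.items():
        best[z] = max(best[z], p)
    return 1.0 - sum(best.values())

risk_T = bayes_risk(T)
risk_h0 = bayes_risk(lambda x: 3 * T(x) + 1)      # injective recoding of T
risk_ht = bayes_risk(lambda x: T(x) ** 2 - 0.5)   # another injective recoding of T
risk_collapsed = bayes_risk(lambda x: 0)          # non-injective: discards T entirely
```

Both injective recodings give the same Bayes risk as $T$ itself, while the collapsed code gives a strictly larger one, which is the sense in which moving within the family $h\circ T$ leaves the task risk unchanged.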

Figures (2)

  • Figure 1: PeL in a broader AGI stack. $Z$ supports world modeling, planning/decision learning, retrieval/knowledge, and task heads. Task/return gradients do not flow into $f_\phi$.
  • Figure 2: PeL trains $f_\phi$ on unlabeled views to produce a stable, informative, reusable code $Z$. Task heads $g_\theta$ are trained separately and do not backpropagate into $f_\phi$.
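The separation described in the two captions, where the task head is trained on $Z$ without backpropagating into $f_\phi$, can be sketched as a two-stage loop. Everything here is a hypothetical toy: the one-parameter encoder, the affine head, the squared-error loss, and the value of `phi` standing in for the outcome of label-free training are assumptions for illustration only.

```python
import random

def f_phi(x, phi):
    # Hypothetical perception module: scaled absolute value, invariant to sign flips.
    return phi * abs(x)

def g_theta(z, theta):
    # Hypothetical task head: affine map on the code.
    return theta[0] * z + theta[1]

def train_head(xs, ys, phi, theta, lr=0.1, steps=2000):
    # Stage 2: gradient descent on mean squared error with respect to theta only.
    # phi is read but never updated, so no task gradient reaches f_phi.
    zs = [f_phi(x, phi) for x in xs]  # codes computed once and treated as constants
    w, b = theta
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * z + b - y) * z for z, y in zip(zs, ys)) / n
        gb = sum(2 * (w * z + b - y) for z, y in zip(zs, ys)) / n
        w, b = w - lr * gw, b - lr * gb
    return (w, b)

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * abs(x) + 1.0 for x in xs]  # target depends on x only through |x|

phi = 2.0  # stands in for the result of stage 1 (label-free training)
theta = train_head(xs, ys, phi, (0.0, 0.0))
mse = sum((g_theta(f_phi(x, phi), theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Because the target depends on the input only through a quantity the frozen code retains, the head fits it without touching `phi`. In an autodiff framework the same effect is typically obtained by applying a stop-gradient to $Z$ (for example `jax.lax.stop_gradient`) before it enters the head.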

Theorems & Definitions (8)

  • definition 1: Perception Learning
  • definition 2: Perceptual Property
  • remark 1
  • theorem 1: Orthogonality of PeL updates to Bayes-risk gradient
  • proof
  • remark 2
  • corollary 1: Two-stage optimality under exact invariance
  • proof