Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning
Suman Sanyal
TL;DR
End-to-end learning often entangles perception with decision tasks, hindering evaluation and transfer of perceptual representations. Perception Learning (PeL) proposes a perception-first paradigm that trains $f_\phi: \mathcal{X} \to \mathcal{Z}$ using task-agnostic objectives to produce reusable codes, while the downstream $g_\theta: \mathcal{Z} \to \mathcal{Y}$ is learned separately; an orthogonality theorem shows that invariance-improving updates that preserve the invariant content do not affect Bayes risk under task-true invariances. It formalizes perceptual properties as functionals $\Phi_P(f_\phi; P_X, G)$ with target sets, and provides a suite of task-agnostic metrics (invariance, information preservation, geometry, etc.) to certify perceptual quality independently of task heads. The framework yields modular perception backbones for AGI stacks, with safety guarantees against overfitting to nuisance transformations and clearer diagnostics for perceptual health. Together, these contributions enable robust, transferable perception modules that decouple sensing from decision.
Abstract
We introduce Perception Learning (PeL), a paradigm that optimizes an agent's sensory interface $f_\phi:\mathcal{X}\to\mathcal{Z}$ using task-agnostic signals, decoupled from downstream decision learning $g_\theta:\mathcal{Z}\to\mathcal{Y}$. PeL directly targets label-free perceptual properties, such as stability to nuisances, informativeness without collapse, and controlled geometry, assessed via objective representation-invariant metrics. We formalize the separation of perception and decision, define perceptual properties independent of objectives or reparameterizations, and prove that PeL updates preserving sufficient invariants are orthogonal to Bayes task-risk gradients. Additionally, we provide a suite of task-agnostic evaluation metrics to certify perceptual quality.
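To make the idea of label-free perceptual metrics concrete, the following is a minimal sketch of two such checks, computed on an encoder's codes without any task labels or decision head. It is an illustration only, not the paper's metric definitions: the function names `invariance_score` and `effective_rank`, the cosine-similarity and entropy-based formulas, and the toy encoders and sign-flip nuisance are all assumptions chosen for simplicity.

```python
import numpy as np

def invariance_score(f, X, transform):
    """Mean cosine similarity between codes f(x) and f(transform(x)).

    Hypothetical stability-to-nuisance check; values near 1 suggest the
    encoder is insensitive to the given transformation.
    """
    Z = f(X)
    Zt = f(transform(X))
    num = np.sum(Z * Zt, axis=1)
    den = np.linalg.norm(Z, axis=1) * np.linalg.norm(Zt, axis=1) + 1e-12
    return float(np.mean(num / den))

def effective_rank(Z):
    """Entropy-based effective rank of the centered codes.

    Hypothetical collapse check; a value of 1 means the codes occupy
    (at most) a single direction.
    """
    Zc = Z - Z.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Zc, compute_uv=False)
    p = s / (s.sum() + 1e-12)   # normalized singular-value spectrum
    p = p[p > 0]                # drop zeros so log is defined
    return float(np.exp(-np.sum(p * np.log(p))))

# Toy setup (assumed): 2-D inputs, nuisance = sign flip of the second coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 2))
flip = lambda x: x * np.array([1.0, -1.0])

f_invariant = lambda x: np.stack([x[:, 0], x[:, 1] ** 2], axis=1)  # ignores the sign of x1
f_identity = lambda x: x                                           # passes the nuisance through
f_collapsed = lambda x: np.ones((x.shape[0], 2))                   # constant code

inv_good = invariance_score(f_invariant, X, flip)
inv_bad = invariance_score(f_identity, X, flip)
rank_good = effective_rank(f_invariant(X))
rank_collapsed = effective_rank(f_collapsed(X))
```

In this toy setting the sign-invariant encoder scores higher on invariance than the identity encoder, and the constant encoder is flagged by its effective rank of 1. Neither quantity depends on $g_\theta$ or on labels in $\mathcal{Y}$, which is the sense in which such checks are task-agnostic.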
