Many Perception Tasks are Highly Redundant Functions of their Input Data
Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran Wei, Vijay Balasubramanian, Kostas Daniilidis, Pratik Chaudhari
TL;DR
The paper asks why many perception tasks remain highly predictable even when the input is projected onto subspaces that carry little of its variance. It systematically analyzes projections defined by PCA, Fourier, and wavelet bases across diverse tasks (classification, semantic segmentation, optical flow, depth estimation, and vocalization discrimination), using mutual information and partial information decomposition to quantify redundancy and synergy among subspaces. The key finding is that although the principal subspace is the most predictive, substantial task information is distributed across the entire spectrum, including tail bands and even random subspaces, while deep networks rely predominantly on the head of the spectrum. The results bear on neuroscience and deep learning theory: they suggest that redundancy in natural signals and tasks underpins robust representations and may inform more efficient learning strategies and architectural choices.
Abstract
We show that many perception tasks, from visual recognition, semantic segmentation, optical flow, and depth estimation to vocalization discrimination, are highly redundant functions of their input data. Images or spectrograms, projected into different subspaces formed by orthogonal bases in pixel, Fourier, or wavelet domains, can be used to solve these tasks remarkably well, regardless of whether it is the top subspace where data varies the most, some intermediate subspace with moderate variability, or the bottom subspace where data varies the least. This phenomenon occurs because different subspaces have a large degree of redundant information relevant to the task.
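The band-projection idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the data are synthetic 1-D signals rather than images or spectrograms, the classifier is a plain nearest-centroid rule, and the helper names (`make_toy_data`, `project_band`, `nearest_centroid_accuracy`), the three band boundaries, and the spectral envelope are all hypothetical choices made for illustration. The toy data are deliberately constructed so that the class signal is present at every frequency while the variance decays with frequency, so the sketch only shows how such a redundancy check could be set up, not that real data behave this way.

```python
import numpy as np

def make_toy_data(n=2000, d=64, seed=0):
    """Synthetic two-class signals (an assumption, not the paper's data)."""
    rng = np.random.default_rng(seed)
    templates = rng.normal(size=(2, d))       # one white-spectrum template per class
    y = rng.integers(0, 2, size=n)
    raw = templates[y] + rng.normal(size=(n, d))
    # Shape template and noise with the same envelope: variance decays with
    # frequency, but every frequency bin still carries label information.
    envelope = 8.0 / (8.0 + np.arange(d // 2 + 1))
    X = np.fft.irfft(np.fft.rfft(raw, axis=1) * envelope, n=d, axis=1)
    return X, y

def project_band(X, lo, hi):
    """Orthogonally project each row of X onto real-FFT bins lo..hi-1."""
    F = np.fft.rfft(X, axis=1)
    mask = np.zeros(F.shape[1], dtype=bool)
    mask[lo:hi] = True
    return np.fft.irfft(F * mask, n=X.shape[1], axis=1)

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Classify each test row by its closest class mean; return accuracy."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())

X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

# Three disjoint bands covering all 33 rFFT bins of a length-64 signal.
bands = {"head": (0, 11), "middle": (11, 22), "tail": (22, 33)}
results = {}
for name, (lo, hi) in bands.items():
    P_tr, P_te = project_band(X_tr, lo, hi), project_band(X_te, lo, hi)
    variance = float(P_tr.var(axis=0).sum())  # total variance kept by this band
    accuracy = nearest_centroid_accuracy(P_tr, y_tr, P_te, y_te)
    results[name] = (variance, accuracy)
    print(f"{name:>6}: variance={variance:.2f}  accuracy={accuracy:.3f}")
```

Because the bands are disjoint and together cover every frequency bin, the three projections sum back to the original signal, so each band is a genuine orthogonal slice of the input. Swapping the Fourier mask for a slice of PCA directions or wavelet coefficients would follow the same pattern, with the caveat that a data-dependent basis such as PCA should be fitted on the training split only.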
