Emergence of Quantised Representations Isolated to Anisotropic Functions
George Bird
TL;DR
The paper investigates whether anisotropy in activation functions can drive the emergence of quantised representations, by isolating activation-function symmetry through a controlled ablation using the Privileged-Plane Projective Method (PPP). It contrasts anisotropic permutation-equivariant forms with isotropic orthogonal-equivariant forms in autoencoders and shows that anisotropic activations induce task-agnostic, axis-aligned, discrete-like representation clusters, while isotropic activations yield smoother, continuous representations. Theoretical analysis links these phenomena to the Jacobians of the activation functions, and empirical results across datasets, depths, and widths corroborate that maximal symmetry predicts the observed inductive biases. The findings imply that many interpretability phenomena may arise from function-form choices rather than fundamental properties of learning, highlighting the need to taxonomise primitives and consider isotropy as a design axis for safer, more flexible representations.
Abstract
This paper presents a novel methodology for determining representational structure, building on the existing Spotlight Resonance method. The new tool is used to study how discrete representations can emerge and organise in autoencoder models, through a controlled ablation study that alters only the activation function. The technique is then used to test whether function-driven symmetries can act as implicit inductive biases on representations. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. This confirms the hypothesis that the symmetries of network primitives can carry unintended inductive biases, leading to task-independent artefactual structures in representations. The discrete symmetry of contemporary functional forms is shown to be a strong predictor of symmetry-organised discrete representations emerging from otherwise continuous distributions, which is a quantisation effect. Given such unintended consequences, this motivates a reassessment of the functional forms in common use. Moreover, the results support a general causal model for one mode in which discrete representations may form, and this mode could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and a type of Superposition. Hence, the tool and the proposed mechanism for the influence of functional form on representations may provide insights for interpretability research. Finally, preliminary results indicate that quantisation of representations correlates with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.
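The distinction between permutation-equivariant and orthogonal-equivariant activation functions can be illustrated with a small numerical check. The sketch below is only an illustration under stated assumptions: the radial form `isotropic_tanh` (tanh applied to the vector norm, direction preserved) is an assumed example of an isotropic function and may differ from the definitions used in the paper, and the helper names (`equivariance_gap`, `rotate_2d`, `swap`) are hypothetical. It measures how far each function is from commuting with a coordinate swap and with a rotation.

```python
import math

def elementwise_tanh(x):
    """Standard tanh applied independently to each coordinate (anisotropic)."""
    return [math.tanh(v) for v in x]

def isotropic_tanh(x):
    """Assumed radial form: squash the norm with tanh, keep the direction."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0 for _ in x]
    scale = math.tanh(norm) / norm
    return [scale * v for v in x]

def rotate_2d(x, theta):
    """Rotate a 2D vector by theta radians (an orthogonal transformation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def swap(x):
    """Permute the two coordinates of a 2D vector."""
    return [x[1], x[0]]

def equivariance_gap(f, g, x):
    """Largest absolute difference between f(g(x)) and g(f(x)).

    A gap of (numerically) zero means f commutes with g at the point x.
    """
    a = f(g(x))
    b = g(f(x))
    return max(abs(p - q) for p, q in zip(a, b))

x = [1.5, -0.4]                       # arbitrary test point
rot = lambda v: rotate_2d(v, math.pi / 5)

perm_gap_elementwise = equivariance_gap(elementwise_tanh, swap, x)
rot_gap_elementwise = equivariance_gap(elementwise_tanh, rot, x)
perm_gap_isotropic = equivariance_gap(isotropic_tanh, swap, x)
rot_gap_isotropic = equivariance_gap(isotropic_tanh, rot, x)

print("elementwise tanh: swap gap", perm_gap_elementwise, "rotation gap", rot_gap_elementwise)
print("isotropic tanh:   swap gap", perm_gap_isotropic, "rotation gap", rot_gap_isotropic)
```

In this sketch the elementwise function commutes with the swap but not with the rotation, so the coordinate axes remain distinguished directions. The radial function commutes with both up to floating-point error, so no direction is privileged. This checks the symmetry of the functions only; it does not reproduce the autoencoder experiments or the quantisation effect itself.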
