Emergence of Quantised Representations Isolated to Anisotropic Functions

George Bird

TL;DR

The paper investigates whether anisotropy in activation functions can drive the emergence of quantised representations, by isolating activation-function symmetry through a controlled ablation using the Privileged-Plane Projective Method (PPP). It contrasts anisotropic permutation-equivariant forms with isotropic orthogonal-equivariant forms in autoencoders and shows that anisotropic activations induce task-agnostic, axis-aligned, discrete-like representation clusters, while isotropic activations yield smoother, continuous representations. Theoretical analysis links these phenomena to the Jacobians of the activation functions, and empirical results across datasets, depths, and widths corroborate that maximal symmetry predicts the observed inductive biases. The findings imply that many interpretability phenomena may arise from function-form choices rather than fundamental properties of learning, highlighting the need to taxonomise primitives and consider isotropy as a design axis for safer, more flexible representations.

Abstract

Presented is a novel methodology for determining representational structure, which builds upon the existing Spotlight Resonance method. This new tool is used to gain insight into how discrete representations can emerge and organise in autoencoder models, through a controlled ablation study that alters only the activation function. Using this technique, the validity of whether function-driven symmetries can act as implicit inductive biases on representations is determined. Representations are found to tend to discretise when the activation functions are defined through a discrete algebraic permutation-equivariant symmetry. In contrast, they remain continuous under a continuous algebraic orthogonal-equivariant definition. This confirms the hypothesis that the symmetries of network primitives can carry unintended inductive biases, leading to task-independent artefactual structures in representations. The discrete symmetry of contemporary forms is shown to be a strong predictor for the production of symmetry-organised discrete representations emerging from otherwise continuous distributions -- a quantisation effect. This motivates further reassessment of functional forms in common usage due to such unintended consequences. Moreover, this supports a general causal model for a mode in which discrete representations may form, and could constitute a prerequisite for downstream interpretability phenomena, including grandmother neurons, discrete coding schemes, general linear features and a type of Superposition. Hence, this tool and proposed mechanism for the influence of functional form on representations may provide insights into interpretability research. Finally, preliminary results indicate that quantisation of representations correlates with a measurable increase in reconstruction error, reinforcing previous conjectures that this collapse can be detrimental.
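
The contrast between permutation-equivariant and orthogonal-equivariant activation functions can be checked numerically. The sketch below is illustrative only: it assumes the isotropic tanh takes the radial form $f(\vec{x}) = \tanh(\|\vec{x}\|)\,\vec{x}/\|\vec{x}\|$, which is not defined in this summary, and the helper names (`isotropic_tanh`, `rotate`, `max_gap`) are hypothetical. It compares applying the function before and after a rotation or an axis swap.

```python
import math

def standard_tanh(x):
    """Elementwise tanh: acts on each coordinate independently (anisotropic)."""
    return [math.tanh(v) for v in x]

def isotropic_tanh(x):
    """Assumed radial form: rescales the whole vector by tanh(|x|) / |x|."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0 for _ in x]
    scale = math.tanh(norm) / norm
    return [scale * v for v in x]

def rotate(x, angle):
    """Rotate a 2D vector by `angle` radians (an orthogonal transform)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

def permute(x):
    """Swap the two coordinates (a permutation of the standard basis)."""
    return [x[1], x[0]]

def max_gap(a, b):
    """Largest absolute coordinate difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

x = [1.5, -0.4]   # arbitrary test vector
angle = 0.7       # arbitrary rotation angle

# Equivariance means f(T x) == T f(x); the gap measures the violation.
rot_gap_standard = max_gap(standard_tanh(rotate(x, angle)),
                           rotate(standard_tanh(x), angle))
rot_gap_isotropic = max_gap(isotropic_tanh(rotate(x, angle)),
                            rotate(isotropic_tanh(x), angle))
perm_gap_standard = max_gap(standard_tanh(permute(x)),
                            permute(standard_tanh(x)))

print("standard tanh, rotation gap:   ", rot_gap_standard)
print("isotropic tanh, rotation gap:  ", rot_gap_isotropic)
print("standard tanh, permutation gap:", perm_gap_standard)
```

Under this assumed form, the elementwise function commutes with the axis swap but not with a generic rotation, whereas the radial function commutes with the rotation up to floating-point error. This is the sense in which the standard basis is distinguished for the former and not for the latter.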

Paper Structure

This paper contains 28 sections, 27 equations, 37 figures.

Figures (37)

  • Figure 1: Displays ten rows of PPP-method results, divided into columns. Each row represents an independent autoencoder network trained on the reconstruction of the CIFAR dataset [Alex2009]. Each column represents the results of the PPP method at a different stage of training. The leftmost column shows a freshly initialised model before training, and moving rightward, the network is progressively trained for up to 125 epochs, as shown in the rightmost column. Hot-spots indicate where the internal latent layer is particularly dense with representations, collated over all samples from the CIFAR training set. The top five rows depict networks which utilise the anisotropic activation function, standard-tanh, whilst the bottom five rows utilise the isotropic activation function, isotropic-tanh. Every other detail is identical. Figure titles indicate the exact number of epochs trained for. This specific network consists of a latent layer of 18 neurons, with standard (unnormalised) input-output pairs drawn from CIFAR. The dark centres about the origin are attributed to a vanishing volume due to an angular threshold, rather than the absence of representations.
  • Figure 2: Displays an identical plot to Fig. 1, except for the results being drawn from a network with three latent layers, each with 18 neurons per layer. The latent layer studied is the first latent layer, which precedes any activation function. Moreover, one can see in rows 1, 3, 4, and 5 that narrow individual beams slowly shift in position during optimisation, converging on strong axis-aligned distributions. This may suggest these axis-aligned positions offer more favourable non-linear maps for computation than elsewhere.
  • Figure 3: This plot displays networks identical in every way to those of Fig. 2, except for the activation function applied. The top five rows utilise the standard Leaky-ReLU function, whilst the bottom five rows utilise an analogous isotropic Leaky-ReLU function, defined in App. \ref{App:LeakyReLU}. Isotropic-Leaky-ReLU results appear more clustered, but don't present as discretised like the aligned representational rays in standard Leaky-ReLU. The slight alignments in isotropic examples are thought to be a statistical result, discussed in App. \ref{App:StatArtefactsB}, and this is corroborated by further results in App. \ref{App:LeakyReLU}, which mitigate the statistical phenomenon. Hence, one can conclude that standard Leaky-ReLU also induces a quantisation bias, whereas isotropic Leaky-ReLU does not.
  • Figure 4: Displays the combination PPP-method applied to $60000$ samples (identical to CIFAR training set size) drawn from a multivariate normal and applied using a random orthogonal basis. A value of $\epsilon=0.75$ and $24$ neurons were chosen to make the plots comparable to all other results. Each individual plot shows a different random initialisation of both samples and basis. Zero-indexed by (row, column), one can observe axis-aligned underrepresentations, particularly in plots (1, 1), (2, 6), and overrepresentations, particularly in (1, 4), (2, 4), (2, 7), and (1, 0). Yet, it is known that higher-dimensional samples are approximately angularly uniform; therefore, this is a geometrical and statistical artefact resulting from the projection method collated over multiple planes.
  • Figure 5: Each individual plot displays an independent instance of a combination-PPP $\epsilon=0.75$ plot, of $60000$ uniformly drawn samples from $\left[0, 1\right]^{32\times32\times3}$, which are embedded in $\mathbb{R}^{24}$ using a linear map randomly drawn from a standard normal. Additionally, a random orthogonal distinguished basis was used for PPP, which was freshly sampled for each plot. One can observe a tendency towards strongly anti-aligned clusters due to the geometry of the stochastic dataset, map and PPP method. These artefactual clustered structures can be a more significant issue for interpreting PPP plots; therefore, the underlying dataset warrants normalisation efforts for better analysis and stronger conviction in conclusions. Additionally, these clustering artefacts remain inconsistent with the emergence of the highly concentrated narrow beams, which emerge through training in prior results.
  • ...and 32 more figures
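
The synthetic null data described in the captions of Figures 4 and 5 can be sketched at a reduced scale. The code below is a rough illustration under stated assumptions, not the author's procedure: it uses 100 samples and a 256-dimensional input instead of $60000$ samples from $\left[0, 1\right]^{32\times32\times3}$, it does not implement the PPP projection itself, and the mean pairwise cosine similarity is an ad hoc summary chosen here to indicate how strongly the embedded samples bunch into a single direction. The centred variant is likewise an assumption, included as one possible reading of the normalisation the Figure 5 caption calls for.

```python
import math
import random

def embed(x, weights):
    """Apply a linear map, given as a list of rows, to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mean_pairwise_cosine(points):
    """Average cosine similarity over all distinct pairs of points."""
    units = []
    for p in points:
        norm = math.sqrt(sum(v * v for v in p))
        units.append([v / norm for v in p])
    total, count = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            total += sum(a * b for a, b in zip(units[i], units[j]))
            count += 1
    return total / count

rng = random.Random(0)
# Reduced sizes for speed (assumption); the captions use 60000 samples,
# a 32*32*3 input and a 24-dimensional latent space.
n_samples, input_dim, latent_dim = 100, 256, 24

# Figure 4 style: samples drawn directly from a multivariate normal.
gaussian = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(n_samples)]

# Figure 5 style: uniform samples on [0, 1]^d pushed through a random normal map.
weights = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
           for _ in range(latent_dim)]
uniform_embedded = [embed([rng.random() for _ in range(input_dim)], weights)
                    for _ in range(n_samples)]

# Same construction after centring the inputs on zero (assumed normalisation).
centred_embedded = [embed([rng.random() - 0.5 for _ in range(input_dim)], weights)
                    for _ in range(n_samples)]

cos_gaussian = mean_pairwise_cosine(gaussian)
cos_uniform = mean_pairwise_cosine(uniform_embedded)
cos_centred = mean_pairwise_cosine(centred_embedded)

print("gaussian samples:        ", cos_gaussian)
print("uniform [0, 1], embedded:", cos_uniform)
print("centred uniform, embedded:", cos_centred)
```

In this toy setting the Gaussian and centred samples have a mean pairwise cosine close to zero, while the uncentred uniform samples share a large common component (the image of the input mean) and so point in broadly the same direction. This may be one contributor to the clustering seen with unnormalised data, though it does not reproduce the PPP plots.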