On Implications of Scaling Laws on Feature Superposition

Pavan Katta

TL;DR

This note uses scaling laws to argue that superposition, taken as a complete theory of feature representation, is incompatible with the claim that models of equal performance learn the same features. Considering transformers with equal parameter counts but different aspect ratios, it argues that the degree of feature superposition, tied to sparsity via $n/m = \frac{1}{(1-S)\log(1-S)}$, must vary with model shape, so that models trained on the same data may be forced to learn different feature sets. This tension suggests that either the superposition hypothesis or feature universality must be revised, and the note outlines alternative compression schemes and cross-layer superposition as possible directions. The result points to limits on interpreting and transferring representations across models that scale along different architectural dimensions.

Abstract

Using results from scaling laws, this theoretical note argues that the following two statements cannot be simultaneously true:
1. The superposition hypothesis, in which sparse features are linearly represented across a layer, is a complete theory of feature representation.
2. Features are universal, meaning that two models trained on the same data and achieving equal performance will learn identical features.

Paper Structure

This paper contains 5 sections, 4 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: A visual representation of the cylinder analogy, showing the relationship between the number of neurons per layer and the number of layers. The volume of the cylinder represents the total number of parameters, while the surface area corresponds to the total number of neurons available for feature representation.
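
The cylinder analogy can be made concrete with a small calculation. The sketch below is illustrative only and rests on two assumptions that are not taken from the note itself: that non-embedding parameters follow the common approximation 12 · n_layer · d_model², and that the "neurons" available for feature representation are counted as one d_model-wide activation per layer. The two shapes (`WIDE`, `DEEP`) are hypothetical examples chosen so that their parameter counts match.

```python
from typing import NamedTuple


class Shape(NamedTuple):
    """Hypothetical transformer shape: depth and residual width."""
    n_layer: int
    d_model: int


def non_embedding_params(shape: Shape) -> int:
    # Assumption: N ~ 12 * n_layer * d_model^2, a common approximation for a
    # standard transformer block with a 4x-wide MLP. This plays the role of
    # the cylinder's volume.
    return 12 * shape.n_layer * shape.d_model ** 2


def total_neurons(shape: Shape) -> int:
    # Assumption: one d_model-wide activation vector per layer. This plays
    # the role of the cylinder's surface area.
    return shape.n_layer * shape.d_model


# Two illustrative shapes with the same parameter count but different aspect ratios.
WIDE = Shape(n_layer=6, d_model=4096)
DEEP = Shape(n_layer=96, d_model=1024)

for name, shape in (("wide", WIDE), ("deep", DEEP)):
    print(
        f"{name}: params={non_embedding_params(shape):,} "
        f"neurons={total_neurons(shape):,}"
    )
```

Under these assumptions both shapes have the same parameter count, yet the deep model has four times as many neurons as the wide one. If both had to represent the same set of features, the ratio of features to neurons, and hence the degree of superposition, would differ between them.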