On Implications of Scaling Laws on Feature Superposition
Pavan Katta
TL;DR
This note uses scaling laws to argue that a complete theory of feature representation based on superposition is incompatible with the claim that learned features are universal across models of equal performance. Considering transformers with the same parameter count but different aspect ratios, it shows that the degree of feature superposition, tied to sparsity $S$ via $n/m = -\frac{1}{(1-S)\log(1-S)}$ (the sign makes the ratio positive, since $\log(1-S) < 0$ for $0 < S < 1$), must vary with model shape, which can force models trained on the same data to learn different feature sets. The resulting tension suggests that either the superposition hypothesis or feature universality must be revised. The note outlines alternative compression schemes and cross-layer superposition as possible directions, and points to limits on interpreting and transferring representations across models that scale along different architectural dimensions.
Abstract
Using results from scaling laws, this theoretical note argues that the following two statements cannot both be true:
1. The superposition hypothesis, in which sparse features are linearly represented across a layer, is a complete theory of feature representation.
2. Features are universal, meaning that two models trained on the same data and achieving equal performance will learn identical features.
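To make the sparsity relation from the TL;DR concrete, the following minimal Python sketch evaluates the features-per-neuron ratio $n/m$ for a few sparsity values and compares the layer widths of two equal-parameter transformers with different aspect ratios. Several things here are assumptions rather than part of the note: the logarithm is taken as the natural log, the ratio is written with a leading minus sign so that it is positive, the parameter count is approximated by the common rule of thumb $N \approx 12 \cdot n_{layer} \cdot d_{model}^2$, and the function names are hypothetical. The relation is treated as meaningful only in the high-sparsity regime.

```python
import math


def features_per_neuron(sparsity: float) -> float:
    """Return n/m = -1 / ((1 - S) * ln(1 - S)) for sparsity 0 < S < 1.

    Assumption: natural log, and a leading minus sign so the ratio is
    positive (ln(1 - S) is negative on this interval).
    """
    if not 0.0 < sparsity < 1.0:
        raise ValueError("sparsity must lie strictly between 0 and 1")
    density = 1.0 - sparsity
    return -1.0 / (density * math.log(density))


def width_for_budget(n_params: float, aspect_ratio: float) -> float:
    """Return d_model for a fixed parameter budget and aspect ratio.

    Assumption: N ~= 12 * n_layer * d_model**2 and
    aspect_ratio = d_model / n_layer, which gives
    d_model = (N * aspect_ratio / 12) ** (1/3).
    """
    if n_params <= 0 or aspect_ratio <= 0:
        raise ValueError("n_params and aspect_ratio must be positive")
    return (n_params * aspect_ratio / 12.0) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Sparser features allow more features per neuron.
    for s in (0.9, 0.99, 0.999):
        print(f"S = {s}: n/m = {features_per_neuron(s):.2f}")

    # Same (illustrative) parameter budget, two different shapes.
    budget = 1e9
    for ratio in (32.0, 128.0):
        d_model = width_for_budget(budget, ratio)
        print(f"aspect ratio {ratio:.0f}: d_model = {d_model:.0f}")
```

Under these assumptions, the model with the larger aspect ratio has a wider layer at the same parameter count. If both models were to represent the same number of features $n$, their ratios $n/m$ would differ, which is the tension between the two statements above.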
