On Implications of Scaling Laws on Feature Superposition

Pavan Katta


TL;DR

This note uses scaling laws to argue that a complete theory of feature representation based on superposition is incompatible with the claim that learned features are universal across models of equal performance. By analyzing transformers with equal parameter counts but different aspect ratios, it shows that the degree of feature superposition, tied to sparsity via $n/m = \frac{1}{(1-S)\log\frac{1}{1-S}}$ (where $n$ is the number of features, $m$ the number of neurons, and $S$ the feature sparsity), must vary with model shape, potentially forcing different feature sets on the same data. The resulting tension suggests that either the superposition hypothesis or feature universality must be revised; the note outlines alternative compression schemes and cross-layer superposition as possible directions. The work highlights fundamental limits on interpreting and transferring representations across models that scale along different architectural dimensions.
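The sparsity relation can be evaluated and inverted numerically to see how much sparsity a given degree of superposition demands. This is a minimal sketch under stated assumptions: it writes the log term as $\log\frac{1}{1-S}$ so the ratio is positive for $0 < S < 1$, treats the relation as an exact equality, and, because $(1-S)\log\frac{1}{1-S}$ peaks at $S = 1 - 1/e$, restricts itself to the sparse regime $S \ge 1 - 1/e$ where $n/m$ grows with $S$. The helper names `features_per_neuron` and `required_sparsity` and the feature and neuron counts are hypothetical.

```python
import math

# Below this sparsity, p * log(1/p) (with p = 1 - S) is past its peak at p = 1/e,
# so n/m stops increasing with S; the sketch only handles the sparse side.
MIN_SPARSITY = 1.0 - 1.0 / math.e


def features_per_neuron(S: float) -> float:
    """n/m = 1 / ((1 - S) * log(1 / (1 - S))) for MIN_SPARSITY <= S < 1."""
    if not MIN_SPARSITY <= S < 1.0:
        raise ValueError("sketch assumes the sparse regime 1 - 1/e <= S < 1")
    p = 1.0 - S  # probability that a given feature is active
    return 1.0 / (p * math.log(1.0 / p))


def required_sparsity(ratio: float, iters: int = 200) -> float:
    """Invert features_per_neuron by bisection on p = 1 - S over (0, 1/e]."""
    if ratio < math.e:  # smallest value n/m takes in this regime, reached at p = 1/e
        raise ValueError("n/m must be at least e in the sparse regime")
    lo, hi = 1e-300, 1.0 / math.e
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 / (mid * math.log(1.0 / mid)) > ratio:
            lo = mid  # ratio too high at this p: features must be denser (larger p)
        else:
            hi = mid
    return 1.0 - 0.5 * (lo + hi)


# Same hypothetical feature count n, two hypothetical layer widths m.
n_features = 40_000
for m in (4096, 2048):
    S = required_sparsity(n_features / m)
    print(f"m={m}: n/m={n_features / m:.2f} requires S~{S:.4f}")
```

If features are universal, both layers must host the same $n$, so under this relation the narrower layer would need sparser features, even though sparsity is a property of the data rather than of the model. That mismatch is the tension the note describes.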

Abstract

Using results from scaling laws, this theoretical note argues that the following two statements cannot both be true:

1. The superposition hypothesis, in which sparse features are linearly represented across a layer, is a complete theory of feature representation.
2. Features are universal: two models trained on the same data that achieve equal performance will learn identical features.


Paper Structure

This paper contains 5 sections, 4 equations, 1 figure, and 1 table.

Introduction
Case study on changing Aspect Ratio
Discussion
Schemes of Compression Alternative to Superposition
Cross Layer Superposition

Figures (1)

Figure 1: A visual representation of the cylinder analogy, showing the relationship between the number of neurons per layer and the number of layers. The volume of the cylinder represents the total number of parameters, while the surface area corresponds to the total number of neurons available for feature representation.
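To make the cylinder analogy concrete, the sketch below compares two hypothetical transformer shapes with the same parameter count. It assumes the approximation $N \approx 12\, n_{\text{layer}}\, d_{\text{model}}^2$ for non-embedding parameters from Kaplan et al. (2020), which takes the feed-forward width to be $4 d_{\text{model}}$, and, as a further assumption, counts the feed-forward hidden units as a layer's neurons. Parameters then scale like the cylinder's volume ($d_{\text{model}}^2 \times n_{\text{layer}}$) while neurons scale like its lateral surface ($d_{\text{model}} \times n_{\text{layer}}$). The function names and shapes are illustrative.

```python
def non_embedding_params(n_layer: int, d_model: int) -> int:
    """Approximate non-embedding parameter count, N ~ 12 * n_layer * d_model**2.

    Assumes d_attn = d_model and d_ff = 4 * d_model (Kaplan et al., 2020).
    """
    return 12 * n_layer * d_model ** 2


def mlp_neurons(n_layer: int, d_model: int, ff_mult: int = 4) -> tuple[int, int]:
    """Return (neurons per layer, total neurons), counting feed-forward hidden units.

    Treating the MLP hidden units as "the neurons" is an illustrative assumption.
    """
    per_layer = ff_mult * d_model
    return per_layer, n_layer * per_layer


# Hypothetical shapes: a wide/shallow model and a narrow/deep one.
# Halving d_model and quadrupling n_layer leaves 12 * n_layer * d_model**2 unchanged.
for n_layer, d_model in [(12, 1024), (48, 512)]:
    per_layer, total = mlp_neurons(n_layer, d_model)
    print(
        f"n_layer={n_layer:3d} d_model={d_model:5d} "
        f"aspect={d_model / n_layer:6.1f} "
        f"params={non_embedding_params(n_layer, d_model):,} "
        f"neurons/layer={per_layer} total_neurons={total}"
    )
```

Both shapes have the same volume (parameters), but the deeper one has half as many neurons per layer and twice as many neurons in total, which is the kind of aspect-ratio change the case study is concerned with.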
