Operationalising the Superficial Alignment Hypothesis via Task Complexity

Tomás Vergara-Browne; Darshan Patil; Ivan Titov; Siva Reddy; Tiago Pimentel; Marius Mosbach

TL;DR

A new metric, task complexity, is proposed: the length of the shortest program that achieves a target performance on a task. Measuring it shows that task adaptation often requires surprisingly little information, in many cases just a few kilobytes.

Abstract

The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques of it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. In this framework, the SAH simply claims that pre-trained models drastically reduce the complexity of achieving high performance on many tasks. Our definition unifies prior arguments supporting the SAH, interpreting them as different strategies for finding such short programs. Experimentally, we estimate the task complexity of mathematical reasoning, machine translation, and instruction following, and show that these complexities can be remarkably low when conditioned on a pre-trained model. Further, we find that pre-training enables access to strong performance on our tasks, but reaching it can require programs gigabytes in length. Post-training, on the other hand, collapses the complexity of reaching this same performance by several orders of magnitude. Overall, our results highlight that task adaptation often requires surprisingly little information, in many cases just a few kilobytes.
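
To make the definition concrete, here is how the task complexity at a target performance could be bounded from a pool of already-evaluated candidate programs. This is a minimal illustration under our own assumptions, not the paper's implementation: the `Candidate` type and the `complexity_upper_bound` function are hypothetical, and the numbers are made up. Following the conditional view described in the abstract, a program's length is assumed to exclude the pre-trained model it is conditioned on, so only the extra payload (for example, compressed data or parameters) counts toward it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical adaptation program, summarised by its length and its score."""
    name: str
    length_bits: int    # assumption: program size in bits, excluding the conditioning model
    performance: float  # e.g. accuracy on the task, in [0, 1]


def complexity_upper_bound(candidates: Iterable[Candidate], target: float) -> Optional[int]:
    """Return the length of the shortest candidate reaching `target`, or None if none does.

    The true task complexity is a minimum over *all* programs, so a minimum
    over the candidates actually tried can only be an upper bound on it.
    """
    lengths = [c.length_bits for c in candidates if c.performance >= target]
    return min(lengths) if lengths else None


if __name__ == "__main__":
    # Toy, made-up numbers purely for illustration.
    pool = [
        Candidate("short_program", 8_000, 0.55),
        Candidate("medium_program", 2_000_000, 0.70),
        Candidate("long_program", 50_000_000_000, 0.72),
    ]
    print(complexity_upper_bound(pool, 0.50))  # 8000
    print(complexity_upper_bound(pool, 0.70))  # 2000000
    print(complexity_upper_bound(pool, 0.80))  # None
```

Because the minimum ranges only over programs that were actually tried, adding more candidates can lower the bound but never raise it.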

Paper Structure (52 sections, 2 theorems, 13 equations, 10 figures)

Introduction
Perspectives on Superficial Adaptation
The Superficial Alignment Hypothesis
Task Complexity
Adaptability and SAH
Connection to previous work
Estimating Task Complexity
Data methods.
Parametric methods.
Inference-control methods.
Experiments
Experimental setup
Universal Turing machine.
Models.
Tasks.
...and 37 more sections

Key Result

Corollary 1.5

Task complexity is uncomputable.
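
Since the exact quantity cannot be computed, any estimate obtained in practice can only bound it from above. The block below makes this explicit in notation assumed here for illustration, which is not necessarily the paper's: $\tau$ is a task, $\alpha$ a target performance, $\theta$ the pre-trained model that programs are conditioned on, $|p|$ the length of program $p$ in bits, and $\mathrm{perf}_\tau(p, \theta)$ the performance $p$ attains on $\tau$ when run with access to $\theta$.

```latex
C(\tau, \alpha \mid \theta)
  \;=\; \min \bigl\{\, |p| \;:\; \mathrm{perf}_\tau(p, \theta) \ge \alpha \,\bigr\}
  \;\le\; \min \bigl\{\, |p| \;:\; p \in \mathcal{S},\ \mathrm{perf}_\tau(p, \theta) \ge \alpha \,\bigr\}
```

This holds for any set $\mathcal{S}$ of programs containing at least one program that reaches $\alpha$: a minimum over all programs can only be smaller than or equal to a minimum over a subset of them. Searching with more strategies can tighten the bound, but in general cannot certify that it is exact.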

Figures (10)

Figure 1: Pareto curve of program length vs. performance for Olmo3-7B on GSM8K. We argue that prior work on the superficial alignment hypothesis can be seen as proposing different approaches to finding short programs that solve a task, and we find that these different views inform different regions of this Pareto curve.
Figure 2: Pseudo-code of the programs $\mathsf{P}$ constructed by the strategies in the Estimating Task Complexity section. Each program includes its compressed data or parameters, and its size is usually dominated by these terms. We explain how we measure program size in our experiments in the appendix.
Figure 3: Program length vs. performance Pareto curves for SmolLM3 and Olmo3 on our three analysed tasks: mathematical reasoning, machine translation, and instruction following. We test each method described in the Estimating Task Complexity section, but show only the optimal lengths and performances. To give some intuition for these program lengths, we also mark the sizes in bits of more familiar reference points.
Figure 4: Performance of the linear projection fine-tuning method proposed in [chen2025extracting], compared with our estimated Pareto curve. The method's performance lies far from the Pareto curve.
Figure 5: Estimated complexity curves conditioned on pre- and post-trained checkpoints of SmolLM3 and Olmo3 for the math (GSM8K) and instruction-following (IFEval) tasks. Pre-training already gives access to strong performance on these tasks. Post-training improves this performance but, more importantly, greatly reduces the complexity of reaching it.
...and 5 more figures
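
The Pareto curves in Figures 1 and 3 plot program length against performance and keep only the best trade-offs found. Below is a minimal way to extract such a frontier from a set of evaluated programs. It assumes each program is summarised as a (length in bits, performance) pair. The `pareto_frontier` function and the example points are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumption: each evaluated program is a (length in bits, task performance) pair.
Point = Tuple[int, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no shorter-or-equal program matches or beats in performance.

    Sorting by length (ties broken by higher performance first) lets a single
    sweep keep only points that strictly improve on the best performance so far.
    """
    frontier: List[Point] = []
    best = float("-inf")
    for length, perf in sorted(points, key=lambda p: (p[0], -p[1])):
        if perf > best:
            frontier.append((length, perf))
            best = perf
    return frontier


if __name__ == "__main__":
    # Made-up points for illustration only.
    pts = [(8_000, 0.55), (2_000_000, 0.70), (50_000_000_000, 0.72),
           (10_000_000, 0.60), (500, 0.10)]
    print(pareto_frontier(pts))
```

Read as a step function, the frontier gives the shortest observed program that reaches any target performance. In the example above, the point at 10,000,000 bits is dropped because a shorter program already scores higher.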

Theorems & Definitions (17)

Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Definition 3.5
Definition 3.6
Definition 3.7
Remark 1.1
Proof
Remark 1.2
...and 7 more
