Why Code, Why Now: Learnability, Computability, and the Real Limits of Machine Learning
Zhimin Zhao
TL;DR
The paper asks why code generation scales more predictably than reinforcement learning (RL) and argues that task learnability is governed by information structure rather than model size. It introduces a five-level hierarchy of learnability based on feedback quality and formalizes expressibility, computability, and learnability within a unified template using risk functionals. The analysis explains why supervised learning on code benefits from dense, locally verifiable signals, while RL suffers from misaligned, non-stationary, or reflexive rewards and therefore scales less predictably. It also proposes three practical paths forward (task decomposition, engineered feedback, and weaker objectives) for turning unlearnable tasks into learnable ones and for guiding future scaling efforts.
Abstract
Code generation has progressed more reliably than reinforcement learning, largely because code has an information structure that makes it learnable. Code provides dense, local, verifiable feedback at every token, whereas most reinforcement learning problems do not. This difference in feedback quality is not binary but graded. We propose a five-level hierarchy of learnability based on information structure and argue that the ceiling on ML progress depends less on model size than on whether a task is learnable at all. The hierarchy rests on a formal distinction among three properties of computational problems (expressibility, computability, and learnability). We establish their pairwise relationships, including where implications hold and where they fail, and present a unified template that makes the structural differences explicit. The analysis suggests why supervised learning on code scales predictably while reinforcement learning does not, and why the common assumption that scaling alone will solve remaining ML challenges warrants scrutiny.
