Harmonizing Program Induction with Rate-Distortion Theory

Hanqi Zhou; David G. Nagy; Charley M. Wu

TL;DR

This work addresses how Rate Distortion Theory (RDT) can be integrated with program induction to model resource-bounded learning of structured representations. It introduces a three-way trade-off among description length $R_l$, distortion $D$, and search budget $R_s$, and applies it to melody learning using Bayesian program induction with a typed combinatory logic, a PCFG prior, and adaptor grammars (AGs) that maintain a shared library across tasks. The study shows that a shared library enables more compact representations and better generalization, but also makes learning sensitive to the curriculum; it further demonstrates that partial information decomposition (PID) can guide the design of synergistic curricula that make the library more useful. These results provide a normative, resource-bounded account of how compositional programs might be learned and reused across tasks, with implications for understanding human learning and for building curriculum-aware AI systems that leverage reusable subprograms.

Abstract

Many aspects of human learning have been described as a process of constructing mental programs: from acquiring symbolic number representations to intuitive theories about the world. In parallel, there is a long tradition of using information processing to model human cognition through Rate Distortion Theory (RDT). Yet, it is still poorly understood how to apply RDT when mental representations take the form of programs. In this work, we adapt RDT by proposing a three-way trade-off among rate (description length), distortion (error), and computational costs (search budget). We use simulations on a melody task to study the implications of this trade-off, and show that constructing a shared program library across tasks provides global benefits. However, this comes at the cost of sensitivity to curricula, which is also characteristic of human learners. Finally, we use methods from partial information decomposition to generate training curricula that induce more effective libraries and better generalization.
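
To make the three-way trade-off concrete, the following is a minimal toy sketch, not the paper's model (which uses a typed combinatory logic, a PCFG prior, and adaptor grammars). Everything here is an assumption made for illustration: a "program" is a short tuple of pitch steps applied cyclically, the step probabilities in `STEP_PROBS` are invented, description length is the negative log prior in bits, distortion is the fraction of mismatched notes, and the search budget is simply the number of candidate programs the learner may enumerate.

```python
import math
from itertools import islice, product

# Hypothetical prior over primitive pitch steps (values chosen for illustration).
STEP_PROBS = {0: 0.2, 1: 0.3, -1: 0.3, 2: 0.1, -2: 0.1}


def description_length(program):
    """Rate R_l of a program in bits: -log2 of its probability under the toy prior."""
    return -sum(math.log2(STEP_PROBS[step]) for step in program)


def decode(program, n_notes, start):
    """Reconstruct a melody by applying the program's steps cyclically."""
    pitches = [start]
    for i in range(n_notes - 1):
        pitches.append(pitches[-1] + program[i % len(program)])
    return pitches


def distortion(target, recon):
    """Distortion D: fraction of notes that differ from the target melody."""
    return sum(a != b for a, b in zip(target, recon)) / len(target)


def enumerate_programs(max_len):
    """Yield candidate programs from shortest to longest."""
    for n in range(1, max_len + 1):
        yield from product(STEP_PROBS, repeat=n)


def best_program(target, beta, search_budget, max_len=4):
    """Pick the candidate minimizing D + beta * R_l among the first
    `search_budget` programs enumerated (the toy stand-in for R_s)."""
    candidates = islice(enumerate_programs(max_len), search_budget)

    def cost(program):
        recon = decode(program, len(target), target[0])
        return distortion(target, recon) + beta * description_length(program)

    return min(candidates, key=cost)


# A melody generated by repeating the step pattern (+1, +1, -1).
TARGET = [60, 61, 62, 61, 62, 63, 62, 63, 64]

print(best_program(TARGET, beta=0.01, search_budget=200))  # -> (1, 1, -1)
print(best_program(TARGET, beta=0.01, search_budget=5))    # -> (1,)
print(best_program(TARGET, beta=1.0, search_budget=200))   # -> (1,)
```

With a generous budget and a low price on description length, the search finds the exact three-step program. Shrinking the budget cuts the search off before that program is reached, and raising `beta` makes the shorter, lossier program preferable, so either bottleneck alone is enough to force a less accurate reconstruction.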

Paper Structure (20 sections, 8 equations, 5 figures)

This paper contains 20 sections, 8 equations, 5 figures.

Introduction
Goal and scope
Methods
Melody learning task
Bayesian program induction
Routers.
Primitives.
Types.
Prior over programs.
Compression with programs
Description length.
Search budget.
Approximate inference
Results
Compression by updating the library
...and 5 more sections

Figures (5)

Figure 1: (a) Program induction under resource constraints using an encoder-decoder framework on melody data. The encoder compresses melodies (piano rolls $X$) into a latent space of programs $\pi$, which are constrained by two bottlenecks: description length $R_l$ and search budget $R_s$. (b) Each melody $X^{(i)}$ is assumed to be generated by a program $\pi^{(i)}$, which is defined by a program library $\mathcal{L}$ (solid arrows). When inferring the program and library given observed melodies (dashed arrows), the goal is to balance compact and easy-to-search programs while minimizing reconstruction error $d(X,\hat{X})$. (c) Illustrative example of the tree structure and routers used in a program.
Figure 2: RD curves with (a-b) PCFGs and (c-d) AGs, given different description lengths and search budgets (left) and under different amounts of training data (right).
Figure 3: (a) Generalization performance given different search budgets. (b) The ratio of unique subprograms used in compressing different melodies.
Figure 4: Curriculum effect in AGs. (a) Comparison of matched runs of the same curriculum vs. different curricula. (b) Generalization performance of AGs with a curriculum-informed library vs. a randomized library.
Figure 5: (a) Example libraries learned by AGs under random curricula (blue) and under synergistic curricula (yellow; including redundant ($R$), unique ($U$), and synergistic ($S$) information). (b) Generalization performance given random and synergistic curricula.
