Table of Contents
Fetching ...

LILO: Learning Interpretable Libraries by Compressing and Documenting Code

Gabriel Grand, Lionel Wong, Maddy Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum, Jacob Andreas

TL;DR

LILO is introduced, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains and boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.

Abstract

While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.

LILO: Learning Interpretable Libraries by Compressing and Documenting Code

TL;DR

LILO is introduced, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains and boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.

Abstract

While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.
Paper Structure (25 sections, 5 equations, 18 figures, 7 tables, 1 algorithm)

This paper contains 25 sections, 5 equations, 18 figures, 7 tables, 1 algorithm.

Figures (18)

  • Figure 1: Overview of the Lilo learning loop. (A) Lilo synthesizes programs based on natural language task descriptions using a dual-system search model. To refactor a set of program solutions, Lilo integrates a compression algorithm called Stitch (B; stitch) with LLM-generated auto-documentation (C) to produce an interpretable library of $\lambda$-abstractions. This search-compress-document loop simplifies the structure of program solutions (A vs. D), making it easier to solve more complex tasks on future iterations.
  • Figure 2: Lilo library auto-documentation (AutoDoc) workflow in the REGEX domain. For each Stitch abstraction (A), we prompt an instruction-tuned LLM with usage examples from solved tasks (B) to generate a human-readable name and description (C). The chat-style structure of AutoDoc allows naming choices to cascade sequentially; e.g., replace_consonant_with_substring(fn_51) refers back to vowel_regex(fn_42) and other named abstractions in a consistent and interpretable manner.
  • Figure 3: Learning curves during online synthesis. Within each plot, the x-axis tracks the experiment iteration and the y-axis shows the percent of tasks solved (top = test, bottom = train). Error bars show standard deviation across 3 randomly-seeded runs.
  • Figure 4: Evaluating library quality via offline synthesis. We run a timed enumerative search (x-axis; note the log-scale) with the final library $\mathcal{L}_f$ learned by each model in online synthesis or inferred post-hoc. In this setting, Lilo's $\mathcal{L}_f$ expedites discovery of test task solutions (y-axis) even without language guidance.
  • Figure 5: Qualitative inspection of LOGO library. Selected examples of graphics abstractions learned by Lilo. Highlights indicate ambiguities (orange) and errors (red) in naming and documentation that may affect code comprehension, which we discuss below.
  • ...and 13 more figures