Refactoring Codebases through Library Design
Ziga Kovacic, Justin T. Chiu, Celine Lee, Wenting Zhao, Kevin Ellis
TL;DR
This paper tackles refactoring at scale by reframing it as library design, with the goal of promoting reusability and maintainability as code generation agents take on broader tasks. It introduces MiniCode, a diverse benchmark featuring open-ended design, verifiable evaluation, and multi-file context, and Librarian, a sample-and-rerank method that uses clustering to manage large codebases and Minimum Description Length (MDL)-based ranking to produce reusable libraries. Across synthetic and real-world codebases (CodeContests, Transformers, Diffusers), MDL consistently aligns with human preferences and yields more reusable abstractions than traditional metrics, while enabling the resulting libraries to transfer to unseen tasks. The work demonstrates practical impact by compressing and reorganizing HuggingFace libraries, pointing toward scalable, reusable software design driven by MDL-guided refactoring. Limitations include a dependence on synthetic benchmarks and room for improvement in cross-cluster reuse dynamics, with reinforcement learning extensions proposed as future work to further automate library synthesis.
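Librarian is summarized above as sampling candidate refactorings and reranking them by MDL. The following is only a minimal sketch of that sample-and-rerank idea under loudly stated assumptions: candidates are passed in rather than sampled from a model, the clustering step is omitted, and description length is approximated by zlib-compressed size, a swapped-in proxy rather than the method's actual code-length estimator. The names `Candidate`, `code_length_bits`, `mdl_score`, and `rerank` are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    library: str          # source of the proposed shared library
    programs: list[str]   # refactored programs that call into the library


def code_length_bits(text: str) -> int:
    # Assumed proxy for description length: zlib-compressed size in bits.
    # This is NOT the paper's estimator, just a stand-in for illustration.
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))


def mdl_score(c: Candidate) -> int:
    # Two-part code: L(library) + sum_i L(program_i | library).
    # The conditional term is approximated as C(library + program) - C(library).
    lib_bits = code_length_bits(c.library)
    prog_bits = sum(
        code_length_bits(c.library + "\n" + p) - lib_bits for p in c.programs
    )
    return lib_bits + prog_bits


def rerank(
    candidates: list[Candidate],
    passes_tests: Callable[[Candidate], bool],
) -> Optional[Candidate]:
    # Keep only candidates whose refactored programs still pass their tests,
    # then return the one with the smallest (proxy) description length.
    valid = [c for c in candidates if passes_tests(c)]
    return min(valid, key=mdl_score) if valid else None
```

To use it, build one `Candidate` per sampled refactoring, supply a `passes_tests` callable that runs the original task tests against the refactored programs, and call `rerank(candidates, passes_tests)`. Filtering on correctness before scoring keeps the compression objective from rewarding refactorings that shorten code by breaking it.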
Abstract
Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents are increasingly used to solve isolated, one-off programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We first examine what makes a good refactoring, finding through simulation results and a human study that Minimum Description Length (MDL) correlates best with preferable refactorings. We then present both a benchmark and a method for refactoring: MiniCode, a benchmark in which multiple files must be refactored into a shared library, and Librarian, a sample-and-rerank method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods and study it on real-world codebases.
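A MiniCode task asks for multiple files to be refactored into a shared library while remaining verifiably correct. A minimal sketch of such a check, assuming each file exposes a hypothetical `solve` entry point and that correctness is verified by comparing the outputs of original and refactored files on shared inputs. The file names, the `shared_lib` module, and all helper functions are illustrative only; MiniCode's real task format and test harness may differ.

```python
import sys
import types

# Hypothetical shared library extracted from the duplicated code below.
LIBRARY_SRC = '''
from itertools import accumulate

def prefix_sums(xs):
    """Running totals of xs."""
    return list(accumulate(xs))
'''

# Two "files" before refactoring: each re-implements running totals inline.
ORIGINAL = {
    "task_a.py": '''
def solve(nums):
    out, total = [], 0
    for x in nums:
        total += x
        out.append(total)
    return max(out)
''',
    "task_b.py": '''
def solve(nums):
    sums, total = [], 0
    for x in nums:
        total += x
        sums.append(total)
    return sums[-1] - min(sums)
''',
}

# The same files after refactoring: both import the shared helper.
REFACTORED = {
    "task_a.py": '''
from shared_lib import prefix_sums

def solve(nums):
    return max(prefix_sums(nums))
''',
    "task_b.py": '''
from shared_lib import prefix_sums

def solve(nums):
    sums = prefix_sums(nums)
    return sums[-1] - min(sums)
''',
}


def install_library(name: str, src: str) -> None:
    # Register the library as an importable module without touching disk.
    module = types.ModuleType(name)
    exec(src, module.__dict__)
    sys.modules[name] = module


def load_solve(src: str):
    # Execute one file's source in a fresh namespace and return its entry point.
    namespace: dict = {}
    exec(src, namespace)
    return namespace["solve"]


def refactoring_preserves_behavior(original, refactored, test_inputs) -> bool:
    # Every refactored file must match its original on every test input.
    for filename, src in original.items():
        old, new = load_solve(src), load_solve(refactored[filename])
        if any(old(list(x)) != new(list(x)) for x in test_inputs):
            return False
    return True


install_library("shared_lib", LIBRARY_SRC)
```

Registering the library in `sys.modules` lets each refactored file `import` it as if it were on disk; a real harness would more likely write the files to a temporary directory and run each task's own unit tests.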
