DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick
TL;DR
DiSciPLE addresses the need for models that are both interpretable and accurate in scientific visual tasks. It introduces an LLM-guided evolutionary framework that synthesizes Python programs interleaving neural networks with symbolic operations. A program critic guides the search, and a simplification step keeps the resulting programs compact and interpretable. The method is applied to three real-world geospatial problems: population density, poverty indicators, and aboveground biomass. On these benchmarks, DiSciPLE learns state-of-the-art interpretable programs. These programs achieve notably lower error than non-interpretable baselines (35% lower than the closest such baseline for population density), generalize well out of distribution, and require comparatively less data. By combining open-world primitives, LLM priors, and evolutionary search, the approach aims to support reliable scientific insight. It offers a pragmatic path toward interpretable, data-efficient discovery in domains where experts refine models iteratively.
Abstract
Visual data is used in many scientific workflows, ranging from remote sensing to ecology. As the amount of observational data grows, the challenge is not only to make accurate predictions but also to understand the mechanisms underlying those predictions. Good interpretation is important in scientific workflows because it supports better decision-making by providing insight into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models by learning programs that interleave neural networks with symbolic operations. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution), an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs that explain visual data. We further propose two components that improve the quality of the synthesized programs: a program critic and a program simplifier. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, on population density estimation it learns programs with 35% lower error than the closest non-interpretable baseline.
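The evolutionary loop described above (propose programs, critique them, simplify them, keep the best) can be illustrated with a toy sketch. This is not the DiSciPLE implementation. The LLM that proposes new programs is replaced by a random mutation stub (`propose`), the program critic by a simple residual-correlation heuristic (`critic`), and the simplifier by a rule that drops negligible terms (`simplify`). Programs are restricted to weighted sums of hypothetical named primitives (e.g. `building_area`) that stand in for neural-network outputs. All function names, primitives, and the synthetic data are assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

# Each sample maps hypothetical primitive names (stand-ins for
# neural-network outputs such as "building_area") to scalar values.
Sample = dict[str, float]


@dataclass
class Program:
    """Toy interpretable program: a weighted sum of named primitives."""
    terms: dict[str, float] = field(default_factory=dict)  # primitive -> coefficient

    def run(self, x: Sample) -> float:
        return sum(c * x[f] for f, c in self.terms.items())

    def __str__(self) -> str:
        return " + ".join(f"{c:.2f}*{f}" for f, c in sorted(self.terms.items())) or "0"


def mse(p: Program, data: list[tuple[Sample, float]]) -> float:
    """Fitness: mean squared error of the program on labelled data."""
    return sum((p.run(x) - y) ** 2 for x, y in data) / len(data)


def critic(p: Program, data, features: list[str]) -> str:
    """Toy critic (assumption): name the primitive most correlated with the residual."""
    def score(f: str) -> float:
        return abs(sum((y - p.run(x)) * x[f] for x, y in data))
    return max(features, key=score)


def simplify(p: Program, tol: float = 1e-2) -> Program:
    """Toy simplifier (assumption): drop terms whose coefficient is negligible."""
    return Program({f: c for f, c in p.terms.items() if abs(c) > tol})


def propose(parent: Program, hint: str, features: list[str], rng: random.Random) -> Program:
    """Stand-in for an LLM proposal: a random edit, biased toward the critic's hint."""
    terms = dict(parent.terms)
    r = rng.random()
    if r < 0.5:
        terms[hint] = terms.get(hint, 0.0) + rng.uniform(-1, 1)  # act on critic feedback
    elif r < 0.8 and terms:
        f = rng.choice(sorted(terms))
        terms[f] += rng.gauss(0, 0.3)  # perturb an existing coefficient
    else:
        terms[rng.choice(features)] = rng.uniform(-1, 1)  # explore a random primitive
    return Program(terms)


def evolve(data, features, generations=200, pop_size=8, seed=0) -> Program:
    rng = random.Random(seed)
    population = [Program() for _ in range(pop_size)]
    for _ in range(generations):
        children = [
            simplify(propose(p, critic(p, data, features), features, rng))
            for p in population
        ]
        # Elitist selection: keep the pop_size lowest-error programs.
        population = sorted(population + children, key=lambda p: mse(p, data))[:pop_size]
    return population[0]


# Synthetic example: the target depends on two of the three primitives.
data_rng = random.Random(1)
features = ["building_area", "road_length", "tree_cover"]
data = []
for _ in range(50):
    x = {f: data_rng.uniform(0, 1) for f in features}
    data.append((x, 3.0 * x["building_area"] + 0.5 * x["road_length"]))

best = evolve(data, features)
print(best, round(mse(best, data), 4))
```

Because selection is elitist, the returned program's error can never exceed that of the empty starting program, and the printed program stays human-readable as a formula. In the method described above, proposals would instead come from an LLM generating Python code, so the search space is far richer than this linear toy.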
