DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery

Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick

TL;DR

DiSciPLE addresses the need for interpretable yet accurate models in scientific visual tasks by introducing an LLM-guided evolutionary framework that synthesizes Python programs interleaving neural networks with symbolic operations. The method uses a program critic and a simplification step to guide the search toward compact, interpretable programs, and is evaluated on geospatial datasets for population density, poverty indicators, and aboveground biomass. On these three real-world benchmarks, DiSciPLE yields state-of-the-art interpretable programs (for example, 35% lower error than the closest non-interpretable baseline on population density estimation) and strong out-of-distribution generalization, while requiring comparatively less data. By combining open-world primitives, LLM priors, and evolutionary search, the approach offers a pragmatic path toward interpretable, data-efficient discovery in scientific domains where experts need to inspect and iterate on models.

Abstract

Visual data is used in numerous scientific workflows, ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks with symbolic operations. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution), an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data. Additionally, we propose two improvements, a program critic and a program simplifier, that further help our method synthesize good programs. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.

Paper Structure

This paper contains 33 sections, 4 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: We introduce a framework to discover interpretable, predictive programs for scientific computer vision tasks.
  • Figure 2: Overview of our evolutionary algorithm with critic and simplification. We start with an initialized bank of programs trying to solve a task. From this bank we sample pairs of programs based on their fitness scores and perform crossover/mutation over them to produce new programs. Each generated program is further improved by passing it through a critic and then an analytical simplification step. The program is then evaluated and placed in the next generation of the program bank. The evaluation score of the program determines its fitness for the next generation of evolution.
  • Figure 3: DiSciPLE's learning loop
  • Figure 4: The best performing programs for each of the 3 benchmark problems as Python programs (left in each card) and the corresponding DAG representation on the right. The DAG representation allows better visualization of the importance of different components. The thickness of a red edge indicates how important that component is. A black edge represents computation that, when removed, either leaves the program the same as one of its subsequent edges or could result in a bug.
  • Figure 5: Qualitative comparison of DiSciPLE with other baselines on the task of population density estimation. DiSciPLE can map to the true population density maps much more accurately than the baselines (refer to the supplementary for more comparisons). The maps display population density as the base-10 log of people per square mile.
  • ...and 1 more figure
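
The loop described in the Figure 2 caption (sample parents by fitness, apply crossover/mutation, pass the child through a critic and a simplification step, evaluate it, and build the next program bank) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a "program" here is just a list of linear coefficients, and `crossover_mutate`, `critic`, and `simplify` are hypothetical numeric stand-ins for steps that the paper performs on Python source with an LLM. The inverse-error sampling weights and the elitism step are likewise assumptions made for the sketch.

```python
import random

# Toy stand-in: a "program" is a list of coefficients for y = sum(c_i * x_i).
# In the paper, programs are Python source and the operators are LLM-driven.

def evaluate(program, data):
    """Mean absolute error on (features, target) pairs; lower is better."""
    total = 0.0
    for xs, y in data:
        pred = sum(c * x for c, x in zip(program, xs))
        total += abs(pred - y)
    return total / len(data)

def crossover_mutate(parent_a, parent_b, rng):
    """Hypothetical stand-in for crossover/mutation: mix parents, perturb one term."""
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child

def critic(program, data, rng):
    """Hypothetical stand-in for the critic: keep a small edit only if it lowers error."""
    candidate = list(program)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.gauss(0.0, 0.1)
    return candidate if evaluate(candidate, data) < evaluate(program, data) else program

def simplify(program, tol=1e-2):
    """Hypothetical stand-in for simplification: zero out negligible terms."""
    return [0.0 if abs(c) < tol else c for c in program]

def evolve(data, n_features, generations=30, bank_size=20, seed=0):
    rng = random.Random(seed)
    # Initialized bank of candidate programs.
    bank = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(bank_size)]
    best = min(bank, key=lambda p: evaluate(p, data))
    for _ in range(generations):
        errors = [evaluate(p, data) for p in bank]
        # Assumption: sampling weight is inverse error, so fitter programs are picked more often.
        weights = [1.0 / (1e-9 + e) for e in errors]
        next_bank = [best]  # Assumption: carry the best program forward (elitism).
        while len(next_bank) < bank_size:
            a, b = rng.choices(bank, weights=weights, k=2)
            child = simplify(critic(crossover_mutate(a, b, rng), data, rng))
            next_bank.append(child)
        bank = next_bank
        best = min(bank, key=lambda p: evaluate(p, data))
    return best

# Toy data generated from y = 2*x0 - x2 (x1 is irrelevant).
data = [((x0, x1, x2), 2.0 * x0 - 1.0 * x2)
        for x0 in (0, 1, 2) for x1 in (0, 1) for x2 in (0, 1, 2)]
best = evolve(data, n_features=3)
print(best, evaluate(best, data))
```

Because the best program is carried into each new bank, the best error in this sketch never increases from one generation to the next; whether DiSciPLE itself uses elitism is not stated in the text above.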