Automated Statistical Model Discovery with Language Models

Michael Y. Li, Emily B. Fox, Noah D. Goodman

TL;DR

The paper tackles automated statistical model discovery in large, constraint-rich spaces by introducing BoxLM, a framework that instantiates Box's Loop with language models: a proposal LM writes candidate probabilistic programs, a generic inference engine fits them, and a critic LM returns natural-language feedback. This approach removes the need for handcrafted domain-specific languages and enables open-ended modeling while retaining principled Bayesian inference and model criticism via posterior predictive checks. Across three experimental settings (constrained DSL kernel discovery, open-ended real-world modeling, and constraint-guided improvements of classic models), the method performs on par with expert-designed models and can surpass baselines when domain knowledge is expressed in natural language. The results demonstrate the promise of LM-driven statistical model discovery for accelerating scientific modeling and point to future directions such as active data collection and LM fine-tuning for probabilistic programming.

Abstract

Statistical model discovery is a challenging search over a vast space of models subject to domain-specific constraints. Efficiently searching over this space requires expertise in modeling and the problem domain. Motivated by the domain knowledge and programming capabilities of large language models (LMs), we introduce a method for language model driven automated statistical model discovery. We cast our automated procedure within the principled framework of Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models, acting as a domain expert. By leveraging LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure, which are key restrictions of previous systems. We evaluate our method in three settings in probabilistic modeling: searching within a restricted space of models, searching over an open-ended space, and improving expert models under natural language constraints (e.g., this model should be interpretable to an ecologist). Our method identifies models on par with human expert designed models and extends classic models in interpretable ways. Our results highlight the promise of LM-driven model discovery.
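
The propose, fit, critique cycle described in the abstract can be sketched as a short loop. The following is a minimal illustration only, not the authors' implementation: `Candidate`, `box_loop`, and the toy `propose`/`fit`/`criticize` stand-ins are hypothetical names, and a real system would replace them with LM calls and a probabilistic programming language backend.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    program: str   # source text of a proposed model (opaque to the loop)
    score: float   # higher is better, e.g. an estimate of predictive fit


def box_loop(
    propose: Callable[[List[Candidate], str], List[str]],
    fit: Callable[[str], float],
    criticize: Callable[[List[Candidate]], str],
    n_rounds: int = 3,
    n_keep: int = 2,
) -> List[Candidate]:
    """Iterate propose -> fit -> criticize, carrying the best programs forward."""
    best: List[Candidate] = []
    feedback = ""
    for _ in range(n_rounds):
        programs = propose(best, feedback)                 # modeler role
        fitted = [Candidate(p, fit(p)) for p in programs]  # generic fitting step
        ranked = sorted(best + fitted, key=lambda c: c.score, reverse=True)
        best = ranked[:n_keep]                             # keep top programs
        feedback = criticize(best)                         # critic role
    return best


# Toy stand-ins (assumptions for illustration): "programs" are polynomial
# degrees, and the score peaks at degree 2.
def toy_propose(best: List[Candidate], feedback: str) -> List[str]:
    start = 1 + max((int(c.program.split("=")[1]) for c in best), default=-1)
    return [f"degree={start}", f"degree={start + 1}"]


def toy_fit(program: str) -> float:
    return -abs(int(program.split("=")[1]) - 2)


def toy_criticize(best: List[Candidate]) -> str:
    return f"best so far: {best[0].program}"


result = box_loop(toy_propose, toy_fit, toy_criticize)
print(result[0].program, result[0].score)
```

Passing the three roles in as callables keeps the loop itself independent of any particular LM or inference engine, which mirrors the separation between proposal, fitting, and criticism that the abstract describes.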

Paper Structure (42 sections, 8 equations, 10 figures, 4 tables, 1 algorithm)

Figures (10)

  • Figure 1: Language model driven automated model discovery (BoxLM). 1) The prompt for the LM contains the dataset in visual and/or textual form, dataset metadata (e.g., dataset description), the code for previous probabilistic programs, and natural language feedback. 2) Given this, the proposal LM proposes new models expressed as probabilistic programs. 3) To fit these programs in a generic way, we leverage probabilistic programming languages and obtain scores and posterior predictive samples. 4) After we fit models, we compute the posterior predictive mean and variance. We provide these statistics to a critic LM which produces natural language feedback to guide the next round of model building. 5) We propagate the best programs, their posterior predictive means and variances, and natural language feedback forward by updating the prompt.
  • Figure 2: Test set performance on time series datasets. Our BoxLM system identifies compositional kernels with performance on par with strong baselines. (left) Comparison of BoxLM test mean absolute error (MAE) against Automatic Statistician using greedy search (AS), spectral mixture kernel (SM), periodic kernel (Periodic), and N-BEATS. BoxLM+ searches over an augmented kernel space. We bold the best and underline the second best among the GP methods, treating N-BEATS as a powerful non-GP-constrained baseline. (right) Extrapolations from GP with a BoxLM-discovered kernel.
  • Figure 3: Domain knowledge shapes BoxLM modeling approaches. In the top row, we keep all metadata (e.g., dataset description). In the bottom row, we remove metadata that reveals information about the domain. This leads to qualitatively different approaches for three datasets; for eight schools, BoxLM discovers a hierarchical model even without metadata. We list the corresponding programs for these different ablations: (top row) $\frac{K}{(1 + ((K - P_0) / P_0) \exp(-rt))}$, $\text{BetaBin}(n, \alpha, \beta)$, $L_{\text{inf}}(1 - \exp(-k (\text{age} - t_0)))$; (bottom row) $a + b x_0 + c x_0^2 + dx_0^3 + e x_0^4$, $\alpha + \beta x_0 + \gamma x_0^2$, $\alpha + \beta_1\log x_0 + \beta_2 \log x_0^2 + \beta_3 \log x_0^3$.
  • Figure 4: Correcting misspecified Lotka-Volterra dynamics. BoxLM can introduce corrections to standard Lotka-Volterra dynamics (no warm-start) and a hybrid neural ODE approach (warm-start) that outperform several baselines. (left) LM-proposed model predictions on training data and extrapolations (grey region). (right) Test MAE of LM models (No-WS, WS-Constraint, and WS-No Constraint) compared to the standard Lotka-Volterra model (LV), a Neural ODE, and a hybrid Neural ODE model with a multiplicative correction to the prey-predator dynamics (Hybrid).
  • Figure 5: BoxLM can propose corrections to ODEs. (top) In the no warm-start (No-WS) variation, BoxLM introduces corrections informed by domain knowledge of predator-prey models (carrying capacity, predation saturation). (middle) When prompted to introduce neural networks in an interpretable way (WS-Constraint), one strategy BoxLM proposes is to make the handling time parameter depend non-linearly on the prey density, extending a traditional approach to modeling predation saturation. (bottom) When prompted to introduce neural networks without constraints (WS-No Constraint), BoxLM introduces additive MLP-parameterized corrections and adjusts the scaling factors.
  • ...and 5 more figures
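
Two of the closed-form mean functions listed in the Figure 3 caption (top row) can be written out directly. The sketch below is illustrative: the function names `logistic_growth` and `von_bertalanffy` and the argument names are our own labels, only the formulas are taken from the caption, and the priors and likelihoods that a full probabilistic program would attach to them are omitted.

```python
import math


def logistic_growth(t: float, K: float, P0: float, r: float) -> float:
    """K / (1 + ((K - P0) / P0) * exp(-r * t)), as in the Figure 3 caption."""
    return K / (1.0 + ((K - P0) / P0) * math.exp(-r * t))


def von_bertalanffy(age: float, L_inf: float, k: float, t0: float) -> float:
    """L_inf * (1 - exp(-k * (age - t0))), as in the Figure 3 caption."""
    return L_inf * (1.0 - math.exp(-k * (age - t0)))


# Example parameter values are arbitrary and chosen only for illustration.
print(logistic_growth(0.0, K=100.0, P0=10.0, r=0.5))
print(von_bertalanffy(2.0, L_inf=50.0, k=0.3, t0=2.0))
```

Both curves have interpretable parameters: the first starts at `P0` when `t = 0` and saturates at `K`, and the second is zero at `age = t0` and saturates at `L_inf`. The bottom-row polynomial forms in the same caption have no such saturation behavior.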