LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik

TL;DR

The work introduces Scientific Generative Agent (SGA), a bilevel framework that couples outer-level LLM-driven hypothesis generation with inner-level differentiable simulations to accelerate physical scientific discovery. It demonstrates constitutive-law discovery and molecular design, showing that the approach can yield novel, coherent solutions beyond human expectations and that bilevel optimization with an exploitation-exploration strategy is key to success. The method generalizes across disciplines and offers a unified paradigm for grounding abstract reasoning in experimental feedback. While promising, the paper notes limitations in interpretability, safety, and computational cost, outlining avenues for future integration with manual constraints and human feedback.

Abstract

Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reasoning about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing continuous parts, such as physical parameters, via differentiability. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis.

Paper Structure (46 sections, 5 equations, 6 figures, 6 tables, 1 algorithm)

Figures (6)

  • Figure 1: The overall pipeline of Scientific Generative Agent (SGA). Taking the constitutive law search problem as an example, the input is an initial guess (a purely elastic material), and the output is another constitutive law optimized towards the ground truth (a weakly compressible fluid). The initial guess first initializes a top-$K$ heap for storing the solutions. In the outer-level optimization, an LLM takes in the top-$K$ previously proposed solutions and generates a better one based on them, with modified continuous parameterization $\Theta$ and discrete expression $\mathcal{E}$. In the inner-level optimization, a gradient-based optimizer solves for the optimal $\Theta$ via simulation and appends the optimized solutions to the heap. After a few iterations of bilevel optimization, the heap returns the top-1 solution as the final solution.
  • Figure 2: Loss trends comparison. Loss of the best solution averaged across seeds at different iterations of LLM-driven optimization, where the shading shows the min/max value.
  • Figure 3: Ablation on bilevel optimization. We denote the optimization trajectories with and without bilevel optimization with red dots and orange triangles, respectively. We visualize the intermediate step of our method before the inner-level optimization using orange crosses. We also highlight the outer LLM optimization and the inner simulation optimization using orange and red arrows, respectively.
  • Figure 4: Ablation on the backbone LLM. We compare the performance of 4 selected backbone LLMs and report their ranks. An outer curve indicates better performance.
  • Figure 5: Ablation on exploration-exploitation. (a) Histogram of solutions that are valid for simulation (Eq. \ref{eq:bilevel_2}) across iterations. (b) Loss ($\mathcal{L}$ in Sec. \ref{ssec:bilevel}) of the best solution averaged across seeds at different iterations, where the shading indicates the min/max values.
  • ...and 1 more figures
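
The loop described in the Figure 1 caption (a top-$K$ heap, an outer proposal step, and an inner gradient-based fit) can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the paper's implementation: `propose` is a hypothetical rule-based stand-in for the LLM, the "simulation" is a one-parameter least-squares fit with an analytic gradient, and all names (`BASIS`, `inner_optimize`, `bilevel_search`) are illustrative.

```python
import heapq
import random

# Toy "observations" generated by a hidden law y = 2 * x**2 (an assumption
# made only for this sketch; it plays the role of the ground truth).
XS = [0.5, 1.0, 1.5, 2.0]
YS = [2.0 * x**2 for x in XS]

# Discrete search space: candidate expression forms (stand-in for E).
BASIS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x**2,
    "cubic": lambda x: x**3,
}


def loss(form, theta):
    """Mean squared error of the model theta * phi(x) against the data."""
    phi = BASIS[form]
    return sum((theta * phi(x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def inner_optimize(form, theta, lr=0.01, steps=500):
    """Inner level: gradient descent on the continuous parameter theta."""
    phi = BASIS[form]
    for _ in range(steps):
        grad = sum(2 * (theta * phi(x) - y) * phi(x)
                   for x, y in zip(XS, YS)) / len(XS)
        theta -= lr * grad
    return theta, loss(form, theta)


def propose(top_k, rng):
    """Outer level: hypothetical stand-in for the LLM proposer.

    It looks at the current top-K solutions and suggests a form not among
    them, with a fresh initial guess for theta.
    """
    tried = {form for _, form, _ in top_k}
    untried = [f for f in BASIS if f not in tried]
    form = rng.choice(untried) if untried else top_k[0][1]
    return form, rng.uniform(0.5, 1.5)


def bilevel_search(k=2, iterations=5, seed=0):
    rng = random.Random(seed)
    # The initial guess seeds the pool of solutions.
    theta0, loss0 = inner_optimize("linear", 1.0)
    pool = [(loss0, "linear", theta0)]
    for _ in range(iterations):
        top_k = heapq.nsmallest(k, pool)                # best K so far
        form, theta = propose(top_k, rng)               # outer step
        theta, value = inner_optimize(form, theta)      # inner step
        pool.append((value, form, theta))
        pool = heapq.nsmallest(k, pool)                 # keep only top-K
    return pool[0]                                      # top-1 solution


if __name__ == "__main__":
    best_loss, best_form, best_theta = bilevel_search()
    print(best_form, round(best_theta, 4), best_loss)
```

The split mirrors the caption: the proposer only changes the discrete form and the starting point, while every continuous parameter value that enters the pool has already been fitted by the inner loop.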