Exploiting Edited Large Language Models as General Scientific Optimizers

Qitan Lv, Tianyu Liu, Hong Wang

TL;DR

This work introduces General Scientific Optimizers (GSO), a bi-level optimization framework in which inner-level simulators generate observational feedback and outer-level LLMs act as scientists that propose improved solutions, with the two levels coupled through model editing. By decoupling evaluation from hypothesis generation and enabling a dynamic balance between exploitation and exploration, GSO addresses the prompt sensitivity and lost-in-the-middle issues common in prompt-based optimization. Across seven scientific tasks and six backbone LLMs, GSO consistently outperforms baselines, including open-source and some closed-source models, and shows substantial gains on molecular property tasks such as HOMO-LUMO gaps. The approach is robust to prompt variations and highlights the practical potential of integrating simulations with knowledge-editing LLMs for generalizable scientific optimization.

Abstract

Large language models (LLMs) have been widely adopted for mathematical optimization in scientific scenarios because of their extensive knowledge and advanced reasoning capabilities. Existing methods mainly use LLMs to solve optimization problems in a prompt-based manner, taking observational feedback as additional textual descriptions. However, due to LLMs' high sensitivity to prompts and tendency to get lost in lengthy prompts, these methods struggle to effectively utilize the observational feedback from each optimization step, which severely hinders their application in real-world scenarios. To address these challenges, we propose a conceptually simple and general bi-level optimization method, namely General Scientific Optimizers (GSO). Specifically, GSO first utilizes inner-level simulators as experimental platforms to evaluate the current solution and provide observational feedback. Then, as the outer-level optimization, LLMs serve as knowledgeable and versatile scientists, generating new solutions by refining potential errors identified from the feedback. Finally, simulations together with the expert knowledge in LLMs are jointly updated with bi-level interactions via model editing. Extensive experiments show that GSO consistently outperforms existing state-of-the-art methods using six different LLM backbones on seven different tasks, demonstrating its effectiveness and wide range of applications.

Paper Structure

This paper contains 43 sections, 11 equations, 11 figures, 15 tables.

Figures (11)

  • Figure 1: GSO achieves state-of-the-art performance on a broad range of scientific optimization tasks compared with existing methods, using Llama 3 8B as the backbone. Results for the other five LLMs are in Figures \ref{fig:leida_qiansan} and \ref{fig:leida_housan}.
  • Figure 2: The overview of GSO. For a given optimization task, GSO iteratively conducts the inner-level optimization, outer-level optimization, and bi-level interaction sequentially. The workflow is as follows: (i) the inner-level simulator $\Phi$ conducts numerical simulations based on the current step's hypothetical solution $s_k$ ($v_1 \to v_2 \to v_3 \to v_4 \to v_1$) and returns observational feedback $f_k, \mathcal{L}_k$ (the edge ($v_4, v_1$) has a larger distance than the edge ($v_2, v_4$), current total distance: $108$); (ii) the outer-level LLM $\mathcal{M}_{\theta_k}$ generates new hypothetical solutions $s_{k+1}$ ($v_1 \to v_2 \to v_4 \to v_3 \to v_1$) based on the observational feedback $f_k, \mathcal{L}_k$; (iii) the bi-level interaction jointly updates simulations in conjunction with the expert knowledge within the LLMs through model editing.
  • Figure 3: We visualize the average MSE loss values of each method for the non-linear Constitutive Law task (d) across five random seeds at the same optimization steps using Mistral 7B as the backbone model, with shading representing the standard deviation.
  • Figure 4: Causal tracing visualization results for Llama 3 8B. The causal impact on output probability is mapped for (a) the effect of each hidden state on the prediction, (b) the effect of MLP activations alone, and (c) the effect of attention activations alone. Below these three panels, we also give the corresponding mean causal traces over a sample of 1000 factual statements, shown as line plots with 95% confidence intervals. The confidence intervals confirm that the distinctions between peak and non-peak causal effects at both early and late sites are significant.
  • Figure 5: Causal tracing visualization results for GPT-J 6B. The causal impact on output probability is mapped for (a) the effect of each hidden state on the prediction, (b) the effect of MLP activations alone, and (c) the effect of attention activations alone. Below these three panels, we also give the corresponding mean causal traces over a sample of 1000 factual statements, shown as line plots with 95% confidence intervals. The confidence intervals confirm that the distinctions between peak and non-peak causal effects at both early and late sites are significant.
  • ...and 6 more figures
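
The three-stage workflow described in the Figure 2 caption (simulate, propose, interact) can be illustrated with a toy sketch on a four-city tour. Everything here is an assumption made for illustration and none of it is the authors' implementation: the distance table is invented, `propose` is a deterministic swap search standing in for the outer-level LLM, and the `memory` dictionary only mimics the idea that feedback is stored outside the prompt, whereas the paper does this through model editing of the LLM itself.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical symmetric distances between four cities (not from the paper).
DIST: Dict[Tuple[str, str], int] = {
    ("v1", "v2"): 10, ("v1", "v3"): 15, ("v1", "v4"): 40,
    ("v2", "v3"): 35, ("v2", "v4"): 12, ("v3", "v4"): 20,
}


def dist(a: str, b: str) -> int:
    """Look up a distance regardless of the order of the two cities."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def simulate(tour: List[str]) -> Tuple[str, int]:
    """Inner level: evaluate a closed tour and return (feedback, loss)."""
    edges = [(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour))]
    loss = sum(dist(a, b) for a, b in edges)
    worst = max(edges, key=lambda e: dist(*e))
    feedback = f"longest edge {worst} = {dist(*worst)}; total distance: {loss}"
    return feedback, loss


def propose(tour: List[str], memory: Dict[Tuple[str, ...], int]) -> List[str]:
    """Outer-level stand-in for the LLM: return the first neighbour
    (two non-start cities swapped) that has not been evaluated yet."""
    for i, j in combinations(range(1, len(tour)), 2):
        candidate = tour.copy()
        candidate[i], candidate[j] = candidate[j], candidate[i]
        if tuple(candidate) not in memory:
            return candidate
    return tour  # every neighbour has already been tried


def optimize(start: List[str], steps: int = 5) -> Tuple[List[str], int]:
    """Alternate simulation, feedback storage, and proposal for `steps` rounds."""
    memory: Dict[Tuple[str, ...], int] = {}  # stand-in for edited model state
    best, best_loss = start, simulate(start)[1]
    current = start
    for _ in range(steps):
        _feedback, loss = simulate(current)   # (i) inner-level simulation
        memory[tuple(current)] = loss         # (iii) persist the feedback
        if loss < best_loss:
            best, best_loss = current, loss
        current = propose(best, memory)       # (ii) outer-level proposal
    return best, best_loss


if __name__ == "__main__":
    tour, total = optimize(["v1", "v2", "v3", "v4"])
    print(" -> ".join(tour + tour[:1]), "| total distance:", total)
```

The point of the sketch is only the separation of roles: the simulator owns the evaluation, the proposer never computes distances itself, and what was learned in earlier steps lives in a persistent store rather than in an ever-growing prompt.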