IGDA: Interactive Graph Discovery through Large Language Model Agents

Alex Havrilla, David Alvarez-Melis, Nicolo Fusi

TL;DR

IGDA is an LLM-based framework for interactive graph discovery that relies on semantic metadata rather than numerical data. It interleaves uncertainty-driven edge experimentation with local updates to neighboring edges, iterating over $R$ rounds with $I$ tests per round to minimize the distance $d(\hat{G}_R, G^*)$ and maximize $F1(G^*, \hat{G}_R)$. Across eight real-world graphs, IGDA often outperforms baselines, including an adaptation of a state-of-the-art numerical method, and ablations confirm the centrality of uncertainty-based selection and local prompting. A memorization test on a causal graph of protein transcription factors that was new as of July 2024 shows strong performance even when the graph cannot be in the LLM’s training data. IGDA thus complements existing numerical causal discovery techniques by leveraging semantic metadata and interactive feedback.

Abstract

Large language models ($\textbf{LLMs}$) have emerged as a powerful method for discovery. Instead of utilizing numerical data, LLMs utilize associated variable $\textit{semantic metadata}$ to predict variable relationships. Simultaneously, LLMs demonstrate impressive abilities to act as black-box optimizers when given an objective $f$ and sequence of trials. We study LLMs at the intersection of these two capabilities by applying LLMs to the task of $\textit{interactive graph discovery}$: given a ground truth graph $G^*$ capturing variable relationships and a budget of $I$ edge experiments over $R$ rounds, minimize the distance between the predicted graph $\hat{G}_R$ and $G^*$ at the end of the $R$-th round. To solve this task we propose $\textbf{IGDA}$, an LLM-based pipeline incorporating two key components: 1) an LLM uncertainty-driven method for edge experiment selection and 2) a local graph update strategy utilizing binary feedback from experiments to improve predictions for unselected neighboring edges. Experiments on eight different real-world graphs show our approach often outperforms all baselines, including a state-of-the-art numerical method for interactive graph discovery. Further, we conduct a rigorous series of ablations dissecting the impact of each pipeline component. Finally, to assess the impact of memorization, we apply our interactive graph discovery strategy to a complex, new (as of July 2024) causal graph on protein transcription factors, finding strong performance in a setting where memorization is impossible. Overall, our results show IGDA to be a powerful method for graph discovery complementary to existing numerically driven approaches.
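The two components named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `interactive_discovery`, `run_experiment`, `update_neighbors`, and `tests_per_round` are hypothetical, the LLM is replaced by plain callables, "uncertainty" is assumed to mean a presence probability close to 0.5, "neighboring" is assumed to mean sharing a node with the tested edge, and the budget is assumed to be spent as a fixed number of tests in each round.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source, target)


def f1_score(true_edges: Set[Edge], pred_edges: Set[Edge]) -> float:
    """Edge-level F1 between a ground-truth and a predicted edge set."""
    tp = len(true_edges & pred_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


def interactive_discovery(
    beliefs: Dict[Edge, float],
    run_experiment: Callable[[Edge], bool],
    update_neighbors: Callable[[Dict[Edge, float], Edge, bool], Dict[Edge, float]],
    rounds: int,
    tests_per_round: int,
) -> Set[Edge]:
    """Hypothetical sketch of uncertainty-driven selection with local updates.

    beliefs: assumed P(edge present) per candidate edge (stand-in for LLM output).
    run_experiment: returns binary feedback on whether an edge exists.
    update_neighbors: stand-in for the LLM; proposes revised probabilities.
    """
    beliefs = dict(beliefs)  # do not mutate the caller's dictionary
    tested: Set[Edge] = set()
    for _ in range(rounds):
        # Rank untested edges by uncertainty: probability closest to 0.5 first.
        candidates = sorted(
            (e for e in beliefs if e not in tested),
            key=lambda e: abs(beliefs[e] - 0.5),
        )
        for edge in candidates[:tests_per_round]:
            outcome = run_experiment(edge)
            tested.add(edge)
            beliefs[edge] = 1.0 if outcome else 0.0
            # Local update: accept proposals only for untested edges
            # that share at least one node with the tested edge.
            proposals = update_neighbors(beliefs, edge, outcome)
            for other, prob in proposals.items():
                is_neighbor = bool(set(other) & set(edge))
                if other in beliefs and other not in tested and is_neighbor:
                    beliefs[other] = prob
    # Predicted graph: every edge believed more likely present than absent.
    return {e for e, p in beliefs.items() if p > 0.5}
```

In this sketch, tested edges are fixed to their experimental outcome and never revised, while the ranking is computed once at the start of each round. Both are simplifying choices made for illustration, not details taken from the paper.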

Paper Structure

This paper contains 22 sections, 1 equation, 19 figures, 1 algorithm.

Figures (19)

  • Figure 1: Diagram of the interactive graph discovery process through LLMs. The process begins by predicting edges and confidences for each edge. Interactive discovery then proceeds by selecting the most uncertain edges for experimentation. The LLM then updates its predictions and confidences for edges adjacent to the selected edge. Note: only edges predicted as present are shown.
  • Figure 2: Results on real-world graphs showing F1 score of the predicted graph against percentage of edges in the graph selected. IGDA almost always outperforms both the random baseline and static selection via uncertainty. Note: static confidence selection without local updates is deterministic and thus has no confidence intervals. Additionally, GIT is not reported on the Arctic graph because the graph is cyclic.
  • Figure 3: Average rank of each method when numbered from $0$ to $2$ across each timestep on each graph. The full LLM driven update agent consistently achieves rank $0$ across all timesteps. Note: lower is better.
  • Figure 4: GIT with varying amounts of observational and interventional data. Decreasing either observational or interventional sample sizes can decrease performance by over 0.2 F1 score.
  • Figure 5: % Improvement from experiments vs. LLM prediction updates across timesteps. Improvement directly from LLM updates peaks early but then falls off. Improvement from experiments stays constant or improves with more experiments as confidence scores become better calibrated.
  • ...and 14 more figures