Scalability of Bayesian Network Structure Elicitation with Large Language Models: a Novel Methodology and Comparative Analysis

Nikolay Babakov, Ehud Reiter, Alberto Bugarin

TL;DR

This work addresses the problem of eliciting Bayesian Network (BN) structures without data by leveraging multiple Large Language Models (LLMs) in a Delphi-style expert-aggregation framework. In the proposed method, a facilitator LLM first generates diverse "LLM expert" profiles; each expert then independently reasons about possible causal edges, and the final structure is formed by majority voting, with cycle-resolution prompts to handle conflicts. The method is evaluated against a baseline Harness approach on BNs of varying sizes, together with a data-contamination test that assesses whether the LLMs have seen the target BN structures during training. The results show improvements over the baseline for at least one LLM (GPT-3.5), but they also reveal substantial scalability challenges as BN size grows and show that some BNs are unsuitable for evaluation because of ambiguous node names or contamination. Overall, the study highlights the potential of LLM-driven BN elicitation while underscoring the need for input disambiguation, contamination checks, and larger-context or domain-specialist-enabled prompting to achieve reliable scalability in structure learning.
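A BN must be a directed acyclic graph, but edges accepted by majority voting can still form a cycle; this is the conflict the cycle-resolution prompts address. The sketch below shows one way such a cycle could be detected before prompting, using a standard depth-first search. It is an illustration under stated assumptions, not the authors' code: edges are assumed to be `(parent, child)` string tuples, `find_cycle` is a hypothetical helper, and the node names are illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical representation


def find_cycle(edges: Iterable[Edge]) -> Optional[List[str]]:
    """Return one directed cycle (first node repeated at the end), or None if the graph is acyclic."""
    graph: Dict[str, List[str]] = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # A back edge to a node on the current path closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle is not None:
                return cycle
    return None


voted = [("Smoker", "Cancer"), ("Cancer", "Dyspnoea"), ("Dyspnoea", "Smoker")]
print(find_cycle(voted))  # ['Smoker', 'Cancer', 'Dyspnoea', 'Smoker']
```

A detected cycle could then be passed to a follow-up prompt asking which of its edges to drop; the exact wording of the authors' cycle-resolution prompt is not given here.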

Abstract

In this work, we propose a novel method for Bayesian Network (BN) structure elicitation that initializes several LLMs with different experiences, queries each of them independently to create a BN structure, and obtains the final structure by majority voting. We compare the method with one alternative method on various BNs of different sizes, both widely and less widely known, and study the scalability of both methods on them. We also propose an approach to check whether BNs are contaminated in an LLM (i.e., present in its training data), which shows that some widely known BNs are unsuitable for testing LLM-based BN structure elicitation. We also show that some BNs may be unsuitable for such experiments because their node names are indistinguishable. Experiments on the remaining BNs show that our method outperforms the existing method with one of the three studied LLMs; however, the performance of both methods decreases significantly as BN size increases.
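The aggregation step can be pictured as edge-level voting across the experts' structures. The following is a minimal sketch, assuming each expert's answer has already been parsed into a set of directed `(parent, child)` edges and that an edge is kept when a strict majority of experts propose it; the paper's exact threshold and tie handling are not stated here, and `majority_vote` plus the example edges are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical representation


def majority_vote(expert_structures: List[Set[Edge]]) -> Set[Edge]:
    """Keep each directed edge proposed by a strict majority of the experts (assumed rule)."""
    n_experts = len(expert_structures)
    # Count each expert at most once per edge; A->B and B->A are separate candidates.
    votes = Counter(edge for structure in expert_structures for edge in set(structure))
    return {edge for edge, count in votes.items() if count > n_experts / 2}


experts = [
    {("Smoker", "Cancer"), ("Pollution", "Cancer"), ("Cancer", "Xray")},
    {("Smoker", "Cancer"), ("Cancer", "Xray"), ("Cancer", "Dyspnoea")},
    {("Smoker", "Cancer"), ("Cancer", "Dyspnoea"), ("Xray", "Cancer")},
]
print(sorted(majority_vote(experts)))
# [('Cancer', 'Dyspnoea'), ('Cancer', 'Xray'), ('Smoker', 'Cancer')]
```

Because opposite directions are voted on separately, an even split such as one expert proposing A->B and another B->A drops both edges under this rule; the voted graph is also not guaranteed to be acyclic.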

Paper Structure (31 sections, 9 figures, 9 tables)

Figures (9)

  • Figure 1: A BN for the lung cancer problem. Note that conditional probability tables encoding the probabilistic relations between variables are not shown here.
  • Figure 2: Overview of the proposed method for BN structure elicitation. The facilitator LLM generates N profiles relevant to the given BN. "LLM experts" initialized with different profiles are queried about the BN structure using two consecutive prompts. The final BN structure is obtained by majority voting over the structures elicited from the individual experts.
  • Figure 3: Mean structural Hamming distance (SHD), normalized by edge count, as a function of the number of "LLM expert" profiles.
  • Figure 4: SHD, normalized by edge count, between structures generated by different GPT-4 "LLM experts" for apple BNs.
  • Figure 5: F-score and SHD, normalized by edge count, as a function of the number of edges in a BN.
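Figures 3 to 5 report SHD normalized by edge count and the F-score. The sketch below shows one common way to compute these metrics, under assumptions the source does not confirm: both graphs are DAGs given as sets of `(parent, child)` edges, a reversed edge counts once in SHD, SHD is normalized by the number of edges in the reference BN, and the F-score is computed over directed edges. The function names and example edges are hypothetical, and the authors' exact conventions may differ.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical representation


def shd(true_edges: Set[Edge], pred_edges: Set[Edge]) -> int:
    """Structural Hamming distance between two DAGs: missing + extra + reversed edges."""
    def by_pair(edges: Set[Edge]) -> Dict[FrozenSet[str], Edge]:
        # Index each edge by its unordered node pair (valid for DAGs without self-loops).
        return {frozenset(edge): edge for edge in edges}

    true_pairs, pred_pairs = by_pair(true_edges), by_pair(pred_edges)
    distance = 0
    for pair in true_pairs.keys() | pred_pairs.keys():
        if pair not in true_pairs or pair not in pred_pairs:
            distance += 1  # edge missing from, or extra in, the prediction
        elif true_pairs[pair] != pred_pairs[pair]:
            distance += 1  # edge present in both but reversed
    return distance


def normalized_shd(true_edges: Set[Edge], pred_edges: Set[Edge]) -> float:
    """SHD divided by the number of edges in the reference BN (assumed normalization)."""
    if not true_edges:
        raise ValueError("reference BN has no edges")
    return shd(true_edges, pred_edges) / len(true_edges)


def edge_f1(true_edges: Set[Edge], pred_edges: Set[Edge]) -> float:
    """F-score over directed edges; a reversed edge is both a false positive and a false negative."""
    tp = len(true_edges & pred_edges)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


true_bn = {("Pollution", "Cancer"), ("Smoker", "Cancer"), ("Cancer", "Xray"), ("Cancer", "Dyspnoea")}
elicited = {("Smoker", "Cancer"), ("Xray", "Cancer"), ("Cancer", "Dyspnoea"), ("Smoker", "Xray")}
print(shd(true_bn, elicited), normalized_shd(true_bn, elicited), edge_f1(true_bn, elicited))  # 3 0.75 0.5
```

In the example, one edge is missing, one is reversed, and one is extra, giving an SHD of 3 over 4 reference edges. The same `shd` function could be applied to pairs of expert structures, as in Figure 4, although which edge count is used for normalization in that comparison is not stated here.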