Bonsai: Interpretable Tree-Adaptive Grounded Reasoning

Kate Sanders, Benjamin Van Durme

TL;DR

Bonsai tackles the need for general-purpose AI that can adapt to new domains while providing transparent, uncertainty-aware reasoning grounded in multimodal data. It achieves this through a tree-structured, evidence-grounded reasoning framework that maps raw inputs to natural-language observations, assigns probabilistic leaf scores via anchoring-and-adjustment, and performs probabilistic inference with optional counterfactual reasoning and test-time evidence scaling. The approach yields interpretable sub-claim traces and grounded explanations, attaining strong performance on traditional text-based QA benchmarks (EntailmentBank, TVQA) and multimodal tasks (MultiVENT) while enabling human-in-the-loop corrections. This demonstrates the practical potential of transparent, probabilistic, multimodal reasoning for reliable, audit-friendly AI in real-world applications.

Abstract

To develop general-purpose collaborative agents, humans need reliable AI systems that can (1) adapt to new domains and (2) transparently reason with uncertainty to allow for verification and correction. Black-box models demonstrate powerful data processing abilities but do not satisfy these criteria due to their opaqueness, domain specificity, and lack of uncertainty awareness. We introduce Bonsai, a compositional and probabilistic reasoning system that generates adaptable inference trees by retrieving relevant grounding evidence and using it to compute likelihoods of sub-claims derived from broader natural language inferences. Bonsai's reasoning power is tunable at test-time via evidence scaling and it demonstrates reliable handling of varied domains including transcripts, photographs, videos, audio, and databases. Question-answering and human alignment experiments demonstrate that Bonsai matches the performance of domain-specific black-box methods while generating interpretable, grounded, and uncertainty-aware reasoning traces.
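The abstract's idea of scoring a sub-claim from retrieved evidence, with reasoning power tunable through evidence scaling, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function name `anchor_and_adjust`, the log-odds update rule, and the numeric adjustments are hypothetical and are not taken from the paper, where the scoring is performed by the system itself rather than by a fixed formula.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def anchor_and_adjust(anchor: float, adjustments: list, max_evidence: int = 3) -> list:
    """Toy anchoring-and-adjustment scorer (hypothetical update rule).

    Start from an initial probability (the anchor) and shift it once per
    evidence piece. Each adjustment is a signed log-odds shift: positive
    values support the claim, negative values count against it.
    Only the first `max_evidence` adjustments are used, which stands in
    for an evidence budget chosen at test time.
    Returns the full trace of probabilities, anchor first.
    """
    if not 0.0 < anchor < 1.0:
        raise ValueError("anchor must be strictly between 0 and 1")
    score = logit(anchor)
    trace = [anchor]
    for delta in adjustments[:max_evidence]:
        score += delta
        trace.append(sigmoid(score))
    return trace


# Invented adjustments for one leaf claim; the fourth piece is ignored
# because the evidence budget is three.
trace = anchor_and_adjust(0.5, [1.0, -0.5, 2.0, 3.0], max_evidence=3)
print([round(p, 3) for p in trace])
```

Raising `max_evidence` lets more evidence pieces move the score, which is one simple way to picture test-time evidence scaling under these assumptions.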

Paper Structure

This paper contains 52 sections, 1 equation, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: A reasoning tree over a news video as a grounding source. Bonsai recursively decomposes natural language statements about data into small, verifiable pieces. It uses retrieved evidence samples from multimodal knowledge sources to iteratively score these pieces in terms of how likely each piece is. This procedure results in grounded likelihood scores for leaves of compositional tree structures representing the original claim, alongside natural language explanations. Low-scoring branches of a reasoning tree may then be pruned, shown by the strikethrough text in the original statement and sub-claims.
  • Figure 2: For most tasks, human performance varies depending on background knowledge. Two humans were given a set of candidate descriptions for the video on the left, and pictured on the right are their answers and confidence scores. Illustrated below the responses, Bonsai retrieves its top three generated video observations and uses them to score these claims, in the positive case by iteratively updating their likelihood scores and providing explanations.
  • Figure 3: Agreement of different uncertainty quantification approaches compared to human likelihood judgments on a visual classification task. Unlike basic probability scoring prompts (SQ. Prmt and Curr. Prmt), Bonsai's "anchor and adjust" evidence-focused approach (Full Scorer) outperforms traditional uncertainty quantification methods with a fine-tuned classifier (Focal Loss).
  • Figure 4: An example Bonsai output on TVQA. One of the $\leq 3$ evidence pieces per leaf and a snippet of the system's probability scoring trace per leaf are shown. Conditional probabilities are shown next to their respective sub-claims, alongside what other sub-claims they are conditioned on.
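
The captions for Figures 1 and 4 describe a claim decomposed into sub-claims, each carrying a conditional probability, with low-scoring branches pruned. A minimal sketch of that structure follows. It assumes, purely for illustration, that a parent claim's likelihood is the chain-rule product of its sub-claims' conditional probabilities; the `Claim` class, the `likelihood` and `prune` helpers, the example statement, and its scores are all hypothetical and may differ from the paper's actual inference procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """A node in a reasoning tree (hypothetical structure)."""
    text: str
    # For a leaf: P(this sub-claim | earlier sibling sub-claims, evidence).
    # Ignored for nodes that have children.
    prob: float = 1.0
    children: List["Claim"] = field(default_factory=list)


def likelihood(node: Claim) -> float:
    """Chain-rule product of leaf conditionals (an assumed aggregation rule)."""
    if not node.children:
        return node.prob
    result = 1.0
    for child in node.children:
        result *= likelihood(child)
    return result


def prune(node: Claim, threshold: float) -> Claim:
    """Return a copy of the tree without sub-claims scoring below threshold."""
    kept = [prune(c, threshold) for c in node.children
            if likelihood(c) >= threshold]
    return Claim(node.text, node.prob, kept)


# Invented statement and scores, for illustration only.
root = Claim("A protest took place in a city at night", children=[
    Claim("A protest took place", prob=0.9),
    Claim("The event happened in a city", prob=0.8),
    Claim("The event happened at night", prob=0.2),
])

pruned = prune(root, threshold=0.5)
print(round(likelihood(root), 3), round(likelihood(pruned), 3))
print([c.text for c in pruned.children])
```

In this sketch, pruning drops the weakly supported sub-claim, so the remaining, narrower statement receives a higher likelihood than the original, which mirrors the strikethrough text described for Figure 1.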