Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders, Benjamin Van Durme
TL;DR
Bonsai addresses the need for general-purpose AI that adapts to new domains while reasoning transparently, and with explicit uncertainty, over multimodal data. It does so through a tree-structured, evidence-grounded reasoning framework that maps raw inputs to natural-language observations, assigns probabilistic leaf scores via anchoring-and-adjustment, and performs probabilistic inference with optional counterfactual reasoning and test-time evidence scaling. The approach yields interpretable sub-claim traces and grounded explanations, performs strongly on question-answering benchmarks (EntailmentBank, TVQA) and multimodal tasks (MultiVENT), and supports human-in-the-loop corrections. These results suggest that transparent, probabilistic, multimodal reasoning is practical for reliable, auditable AI in real-world applications.
Abstract
To develop general-purpose collaborative agents, humans need reliable AI systems that can (1) adapt to new domains and (2) transparently reason with uncertainty to allow for verification and correction. Black-box models demonstrate powerful data-processing abilities but do not satisfy these criteria due to their opacity, domain specificity, and lack of uncertainty awareness. We introduce Bonsai, a compositional and probabilistic reasoning system that generates adaptable inference trees by retrieving relevant grounding evidence and using it to compute likelihoods of sub-claims derived from broader natural language inferences. Bonsai's reasoning power is tunable at test time via evidence scaling, and it reliably handles varied domains including transcripts, photographs, videos, audio, and databases. Question-answering and human alignment experiments demonstrate that Bonsai matches the performance of domain-specific black-box methods while generating interpretable, grounded, and uncertainty-aware reasoning traces.
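As a rough illustration of what computing likelihoods over an inference tree can look like, the sketch below scores a claim by recursively combining the probabilities of its sub-claims. It is a minimal example under stated assumptions, not Bonsai's actual procedure: the `ClaimNode` class, the `claim_probability` function, the example claims, and the leaf scores are all hypothetical, and the combination rule (a product, which treats sub-claims as independent conjuncts) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional


@dataclass
class ClaimNode:
    """A claim in a hypothetical inference tree (illustrative, not Bonsai's API)."""
    text: str
    # Probability assigned from grounding evidence; only meaningful on leaves.
    leaf_probability: Optional[float] = None
    children: List["ClaimNode"] = field(default_factory=list)


def claim_probability(node: ClaimNode) -> float:
    """Score a claim by combining the scores of its sub-claims.

    Assumption: a parent claim holds only if all of its sub-claims hold, and
    the sub-claims are independent, so the parent score is their product.
    """
    if not node.children:
        if node.leaf_probability is None:
            raise ValueError(f"leaf claim has no score: {node.text!r}")
        if not 0.0 <= node.leaf_probability <= 1.0:
            raise ValueError(f"score out of range for: {node.text!r}")
        return node.leaf_probability
    return prod(claim_probability(child) for child in node.children)


# Hypothetical claim and leaf scores, made up for illustration.
root = ClaimNode(
    text="The video shows flooding in the city center",
    children=[
        ClaimNode("Streets are covered in water", leaf_probability=0.9),
        ClaimNode("The footage was filmed in the city center", leaf_probability=0.8),
    ],
)

print(f"{claim_probability(root):.2f}")  # prints 0.72
```

Because every intermediate score is attached to a natural-language sub-claim, a reader can inspect which sub-claim lowered the overall likelihood and, in a human-in-the-loop setting, correct that leaf score and recompute the result.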
