Discovering and Learning Probabilistic Models of Black-Box AI Capabilities

Daniel Bramblett, Rushang Karia, Adrian Ciotinga, Ruthvick Suresh, Pulkit Verma, YooJung Choi, Siddharth Srivastava

TL;DR

The paper tackles the challenge of safely operating black-box AI systems by learning interpretable, probabilistic capability models that describe what intents a BBAI can achieve, under which conditions, and with what likelihood of outcomes. It introduces Probabilistic Capability Model Learning (PCML), an active-learning framework that uses MCTS to synthesize informative queries, and constructs pessimistic/optimistic models to bound and refine capabilities. The authors provide formal guarantees (soundness, completeness, convergence) and demonstrate empirical efficacy across diverse agents and environments, showing that PCML can reveal surprising limits and side effects in BBAIs while enabling safer deployment. The work contributes a scalable approach to learning user-centric, high-level representations of BBAI behavior that can guide deployment, design, and safety verification.

Abstract

Black-box AI (BBAI) systems such as foundational models are increasingly being used for sequential decision making. To ensure that such systems are safe to operate and deploy, it is imperative to develop efficient methods that can provide a sound and interpretable representation of the BBAI's capabilities. This paper shows that PDDL-style representations can be used to efficiently learn and model an input BBAI's planning capabilities. It uses the Monte-Carlo tree search paradigm to systematically create test tasks, acquire data, and prune the hypothesis space of possible symbolic models. Learned models describe a BBAI's capabilities, the conditions under which they can be executed, and the possible outcomes of executing them along with their associated probabilities. Theoretical results show soundness, completeness and convergence of the learned models. Empirical results with multiple BBAI systems illustrate the scope, efficiency, and accuracy of the presented methods.
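As a rough illustration of what such a learned model might hold, the sketch below stores one capability as a name, an execution condition, and a distribution over outcomes. The `Capability` class, the `estimate_outcomes` helper, the `pick_up` example, and all literals are hypothetical names introduced here, and the relative-frequency estimate is an assumed stand-in, not the paper's learning procedure.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Capability:
    """Hypothetical container for one learned capability (not the paper's data structure)."""
    name: str
    condition: frozenset  # literals assumed to hold before execution
    outcomes: dict = field(default_factory=dict)  # effect set -> probability


def estimate_outcomes(observed_effects):
    """Relative-frequency estimate of outcome probabilities from observed effects."""
    counts = Counter(observed_effects)
    total = sum(counts.values())
    return {effect: n / total for effect, n in counts.items()}


# Made-up observations: the same capability executed four times.
observations = [
    frozenset({"holding(cup)"}),
    frozenset({"holding(cup)"}),
    frozenset({"holding(cup)"}),
    frozenset({"broken(cup)"}),
]

pick_up = Capability(
    name="pick_up",
    condition=frozenset({"hand_empty", "on_table(cup)"}),
    outcomes=estimate_outcomes(observations),
)

for effect, prob in pick_up.outcomes.items():
    print(sorted(effect), prob)
```

Keying outcomes by a frozen set of literals keeps each distinct effect hashable, so repeated observations of the same outcome accumulate in a single counter entry.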

Paper Structure

This paper contains 46 sections, 5 theorems, 1 equation, 5 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Let $C$ be the set of discovered capabilities, $\mathcal{D}$ the observed capability-transitions, and let $\mathcal{M}_{pess}$ and $\mathcal{M}_{opt}$ be the pessimistic and optimistic models computed by PCML. For every capability $c \in C$, $\mathcal{M}_{opt}$ is sound and $\mathcal{M}_{pess}$ is s

Figures (5)

  • Figure 1: Example capability $c_2$ with a conditional effect.
  • Figure 2: Overview of the MCTS paradigm for query synthesis. $\rho^i_j$ denotes $\tau^{M_i}_j(\rho_0)$; $\cap()$ and $\Delta()$ represent the intersection and the symmetric difference of the sets of support (SoS) of their input distributions, respectively. (a) shows an MCTS formulation where each node represents tuples of next-state distributions in the Distinguishing MDP (Def. 9); (b) shows an SoS-based representation that replaces each pair of outcome distributions with two sets of states: those in the intersection of their supports, and those in the symmetric difference. States in the symmetric difference are already distinguished and need not be tracked; (c) shows a sample-based approximation of the SoS representation, where each node represents a sample from the two distributions' support sets. Tables on the right show outcome probabilities under the two models.
  • Figure 3:
  • Figure 6: The sampled variational distance of PCML-E and PCML-S on four evaluation problems. The shaded region shows one standard deviation across multiple runs. The model evaluated from each run is the pessimistic model. In First Responders, the random policy agent has a VD higher than 0.6.
  • Figure 7: The sampled variational distance of PCML-E and PCML-S on Tireworld, Rendered Blocksworld, and Probabilistic Elevators. The shaded region shows one standard deviation across multiple runs. The model evaluated from each run is the pessimistic model.
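
The set-of-support split described in the Figure 2 caption and the variational distance reported in Figures 6 and 7 can be illustrated with a small sketch. It assumes distributions are plain dictionaries from state labels to probabilities, and it computes the standard total variation distance on exact distributions; the paper's sampled estimate may be computed differently. All function names and the two example models are hypothetical.

```python
def support(dist):
    """States that receive non-zero probability under dist."""
    return {state for state, prob in dist.items() if prob > 0}


def split_supports(dist_a, dist_b):
    """Return (intersection, symmetric difference) of the two supports."""
    support_a, support_b = support(dist_a), support(dist_b)
    return support_a & support_b, support_a ^ support_b


def total_variation(dist_a, dist_b):
    """Total variation distance: half the L1 distance between the distributions."""
    states = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(s, 0.0) - dist_b.get(s, 0.0)) for s in states)


# Two made-up next-state distributions predicted by competing models.
model_1 = {"s1": 0.5, "s2": 0.5}
model_2 = {"s1": 0.5, "s3": 0.5}

shared, distinguished = split_supports(model_1, model_2)
vd = total_variation(model_1, model_2)

print(sorted(shared))         # states both models consider possible
print(sorted(distinguished))  # states only one model considers possible
print(vd)
```

In this example, observing `s2` or `s3` immediately rules out one of the two models, which matches the caption's point that states in the symmetric difference need not be tracked further; only `s1` leaves the models undistinguished.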

Theorems & Definitions (14)

  • Definition 1: Black-Box AI System
  • Definition 2: Capability
  • Definition 3: Dataset
  • Definition 4: Soundness
  • Definition 5: Completeness
  • Definition 6: Pessimistic Conditions
  • Definition 7: Optimistic Conditions
  • Definition 8: Query
  • Definition 9: Distinguishing MDP
  • Theorem 1
  • ...and 4 more