MINT: Minimal Information Neuro-Symbolic Tree for Objective-Driven Knowledge-Gap Reasoning and Active Elicitation

Zeyu Fang, Tian Lan, Mahdi Imani

TL;DR

MINT tackles objective-driven knowledge-gap reasoning in open-world planning by combining symbolic tree reasoning with a neural planning policy and LLM-driven query curation. It models planning under a knowledge gap as an extended MDP family $\mathcal{M}_u$, trains an uncertainty-aware Q-network to estimate means and variances over unknown descriptors, and uses self-play to expand a symbolic tree of potential human-AI interactions. Theoretical results establish local pseudo-Lipschitz continuity of the optimal Q-value and an upper bound on the return gap to an ideal, gap-free policy. Experiments on MiniGrid, Atari Pacman, and NVIDIA Isaac show near-expert performance with only a limited number of questions per task. The approach enables principled, minimal-query elicitation that directly targets planning objectives and uncertainty reduction, which matters for robust language-assisted planning in complex environments.

Abstract

Joint planning through language-based interactions is a key area of human-AI teaming. Planning problems in the open world often involve incomplete information and unknowns, e.g., the objects involved or human goals and intents, which lead to knowledge gaps in joint planning. We consider the problem of discovering optimal interaction strategies for AI agents to actively elicit human inputs in objective-driven planning. To this end, we propose the Minimal Information Neuro-Symbolic Tree (MINT) to reason about the impact of knowledge gaps, and we leverage self-play with MINT to optimize the AI agent's elicitation strategies and queries. More precisely, MINT builds a symbolic tree by making propositions of possible human-AI interactions and by consulting a neural planning policy to estimate the uncertainty in planning outcomes caused by the remaining knowledge gaps. Finally, we leverage an LLM to search and summarize MINT's reasoning process and to curate a set of queries that optimally elicit human inputs for the best planning performance. By considering a family of extended Markov decision processes with knowledge gaps, we analyze the return guarantee for a given MINT with active human elicitation. Our evaluation on three benchmarks of increasing realism involving unseen or unknown objects shows that MINT-based planning attains near-expert returns while issuing only a limited number of questions per task, with significantly improved rewards and success rates.
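
The idea of choosing a query by how much it reduces uncertainty in planning outcomes can be illustrated with a small sketch. This is not the paper's implementation: it assumes a finite list of candidate descriptors with a uniform prior, replaces the neural planning policy with a hand-written table of Q-values, and scores each query by the expected worst-case per-action variance of Q that remains after the answer. The names `expected_remaining_variance`, `pick_query`, `is_hazard`, and `is_red` are hypothetical.

```python
import numpy as np

def expected_remaining_variance(q_values, answers):
    """Expected uncertainty in Q left after a query has been answered.

    q_values: shape (num_descriptors, num_actions); row i holds the Q-values
              of every action if candidate descriptor i were the true one.
    answers:  answers[i] is the answer the human would give to the query
              if descriptor i were the true one.
    Assumption (not from the paper): uniform prior over candidates, and
    uncertainty measured as the largest per-action variance of Q.
    """
    q_values = np.asarray(q_values, dtype=float)
    answers = np.asarray(answers)
    total = 0.0
    for ans in np.unique(answers):
        group = q_values[answers == ans]          # descriptors consistent with ans
        weight = len(group) / len(q_values)       # probability of hearing ans
        total += weight * group.var(axis=0).max() # remaining spread of Q
    return total

def pick_query(q_values, candidate_queries):
    """Return the query with the lowest expected remaining variance."""
    scores = {name: expected_remaining_variance(q_values, ans)
              for name, ans in candidate_queries.items()}
    return min(scores, key=scores.get), scores

# Toy table: 4 candidate descriptors, 2 actions (approach, avoid).
q_table = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
queries = {
    "is_hazard": [0, 0, 1, 1],  # separates the descriptors that change Q
    "is_red":    [0, 1, 0, 1],  # irrelevant to the planning outcome
}
best, scores = pick_query(q_table, queries)
print(best, scores)
```

In this toy table the `is_hazard` question removes all variance in Q while `is_red` removes none, so a single well-chosen question is enough to settle the plan.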

Paper Structure (32 sections, 7 theorems, 22 equations, 3 figures, 6 tables, 2 algorithms)

Key Result

Lemma 4.2

Let $\Gamma$ be the Bellman operator on any function $Q:\mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$. For any two MDPs $M$ and $\bar{M}$, if the gap between their $Q$-functions is already bounded by $\Delta_{s,a}(M, \bar{M})$, i.e., $\vert Q_{M}(s,a) - Q_{\bar{M}}(s,a)\vert \leq \Delta_{s,a}(M, \bar{M})$, then we can guarantee:

Figures (3)

  • Figure 1: Evaluating, expanding, curating, and acting with MINT. (a) How we build and expand MINT by first consulting a trained neural planning policy as an oracle, and then using the LLM to curate queries based on MINT and elicit human responses via natural-language interactions. (b) How MINT acts in the environment. The AI agent issues the identified queries when interacting with the human during joint planning. The human responses are processed to produce a final reduced knowledge gap $u_K$, leading to an optimal action $a$ by maximizing $Q_{\varphi}^*(s,a)$ for all descriptors $\varphi\in \Phi_{u_K}$.
  • Figure 2: Illustrations of how MINT acts in all three environments. (a) The agent faces unknown objects in MiniGrid and curates queries about their impact on transitions; (b) the agent in Atari Pacman faces unseen targets (white) and curates queries about their impact on rewards; and (c) the agent in Isaac Search and Rescue reasons about the smoke, interacts with the human, and plans its path accordingly.
  • Figure 3: Screenshots of the environments used in this paper. (a) MiniGrid; (b) Atari Pacman; (c-1) an overview of the NVIDIA Isaac environment; (c-2) an example of the drone view in the Isaac environment.
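
The acting step described in Figure 1(b) can be sketched as follows, under stated assumptions. The sketch treats the knowledge gap as a finite set of candidate descriptors, shrinks that set using the human's answers, and then picks the action with the highest Q-value averaged over the descriptors that remain. Averaging with equal weights is an assumption made here for illustration, and `reduce_gap`, `act`, the descriptor names, and the Q-table are all hypothetical rather than taken from the paper.

```python
import numpy as np

def reduce_gap(num_candidates, answers_table, human_answers):
    """Indices of candidate descriptors consistent with every human answer.

    answers_table[q][i] is the answer to query q if descriptor i were true.
    human_answers[q] is the answer the human actually gave to query q.
    """
    return [i for i in range(num_candidates)
            if all(answers_table[q][i] == a for q, a in human_answers.items())]

def act(q_values, keep):
    """Action with the highest mean Q over the remaining descriptors.

    Equal weighting of the remaining descriptors is an assumption.
    """
    if not keep:
        raise ValueError("no candidate descriptor matches the human answers")
    q = np.asarray(q_values, dtype=float)[keep]
    return int(np.argmax(q.mean(axis=0)))

# Hypothetical candidates for one unknown object, and Q-values for 3 actions.
candidates = ["wall", "lava", "key", "door"]
q_table = [[0.0, 1.0, 0.2],    # wall
           [-1.0, 0.5, 0.2],   # lava
           [1.0, 0.1, 0.2],    # key
           [0.8, 0.0, 0.2]]    # door
answers_table = {"is_passable": [0, 1, 1, 1], "is_harmful": [0, 1, 0, 0]}

before = act(q_table, list(range(len(candidates))))       # no elicitation
keep = reduce_gap(len(candidates), answers_table,
                  {"is_passable": 1, "is_harmful": 0})    # two answers
after = act(q_table, keep)
print(before, [candidates[i] for i in keep], after)
```

Here two answers narrow the candidates to `key` and `door`, and the chosen action changes from action 1 to action 0, which shows how a reduced knowledge gap can alter the plan.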

Theorems & Definitions (12)

  • Definition 4.1
  • Lemma 4.2: One-step Bellman bound
  • Lemma 4.3: Local pseudo-Lipschitz continuity of optimal Q-value
  • Theorem 4.4: Upper bound of return for an unknown knowledge gap
  • Lemma 1.1
  • proof
  • Lemma 1.2: One-step Bellman bound
  • proof
  • Lemma 1.3: Local pseudo-Lipschitz continuity of optimal Q-value
  • proof
  • ...and 2 more