MINT: Minimal Information Neuro-Symbolic Tree for Objective-Driven Knowledge-Gap Reasoning and Active Elicitation

Zeyu Fang; Tian Lan; Mahdi Imani

TL;DR

MINT tackles objective-driven knowledge-gap reasoning in open-world planning by combining symbolic tree reasoning with a neural planning policy and LLM-driven query curation. It models planning under a knowledge gap as an extended MDP family $\mathcal{M}_u$, trains an uncertainty-aware Q-network to estimate means and variances over unknown descriptors, and uses self-play to expand a symbolic tree of potential human-AI interactions. Theoretical results establish a local pseudo-Lipschitz continuity and an upper bound on the return gap to an ideal, gap-free policy, while empirical results across MiniGrid, Atari Pacman, and NVIDIA Isaac demonstrate near-expert performance with substantially fewer questions. The approach advances human-AI collaboration by enabling principled, minimal-query elicitation that directly targets planning objectives and uncertainty reduction, with practical impact for robust language-assisted planning in complex environments.

Abstract

Joint planning through language-based interactions is a key area of human-AI teaming. Planning problems in the open world often involve incomplete information and unknowns, such as the objects involved and human goals or intents, which lead to knowledge gaps in joint planning. We consider the problem of discovering optimal interaction strategies for AI agents to actively elicit human inputs in object-driven planning. To this end, we propose the Minimal Information Neuro-Symbolic Tree (MINT) to reason about the impact of knowledge gaps, and we leverage self-play with MINT to optimize the AI agent's elicitation strategies and queries. More precisely, MINT builds a symbolic tree by making propositions of possible human-AI interactions and by consulting a neural planning policy to estimate the uncertainty in planning outcomes caused by the remaining knowledge gaps. Finally, we leverage an LLM to search and summarize MINT's reasoning process and curate a set of queries that optimally elicit human inputs for the best planning performance. By considering a family of extended Markov decision processes with knowledge gaps, we analyze the return guarantee for a given MINT with active human elicitation. Our evaluation on three benchmarks involving unseen or unknown objects of increasing realism shows that MINT-based planning attains near-expert returns while issuing only a limited number of questions per task, and achieves significantly improved rewards and success rates.
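The idea of consulting a planning policy to estimate how much a remaining knowledge gap matters can be illustrated with a small sketch. It is not the paper's Q-network or tree expansion: the function names, the variance threshold, and the toy Q-values below are all assumptions made for illustration. Each candidate descriptor of an unknown object yields one vector of Q-values, and a question is only worth asking when the uncertainty touches the action the agent would otherwise take.

```python
import numpy as np

def summarize_gap(q_by_descriptor):
    """Per-action mean and variance of Q-values across candidate descriptors.

    q_by_descriptor: dict mapping a (hypothetical) descriptor name to an
    array of Q-values, one entry per action.
    """
    q = np.stack(list(q_by_descriptor.values()))  # (num_descriptors, num_actions)
    return q.mean(axis=0), q.var(axis=0)

def should_query(q_by_descriptor, var_threshold):
    """Pick the action with the best mean Q-value and flag whether its
    variance across descriptors exceeds the (assumed) threshold."""
    mean, var = summarize_gap(q_by_descriptor)
    best = int(np.argmax(mean))
    return best, bool(var[best] > var_threshold)

# Toy case 1: the unknown object only changes the value of action 1,
# which is not the preferred action, so no question is needed.
harmless_gap = {
    "lava": np.array([1.0, -5.0, 0.5]),
    "floor": np.array([1.0, 2.0, 0.5]),
}

# Toy case 2: the unknown object changes the value of the preferred
# action itself, so asking is worthwhile.
relevant_gap = {
    "key": np.array([0.0, 3.0, 0.0]),
    "trap": np.array([0.0, -1.0, 0.0]),
}

best_1, ask_1 = should_query(harmless_gap, var_threshold=1.0)
best_2, ask_2 = should_query(relevant_gap, var_threshold=1.0)
```

In the first toy case the variance is large for an action the agent would not choose anyway, so the sketch does not ask; in the second the preferred action itself is uncertain, so it does. This mirrors, in a very simplified form, the objective-driven flavor of elicitation described above.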

Paper Structure (32 sections, 7 theorems, 22 equations, 3 figures, 6 tables, 2 algorithms)

Introduction
Related Works
Language-Based Planning
Planning under Partial Observability.
Human-in-the-Loop RL
Neuro-Symbolic RL
Preliminaries
Our Proposed Solution Using MINT
Evaluating the Impact of Current Knowledge Gaps
Adapted DQN Training Paradigm
Reasoning and Curating Queries with MINT
Node Representation.
Evaluation and Expansion.
LLM-Based Curation and Processing.
Theoretical Analysis
...and 17 more sections

Key Result

Lemma 4.2

With $\Gamma$ defined as the Bellman Operator on any function $Q:\mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$ as: for any two MDPs $M$ and $\bar{M}$, if function $Q$ is already bounded by $\Delta_{s,a}(M, \bar{M})$, i.e., $\vert Q_{M}(s,a) - Q_{\bar{M}}(s,a)\vert \leq \Delta_{s,a}(M, \bar{M})$, then we can guarantee:
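For orientation only, a common textbook form of the Bellman optimality operator for an MDP $M$ is shown below. The reward $R_M$, transition kernel $P_M$, and discount factor $\gamma$ are notational assumptions introduced here; this is not the lemma's conclusion, and the paper's own definition of $\Gamma$ may differ.

```latex
(\Gamma_M Q)(s,a) \;=\; R_M(s,a) \;+\; \gamma \sum_{s' \in \mathcal{S}} P_M(s' \mid s,a)\, \max_{a' \in \mathcal{A}} Q(s',a')
```

Under this form, comparing $\Gamma_M Q_M$ with $\Gamma_{\bar{M}} Q_{\bar{M}}$ separates into a reward difference, a transition difference, and a discounted term that depends on the existing bound $\Delta_{s,a}(M, \bar{M})$, which is the general shape a one-step Bellman bound takes.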

Figures (3)

Figure 1: Evaluating, expanding, curating, and acting with MINT. (a) How we build and expand MINT by first consulting a trained neural planning policy as an oracle, and then using the LLM to curate queries based on MINT and elicit human responses via natural-language interactions. (b) How MINT acts in the environment. The AI agent issues the identified queries in its interaction with the human during joint planning. The human responses are processed to produce a final reduced knowledge gap $u_K$, leading to an optimal action $a$ by maximizing $Q_{\varphi}^*(s,a)$ for all descriptors $\varphi\in \Phi_{u_K}$.
Figure 2: Illustrations of how MINT acts in all three environments. (a) The agent faces unknown objects in MiniGrid and curates queries about their impact on transitions; (b) the agent in Atari Pacman faces unseen targets (white) and curates queries about their impact on rewards; and (c) the agent in Isaac Search and Rescue reasons about the smoke, interacts with the human, and plans its path accordingly.
Figure 3: Screenshots of the environments used in this paper. (a) MiniGrid; (b) Atari Pacman; (c-1) an overview of the NVIDIA Isaac environment; (c-2) an example of the drone view in the Isaac environment.

Theorems & Definitions (12)

Definition 4.1
Lemma 4.2: One-step Bellman bound
Lemma 4.3: Local pseudo-Lipschitz continuity of optimal Q-value
Theorem 4.4: Upper bound of return for an unknown knowledge gap
Lemma 1.1
proof
Lemma 1.2: One-step Bellman Bound
proof
Lemma 1.3: Local pseudo-Lipschitz continuity of Optimal Q-value
proof
...and 2 more
