A Signal Contract for Online Language Grounding and Discovery in Decision-Making

Dimitris Panagopoulos, Adolfo Perrusquia, Weisi Guo

TL;DR

This work addresses online language grounding where messy, evolving verbal reports are converted into control-relevant signals during execution through an interface that localises language updates while keeping downstream decision-makers language-agnostic.

Abstract

Autonomous systems increasingly receive time-sensitive contextual updates from humans through natural language, yet embedding language understanding inside decision-makers couples grounding to learning or planning. This increases redeployment burden when language conventions or domain knowledge change and can hinder diagnosability by confounding grounding errors with control errors. We address online language grounding where messy, evolving verbal reports are converted into control-relevant signals during execution through an interface that localises language updates while keeping downstream decision-makers language-agnostic. We propose LUCIFER (Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement), an inference-only middleware that exposes a Signal Contract. The contract provides four outputs: policy priors, reward potentials, admissible-option constraints, and telemetry-based action prediction for efficient information gathering. We validate LUCIFER in a search-and-rescue (SAR)-inspired testbed using dual-phase, dual-client evaluation: (i) component benchmarks show reasoning-based extraction remains robust on self-correcting reports where pattern-matching baselines degrade, and (ii) system-level ablations with two structurally distinct clients (hierarchical RL and a hybrid A*+heuristics planner) show consistent necessity and synergy. Grounding improves safety, discovery improves information-collection efficiency, and only their combination achieves both.

Paper Structure

This paper contains 60 sections, 4 equations, 4 figures, 4 tables, 2 algorithms.

Figures (4)

  • Figure 1: The LUCIFER Architecture. Middleware services independently produce safety- and efficiency-relevant outputs and expose them only through a Signal Contract: policy priors, reward potentials, admissible-action constraints, and action prediction. Downstream clients consume these abstract signals using native mechanisms (e.g., action filtering, reward shaping), remaining language-agnostic and decoupled from middleware internals.
  • Figure 2: Exploration Facilitator (telemetry-based discovery). The facilitator constructs a prompt from client-agnostic telemetry: current decision context ($x_t$), episodic trace ($\xi$), and cross-episodic telemetry memory ($\mathcal{D}$). An LLM performs zero-shot reasoning to propose an advisory query option ($u^\star$) likely to yield high-value information.
  • Figure 3: Robustness comparison. Baseline parsers fail on ambiguous inputs with self-corrections (left), incorrectly treating retracted entities (e.g., the bank) as valid. In contrast, LUCIFER's grounding service (right) uses semantic reasoning to isolate the true target (the bakery) and produce correct grounded constraints.
  • Figure 4: SAR mission task decomposition and LUCIFER intervention. The diagram illustrates the operational flow. Phase 1: $w_{TN}$ navigates to discovery points (no LUCIFER). Phase 2: at a discovery point, SDE activates $w_{TI}$. The Exploration Facilitator predicts the optimal query action $u^\star$; $w_{TI}$ executes $u^\star$ and receives verbal input. The Context Extractor immediately processes the input, generating shaping signals through the contract that refine subsequent navigation by $w_{TN}$ (avoiding hazards and/or passing through safe zones) and eventual triage by $w_{TT}$.
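
The Figure 1 caption describes clients that consume the four contract outputs through native mechanisms such as action filtering and reward shaping. A minimal sketch of how such an interface might look is given below. Everything in it is an illustrative assumption rather than the authors' implementation: the `SignalContract` class, its field names, the helper functions, and the option and state labels are hypothetical, and the shaping term assumes the reward potentials are applied in the standard potential-based form $r + \gamma\,\Phi(s') - \Phi(s)$, which the source does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SignalContract:
    """Hypothetical container for the four outputs named in Figure 1."""
    policy_prior: Dict[str, float] = field(default_factory=dict)      # option -> prior weight
    reward_potential: Dict[str, float] = field(default_factory=dict)  # state -> potential
    admissible: Optional[FrozenSet[str]] = None   # None means "no constraint issued"
    predicted_query: Optional[str] = None         # advisory query option (u* in Figure 2)


def filter_options(options: List[str], contract: SignalContract) -> List[str]:
    """Action filtering: keep only the options the contract marks admissible."""
    if contract.admissible is None:
        return list(options)
    return [o for o in options if o in contract.admissible]


def shaped_reward(reward: float, state: str, next_state: str,
                  contract: SignalContract, gamma: float = 0.99) -> float:
    """Assumed potential-based shaping: r + gamma * phi(s') - phi(s)."""
    phi = contract.reward_potential
    return reward + gamma * phi.get(next_state, 0.0) - phi.get(state, 0.0)


def choose_option(options: List[str], contract: SignalContract) -> Optional[str]:
    """Pick the admissible option with the highest prior (first listed wins ties)."""
    allowed = filter_options(options, contract)
    if not allowed:
        return None
    return max(allowed, key=lambda o: contract.policy_prior.get(o, 0.0))


# Usage with made-up signals: "east" has the highest prior but is not admissible.
contract = SignalContract(
    policy_prior={"north": 0.2, "east": 0.7, "south": 0.1},
    reward_potential={"safe_zone": 1.0, "hazard": -1.0},
    admissible=frozenset({"north", "south"}),
    predicted_query="ask_bystander",
)
chosen = choose_option(["north", "east", "south"], contract)
bonus = shaped_reward(0.0, "hazard", "safe_zone", contract, gamma=0.9)
```

In this sketch the client never sees the verbal report itself, only numeric priors, potentials, and an admissible set, which is one way to read the caption's claim that clients remain language-agnostic and decoupled from middleware internals.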