Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces
Yunhao Yang, Neel P. Bhatt, Christian Ellis, Samuel Li, Alvaro Velasquez, Zhangyang Wang, Ufuk Topcu
TL;DR
The paper tackles the challenge of safe, interpretable logistics planning under uncertainty by introducing Vision-Language Logistics (VLL) agents that couple natural-language dialogue with real-time perceptual grounding and formal verification. A key innovation is the uncertainty-aware intent verification loop, which provides a probabilistic guarantee $\hat{p}(y_t \mid z_t)$ based on latent-space distances $d_t$ to class centroids and calibration using $F_C$, enabling proactive clarifications when needed. The authors develop a three-stage VLL architecture, including perception, grounding to $r_t$ in PDDL, and a symbolic verifier, plus uncertainty-guided refinement using Direct Preference Optimization (DPO) and TextGrad prompts. In a lightweight airlift domain, a backbone model trained on as few as 100 samples, with calibration and refinement, outperforms a 20× larger model in goal classification while halving inference latency, illustrating that structured uncertainty signals and verification can deliver certifiable, user-aligned decisions at operational tempo.
Abstract
Logistics operators, from battlefield coordinators re-routing airlifts ahead of a storm to warehouse managers juggling late trucks, need to make mission-critical decisions. Prevailing methods for logistics planning, such as integer programming, yield plans that satisfy user-defined logical constraints but assume an idealized mathematical model of the environment. Foundation models, on the other hand, lower the intermediate processing barrier by translating natural-language user utterances into executable plans, yet they remain prone to misinterpretations and hallucinations that jeopardize safety and cost. We introduce a Vision-Language Logistics (VLL) agent, built on a neurosymbolic framework that pairs the accessibility of natural-language dialogue with verifiable guarantees on user-objective interpretation. The agent interprets user requests and converts them into structured planning specifications, quantifies the uncertainty of the interpretation, and invokes an interactive clarification loop when the uncertainty exceeds an adaptive threshold. Using a lightweight airlift logistics planning problem as an illustrative case study, we highlight a practical path toward certifiable and user-aligned decision-making for complex logistics. Our lightweight model, fine-tuned on just 100 training samples, surpasses the zero-shot performance of 20× larger models in logistics planning tasks while cutting inference latency by nearly 50%.
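As a rough illustration of the interpret, quantify, and clarify pattern described above, the following minimal Python sketch classifies a latent vector by its nearest class centroid and asks for clarification when a calibrated confidence score falls below a threshold. It is not the authors' implementation. All function names are hypothetical, the empirical-fraction calibration is only an assumed stand-in for the paper's calibration step, and the threshold is fixed here rather than adaptive.

```python
import numpy as np

def fit_centroids(latents, labels):
    """Return the class labels and the mean latent vector (centroid) of each class."""
    classes = np.unique(labels)
    centroids = np.stack([latents[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid(z, classes, centroids):
    """Return the closest class and the Euclidean distance to its centroid."""
    dists = np.linalg.norm(centroids - z, axis=1)
    k = int(np.argmin(dists))
    return classes[k], float(dists[k])

def calibrated_confidence(d, calib_dists):
    """Assumed calibration: fraction of held-out distances at least as large as d."""
    return float(np.mean(calib_dists >= d))

def interpret(z, classes, centroids, calib_dists, threshold=0.1):
    """Accept the predicted intent, or request clarification if confidence is low."""
    label, d = nearest_centroid(z, classes, centroids)
    conf = calibrated_confidence(d, calib_dists)
    action = "accept" if conf >= threshold else "clarify"
    return label, conf, action

# Toy data (hypothetical): two intent classes in a 2-D latent space.
latents = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
classes, centroids = fit_centroids(latents, labels)

# Calibration distances: each sample's distance to its own class centroid.
calib_dists = np.array([
    np.linalg.norm(x - centroids[np.searchsorted(classes, y)])
    for x, y in zip(latents, labels)
])

print(interpret(np.array([0.0, 1.5]), classes, centroids, calib_dists))
print(interpret(np.array([5.0, 1.0]), classes, centroids, calib_dists))
```

In this toy setup, a query near a centroid is accepted, while a query halfway between the two classes triggers the clarification branch. A real system would replace the fixed threshold with whatever adaptive rule the application requires.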
