Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning
Ehsan Ahmadi, Chao Wang
TL;DR
This work tackles locomotion prediction for exoskeleton-assisted construction workers, where dynamic, safety-critical environments complicate intent recognition. It introduces a memory-augmented LLM agent that fuses spoken commands and visual data through a Perception Module, Short-Term Memory (STM), Long-Term Memory (LTM), and a Refinement Module, and uses Chain-of-Thought reasoning to predict locomotion modes. The weighted F1-score improves from $0.73$ with no memory to $0.90$ with both STM and LTM. Calibration improves as well: the Brier Score falls from $0.244$ to $0.090$ and the Expected Calibration Error (ECE) from $0.222$ to $0.044$, supporting both accuracy and reliability. The key mechanism is the Composite Score $= (w_s \cdot \text{similarity}) + (w_i \cdot \text{importance}) + (w_c \cdot \text{confidence}) - (w_d \cdot \text{discrepancy} + w_v \cdot \text{vagueness})$. The framework supports safer, adaptive human-exoskeleton interaction in construction and offers a blueprint for memory-enhanced, context-aware assistive systems in dynamic industries.
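The Composite Score is a weighted sum of three rewarding terms minus two penalizing terms, which can be written directly as a small function. The following is a minimal sketch, not the authors' implementation: the class names, field names, the assumption that each term lies in [0, 1], and the default weight values are all illustrative placeholders that this summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class ScoreTerms:
    """Hypothetical container for the five terms of the Composite Score.

    Assumption: each term is a float in [0, 1]; the summary does not state ranges.
    """
    similarity: float
    importance: float
    confidence: float
    discrepancy: float
    vagueness: float


@dataclass
class ScoreWeights:
    """Placeholder weights; the actual values are not given in this summary."""
    w_s: float = 0.4
    w_i: float = 0.2
    w_c: float = 0.2
    w_d: float = 0.1
    w_v: float = 0.1


def composite_score(terms: ScoreTerms, weights: ScoreWeights) -> float:
    """Rewarding terms minus penalizing terms, term by term as in the formula."""
    reward = (
        weights.w_s * terms.similarity
        + weights.w_i * terms.importance
        + weights.w_c * terms.confidence
    )
    penalty = weights.w_d * terms.discrepancy + weights.w_v * terms.vagueness
    return reward - penalty


if __name__ == "__main__":
    example = ScoreTerms(
        similarity=0.8, importance=0.5, confidence=0.9,
        discrepancy=0.2, vagueness=0.1,
    )
    print(round(composite_score(example, ScoreWeights()), 4))
```

With these placeholder weights, the example evaluates to 0.32 + 0.10 + 0.18 - (0.02 + 0.01) = 0.57. Raising the discrepancy or vagueness of an item lowers its score, and raising similarity, importance, or confidence increases it.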
Abstract
Construction tasks are inherently unpredictable: dynamic environments and safety-critical demands pose significant risks to workers. Exoskeletons offer potential assistance but falter without accurate intent recognition across diverse locomotion modes. This paper presents a locomotion prediction agent that augments Large Language Models (LLMs) with memory systems to improve exoskeleton assistance in such settings. Using multimodal inputs (spoken commands and visual data from smart glasses), the agent integrates a Perception Module, Short-Term Memory (STM), Long-Term Memory (LTM), and a Refinement Module to predict locomotion modes. In evaluation, the weighted F1-score is 0.73 without memory, rises to 0.81 with STM, and reaches 0.90 with both STM and LTM, with particularly strong performance on vague and safety-critical commands. Calibration improves as well: the Brier Score drops from 0.244 to 0.090 and the Expected Calibration Error (ECE) from 0.222 to 0.044, indicating more reliable confidence estimates. This framework supports safer, high-level human-exoskeleton collaboration, with promise for adaptive assistive systems in dynamic industries.
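For readers unfamiliar with the three reported metrics, the sketch below shows one common way to compute a weighted F1-score, a Brier Score, and ECE in plain Python. It is an illustration under stated assumptions, not the paper's evaluation code: the abstract does not say which Brier variant or how many ECE bins were used, so this version scores only the confidence of the predicted label and uses 10 equal-width bins. The function names and the toy data are hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (tp + fn) / total * f1  # (tp + fn) is the class support
    return score


def brier_score(confidences, correct):
    """Mean squared gap between predicted-label confidence and correctness (0 or 1).

    Assumption: top-label variant; a multiclass Brier Score over the full
    probability vector would give different values.
    """
    n = len(confidences)
    return sum((c - float(k)) ** 2 for c, k in zip(confidences, correct)) / n


def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, k in zip(confidences, correct):
        index = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((c, float(k)))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(k for _, k in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = ["walk", "walk", "stairs", "stairs"]
    y_pred = ["walk", "stairs", "stairs", "stairs"]
    conf = [0.9, 0.9, 0.6, 0.6]
    hit = [1, 1, 1, 0]
    print(weighted_f1(y_true, y_pred))
    print(brier_score(conf, hit))
    print(expected_calibration_error(conf, hit))
```

Lower Brier Score and ECE values mean the agent's stated confidence tracks how often it is actually right, which is why the reported drops are read as improved reliability.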
