Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning

Ehsan Ahmadi, Chao Wang

TL;DR

This work tackles locomotion prediction for exoskeleton-assisted construction workers, where dynamic, safety-critical environments complicate intent recognition. It introduces a memory-augmented LLM agent that fuses spoken commands and visual data through a Perception Module, Short-Term Memory (STM), Long-Term Memory (LTM), and a Refinement Module, and uses Chain-of-Thought reasoning to predict locomotion modes. Accuracy improves substantially: the weighted F1-score rises from $0.73$ with no memory to $0.90$ with both STM and LTM. Calibration improves as well: the Brier Score falls from $0.244$ to $0.090$ and the Expected Calibration Error (ECE) from $0.222$ to $0.044$, so the agent's predictions are both more accurate and more reliable. The framework supports safer, adaptive human-exoskeleton interaction in construction and offers a blueprint for memory-enhanced, context-aware assistive systems in dynamic industries, with the key mechanism captured by the Composite Score $= (w_s \cdot \text{similarity}) + (w_i \cdot \text{importance}) + (w_c \cdot \text{confidence}) - (w_d \cdot \text{discrepancy} + w_v \cdot \text{vagueness})$.
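
To make the Composite Score formula concrete, the following is a minimal Python sketch of the weighted sum as written above. The formula itself is from the TL;DR; the weight values, the `Weights` container, and the function name are illustrative assumptions, not values or names reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical weights (w_s, w_i, w_c, w_d, w_v); the source does not state them here.
    similarity: float = 0.4
    importance: float = 0.2
    confidence: float = 0.2
    discrepancy: float = 0.1
    vagueness: float = 0.1


def composite_score(similarity: float, importance: float, confidence: float,
                    discrepancy: float, vagueness: float,
                    w: Weights = Weights()) -> float:
    """Composite Score = rewarded terms minus penalized terms, as in the TL;DR formula."""
    reward = (w.similarity * similarity
              + w.importance * importance
              + w.confidence * confidence)
    penalty = w.discrepancy * discrepancy + w.vagueness * vagueness
    return reward - penalty


if __name__ == "__main__":
    # One hypothetical item: similar and confident, slightly vague.
    print(composite_score(similarity=0.9, importance=0.5, confidence=0.8,
                          discrepancy=0.1, vagueness=0.3))
```

Under this reading, similarity, importance, and confidence raise the score, while discrepancy and vagueness lower it; how the weights are chosen is left open in this sketch.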

Abstract

Construction tasks are inherently unpredictable, with dynamic environments and safety-critical demands posing significant risks to workers. Exoskeletons offer potential assistance but falter without accurate intent recognition across diverse locomotion modes. This paper presents a locomotion prediction agent leveraging Large Language Models (LLMs) augmented with memory systems, aimed at improving exoskeleton assistance in such settings. Using multimodal inputs - spoken commands and visual data from smart glasses - the agent integrates a Perception Module, Short-Term Memory (STM), Long-Term Memory (LTM), and Refinement Module to predict locomotion modes effectively. Evaluation reveals a baseline weighted F1-score of 0.73 without memory, rising to 0.81 with STM, and reaching 0.90 with both STM and LTM, excelling with vague and safety-critical commands. Calibration metrics, including a Brier Score drop from 0.244 to 0.090 and ECE from 0.222 to 0.044, affirm improved reliability. This framework supports safer, high-level human-exoskeleton collaboration, with promise for adaptive assistive systems in dynamic industries.
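
For readers unfamiliar with the two calibration metrics, the sketch below shows one common way to compute them in Python. It assumes each prediction carries a single confidence value for the predicted locomotion mode plus a correct/incorrect flag, and it uses equal-width confidence bins for ECE. The abstract does not specify the exact variants or bin count used, so these choices and all function names are assumptions.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared gap between stated confidence and the 0/1 correctness outcome."""
    return sum((c - float(y)) ** 2 for c, y in zip(confidences, correct)) / len(confidences)


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Sample-weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, float(y)))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical predictions, not data from the paper.
    conf = [0.95, 0.80, 0.60, 0.90]
    hit = [True, True, False, True]
    print(brier_score(conf, hit), expected_calibration_error(conf, hit))
```

Lower is better for both metrics: a Brier Score or ECE near zero means the stated confidences track how often the predictions are actually correct.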

Paper Structure

This paper contains 22 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: High-Level Overview of the Agent’s Workflow
  • Figure 2: Perception Prompt Used by the Perception Module
  • Figure 3: Examples of FOV frames
  • Figure 4: Distribution of Command Types in the Dataset
  • Figure 5: Weighted F1-Score by Command Type
  • ...and 4 more figures