Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM

Kanata Suzuki; Tetsuya Ogata

TL;DR

This work tackles online adaptation when grounding language instructions in robot motion by linking a motion-learning model (SATrRNN) with a language model (RWKV) through shared latent variables (SLV). During training, the system learns to map language and sensorimotor signals into a common latent space. In the regression phase, the SLV is updated online from prediction errors on sensorimotor attention points and on the language instruction, enabling adaptive motion generation without updating the LLM weights. Experiments in a Robosuite Panda setup on the Lift, Roll, and Stack tasks show strong position generalization and language generalization when error regression is applied, with substantial gains in success rate over no-regression baselines. The work also analyzes internal representations, showing how SLV trajectories organize by task and how attention evolves to align with motion goals, which supports the grounding capability of the approach. The method improves data efficiency and offers a path toward end-to-end, feedback-driven grounding of language in robot control in realistic settings.

Abstract

In recent years, many studies have combined large language models (LLMs) with robotics; however, most have not considered end-to-end feedback in the robot-motion generation phase. Because the predictions of deep neural networks inevitably contain errors, the trained model must be updated to match the real environment in order to generate robot motion adaptively. This study proposes an integration method that connects the robot-motion learning model and the LLM through shared latent variables. When generating robot motion, the proposed method updates the shared parameters based on prediction errors from both the sensorimotor attention points and the task language instructions given to the robot. This allows the model to search efficiently for latent parameters appropriate to the robot task. Through simulator experiments on multiple robot tasks, we demonstrate the effectiveness of the proposed method from two perspectives: position generalization and language-instruction generalization.
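The error-regression idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two frozen linear maps below stand in for the trained SATrRNN attention predictor and the RWKV language branch, the language error is simplified to an MSE, and every name (`regress_latent`, `W_attn`, `W_lang`, the learning rate, the step count) is a hypothetical choice made for illustration. The only point shown is that the latent vector is optimized from two prediction errors while the model weights stay fixed.

```python
# Toy sketch of error regression on a shared latent vector.
# Assumption: frozen linear decoders replace the trained networks, and
# both error terms are MSE. All names and values are hypothetical.

def matvec(W, z):
    """Multiply matrix W (list of rows) by vector z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def mse_and_grad(W, z, target):
    """Return the MSE between W @ z and target, and its gradient w.r.t. z."""
    err = [p - t for p, t in zip(matvec(W, z), target)]
    n = len(err)
    loss = sum(e * e for e in err) / n
    grad = [sum(2.0 * err[i] * W[i][j] for i in range(n)) / n
            for j in range(len(z))]
    return loss, grad

def regress_latent(z, W_attn, attn_target, W_lang, lang_target,
                   lr=0.1, steps=200):
    """Gradient descent on z only; W_attn and W_lang are never modified."""
    z = list(z)
    history = []
    for _ in range(steps):
        loss_a, grad_a = mse_and_grad(W_attn, z, attn_target)
        loss_l, grad_l = mse_and_grad(W_lang, z, lang_target)
        history.append(loss_a + loss_l)  # total error before this update
        z = [zi - lr * (ga + gl) for zi, ga, gl in zip(z, grad_a, grad_l)]
    return z, history

# Hypothetical targets: observed attention points and a language feature.
W_attn = [[1.0, 0.0], [0.0, 1.0]]
W_lang = [[1.0, 1.0]]
z_final, history = regress_latent([0.0, 0.0], W_attn, [0.5, -0.2],
                                  W_lang, [0.3])
```

In this toy setting the two targets are mutually consistent, so the combined error shrinks toward zero as `z` is updated. In the actual method the decoders are nonlinear and the errors come from predicted attention points and the instruction sentence.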

Paper Structure (20 sections, 2 equations, 8 figures, 2 tables)

INTRODUCTION
RELATED WORK
PROPOSED METHOD
Model Architecture
SATrRNN
RWKV
Shared Latent Variables
Training and Regression Phases
EXPERIMENTS
Dataset
Robot Task
Training
Evaluation Metrics
RESULTS AND DISCUSSION
Task Performance
...and 5 more sections

Figures (8)

Figure 1: Overview of this study. In the proposed method, latent variables related to robot tasks are updated based on prediction errors for instruction sentences and sensorimotor attention.
Figure 2: Overview of the proposed method, consisting of three modules: SATrRNN with mask predictor, RWKV, and shared latent variables.
Figure 3: Overview of the proposed error regression method. The SLV is optimized from the reconstruction error for language instruction and MSE between extracted attention points (blue circle marks) and predicted attention points (red cross marks).
Figure 4: Robot task setup in our experiments.
Figure 5: Examples of generated Lift, Roll, and Stack task sequences in case 2 (test position).
...and 3 more figures
