CERNet: Class-Embedding Predictive-Coding RNN for Unified Robot Motion, Recognition, and Confidence Estimation
Hiroki Sawada, Alexandre Pitti, Mathias Quoy
TL;DR
The paper tackles the challenge of enabling robots to simultaneously generate learned motions, infer human or demonstrator intent, and estimate the system's own confidence in real time. It introduces CERNet, a multi-layer predictive-coding RNN with a dynamically updated class-embedding vector that unifies generation and recognition within a single closed-loop model, validated on a Reachy humanoid. Across 26 alphabet trajectories, CERNet achieves a 76% reduction in reproduction error versus a parameter-matched single-layer baseline, demonstrates robustness to external perturbations, and attains real-time recognition with 68% Top-1 and 81% Top-2 accuracy; importantly, internal prediction error serves as an intrinsic confidence signal. This work provides a compact, extensible approach to motor memory and intent-aware human–robot collaboration, with potential extensions to online learning and multimodal sensing.
Abstract
Robots interacting with humans must not only generate learned movements in real time, but also infer the intent behind observed behaviors and estimate the confidence of their own inferences. This paper proposes CERNet, a unified model that achieves all three capabilities within a single hierarchical predictive-coding recurrent neural network (PC-RNN) equipped with a dynamically updated class embedding vector that unifies motor generation and recognition. The model operates in two modes: generation and inference. In generation mode, the class embedding constrains the hidden state dynamics to a class-specific subspace; in inference mode, it is optimized online to minimize prediction error, enabling real-time recognition. Validated on a humanoid robot across 26 kinesthetically taught alphabet trajectories, our hierarchical model achieves 76% lower trajectory reproduction error than a parameter-matched single-layer baseline, maintains motion fidelity under external perturbations, and infers the demonstrated trajectory class online with 68% Top-1 and 81% Top-2 accuracy. Furthermore, internal prediction errors naturally reflect the model's confidence in its recognition. This integration of robust generation, real-time recognition, and intrinsic uncertainty estimation within a single PC-RNN framework offers a compact and extensible approach to motor memory in physical robots, with potential applications in intent-sensitive human-robot collaboration.
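To make the two modes concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a single-layer RNN with untrained random weights, a one-step (truncated) gradient for the embedding update, and nearest-embedding ranking for recognition; every name in it (`step`, `generate`, `infer`, `W_h`, `W_c`, `W_o`, `embeddings`) is hypothetical. In generation the class embedding is held fixed and drives the hidden state; in inference the same embedding is adjusted online to reduce the prediction error, and the mean squared error is returned as a rough stand-in for the intrinsic confidence signal.

```python
import numpy as np

# Toy sizes and random (untrained) weights: assumptions for illustration only.
rng = np.random.default_rng(0)
N_CLASSES, EMB_DIM, HID_DIM, OUT_DIM, T = 3, 4, 16, 2, 50

W_h = rng.normal(0.0, 0.9 / np.sqrt(HID_DIM), (HID_DIM, HID_DIM))  # recurrent weights
W_c = rng.normal(0.0, 1.0, (HID_DIM, EMB_DIM))                     # embedding -> hidden
W_o = rng.normal(0.0, 1.0 / np.sqrt(HID_DIM), (OUT_DIM, HID_DIM))  # hidden -> output
embeddings = rng.normal(0.0, 1.0, (N_CLASSES, EMB_DIM))            # one vector per class


def step(h, c):
    """One recurrent step: the class embedding c biases the hidden dynamics."""
    h_next = np.tanh(W_h @ h + W_c @ c)
    return h_next, W_o @ h_next


def generate(c, steps=T):
    """Generation mode: c is fixed and the model rolls out a trajectory."""
    h = np.zeros(HID_DIM)
    trajectory = []
    for _ in range(steps):
        h, x_hat = step(h, c)
        trajectory.append(x_hat)
    return np.array(trajectory)


def infer(observed, c_init, lr=0.05):
    """Inference mode: update c online to reduce the prediction error."""
    c = c_init.copy()
    h = np.zeros(HID_DIM)
    sq_errors = []
    for x in observed:
        h_next, x_hat = step(h, c)
        err = x - x_hat
        # Gradient of 0.5 * ||err||^2 w.r.t. c, treating the previous h as constant.
        grad_c = -W_c.T @ ((1.0 - h_next ** 2) * (W_o.T @ err))
        c = c - lr * grad_c
        h = h_next
        sq_errors.append(float(err @ err))
    # Rank classes by distance between the inferred and stored embeddings.
    ranking = np.argsort(np.linalg.norm(embeddings - c, axis=1))
    return c, ranking, float(np.mean(sq_errors))
```

As a usage example, `infer(generate(embeddings[1]), np.zeros(EMB_DIM))` returns the adapted embedding, a class ranking from which Top-1 and Top-2 candidates can be read, and the mean squared prediction error. Because the weights here are random rather than learned, the sketch only shows the mechanics of the two modes and says nothing about the accuracy reported above.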
