Dual-Agent Multiple-Model Reinforcement Learning for Event-Triggered Human-Robot Co-Adaptation in Decoupled Task Spaces

Yaqi Li, Zhengqi Han, Huifang Liu, Steven W. Su

TL;DR

This paper presents a shared-control rehabilitation policy for a custom 6-degree-of-freedom (6-DoF) upper-limb robot that decomposes complex reaching tasks into decoupled spatial axes and introduces Dual Agent Multiple Model Reinforcement Learning (DAMMRL).

Abstract

This paper presents a shared-control rehabilitation policy for a custom 6-degree-of-freedom (6-DoF) upper-limb robot that decomposes complex reaching tasks into decoupled spatial axes. The patient governs the primary reaching direction using binary commands, while the robot autonomously manages orthogonal corrective motions. Because traditional fixed-frequency control often induces trajectory oscillations due to variable inverse-kinematics execution times, an event-driven progression strategy is proposed. This architecture triggers subsequent control actions only when the end-effector enters an admission sphere centred on the immediate target waypoint, and was validated in a semi-virtual setup linking a physical pressure sensor to a MuJoCo simulation. To optimise human-robot co-adaptation safely and efficiently, this study introduces Dual Agent Multiple Model Reinforcement Learning (DAMMRL). This framework discretises decision characteristics: the human agent selects the admission sphere radius to reflect their inherent speed-accuracy trade-off, while the robot agent dynamically adjusts its 3D Cartesian step magnitudes to complement the user's cognitive state. Trained in simulation and deployed across mixed environments, this event-triggered DAMMRL approach effectively suppresses waypoint chatter, balances spatial precision with temporal efficiency, and significantly improves success rates in object acquisition tasks.
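The event-driven progression and axial decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the clipping rule for corrective motion, the choice of axis 0 as the primary reaching axis, and all numeric values are assumptions made for illustration. The sketch moves to the next waypoint only once the end-effector enters the admission sphere, rather than on a fixed timer, and lets a binary human command gate motion along the primary axis while the orthogonal axes are corrected automatically.

```python
import numpy as np


def inside_admission_sphere(position, waypoint, radius):
    """Return True when the end-effector lies within the sphere centred on the waypoint."""
    offset = np.asarray(position, dtype=float) - np.asarray(waypoint, dtype=float)
    return float(np.linalg.norm(offset)) <= radius


def shared_control_step(position, waypoint, human_command, step, primary_axis=0):
    """Take one Cartesian step towards the waypoint (hypothetical decomposition).

    human_command is binary (0 = hold, 1 = go) and gates the primary axis only;
    the remaining axes receive a bounded corrective motion of at most `step`.
    """
    position = np.asarray(position, dtype=float)
    error = np.asarray(waypoint, dtype=float) - position
    delta = np.clip(error, -step, step)          # robot: bounded correction per axis
    delta[primary_axis] *= human_command         # human: go/hold on the primary axis
    return position + delta


def follow_waypoints(start, waypoints, radius, step, max_steps=1000):
    """Advance through waypoints using event-triggered progression.

    The waypoint index increases only when the admission-sphere event fires.
    Returns the final position, the number of waypoints admitted, and the
    number of motion steps taken.
    """
    position = np.asarray(start, dtype=float)
    index, steps = 0, 0
    while index < len(waypoints) and steps < max_steps:
        if inside_admission_sphere(position, waypoints[index], radius):
            index += 1                           # event: waypoint admitted, move on
            continue
        position = shared_control_step(position, waypoints[index], 1, step)
        steps += 1
    return position, index, steps


waypoints = [(0.10, 0.02, 0.00), (0.20, 0.00, 0.03)]   # illustrative values, in metres
final_position, admitted, steps_taken = follow_waypoints(
    start=(0.0, 0.0, 0.0), waypoints=waypoints, radius=0.01, step=0.02
)
```

In DAMMRL terms, `radius` corresponds to the quantity the human agent selects and `step` to the quantity the robot agent adjusts; here both are fixed constants, whereas the paper learns them. A larger radius admits waypoints sooner at the cost of spatial precision, which is the speed-accuracy trade-off the abstract refers to.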

Paper Structure (26 sections, 6 equations, 5 figures)

Figures (5)

  • Figure 1: Experimental setups for the 6-DoF robotic manipulator, illustrating (a) the virtual training environment and (b) the real-world physical deployment.
  • Figure 2: Trajectory evaluation near subgoals: Fixed-Frequency baseline (black line) versus the proposed Event-Driven DAMMRL progression (red line), demonstrating a significant suppression of waypoint oscillations.
  • Figure 3: Semi-Virtual (S2) experimental validation of the axial decomposition policy and event-triggered progression.
  • Figure 4: Training curves demonstrating the convergence of both models during the DAMMRL experiment. The top row shows convergence under Reward 1 (emphasizing spatial accuracy), while the bottom row shows convergence under Reward 2 (balancing speed and accuracy).
  • Figure 5: Comparison of the learned step-size modulation strategies and planned step counts. Top: Under Reward 1, the agent selects smaller step magnitudes near the target to prioritize spatial accuracy. Bottom: Under Reward 2, the agent opts for larger step sizes to accelerate target convergence.