Human Machine Co-Adaptation Model and Its Convergence Analysis
Steven W. Su, Yaqi Li, Kairui Guo, Rob Duffield
TL;DR
This paper develops a two-agent cooperative MDP (CAMDP) framework to model human-machine co-adaptation in robot-assisted rehabilitation. It provides sufficient conditions for convergence to a unique Nash equilibrium (NE), as well as strategies for handling multiple equilibria. It introduces structured results for policy improvement under alternating updates, analyses the uniqueness of the NE under dominance conditions, and proposes less-greedy and model-simplification techniques to improve convergence and reduce switching costs. Through numerical experiments, the authors validate the convergence criteria, demonstrate how large discount factors and partial observability affect outcomes, and show that epsilon-greedy updates can reach the global optimum in challenging CAMDP settings. The work advances theoretical foundations and practical algorithms for robust, cooperative reinforcement learning in rehabilitative human-machine interfaces, with implications for safety, explainability, and real-time adaptability.
Abstract
The key to robot-assisted rehabilitation lies in the design of the human-machine interface, which must accommodate the needs of both patients and machines. Current interface designs focus primarily on machine control algorithms and often require patients to spend considerable time adapting. In this paper, we introduce a novel approach based on the Cooperative Adaptive Markov Decision Process (CAMDP) model to address the fundamental aspects of the interactive learning process, offering theoretical insights and practical guidance. We establish sufficient conditions for the convergence of CAMDPs and for the uniqueness of the Nash equilibrium point; under these conditions, the system is guaranteed to converge to a unique Nash equilibrium point. Furthermore, we explore scenarios with multiple Nash equilibrium points and devise strategies that adjust both the Value Evaluation and Policy Improvement algorithms to increase the likelihood of converging to the globally minimal Nash equilibrium point. Through numerical experiments, we illustrate the effectiveness of the proposed conditions and algorithms, demonstrating their applicability and robustness in practical settings. The proposed conditions for convergence and the identification of a unique optimal Nash equilibrium contribute to the development of more effective adaptive systems for human users in robot-assisted rehabilitation.
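To make the idea of alternating Value Evaluation and Policy Improvement concrete, the following is a minimal sketch of alternating greedy updates in a small two-agent cooperative MDP. It is an illustration under stated assumptions, not the algorithm analysed in the paper: both agents are assumed to minimise one shared discounted cost, policies are deterministic and stationary, and the model is a randomly generated toy. All names (`random_model`, `evaluate`, `improve`, `alternate`) and all parameter values are hypothetical.

```python
import random


def random_model(n_states, n_a1, n_a2, seed=0):
    """Toy model (assumption): shared costs C[s][a][b], transitions P[s][a][b][s']."""
    rng = random.Random(seed)
    C = [[[rng.random() for _ in range(n_a2)] for _ in range(n_a1)]
         for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        per_a1 = []
        for _ in range(n_a1):
            per_a2 = []
            for _ in range(n_a2):
                w = [rng.random() for _ in range(n_states)]
                total = sum(w)
                per_a2.append([x / total for x in w])  # normalise to a distribution
            per_a1.append(per_a2)
        P.append(per_a1)
    return P, C


def q_value(P, C, V, s, a, b, gamma):
    """One-step lookahead cost of the joint action (a, b) in state s."""
    return C[s][a][b] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a][b]))


def evaluate(P, C, pi1, pi2, gamma, tol=1e-12):
    """Value evaluation: iterate the Bellman equation of the joint policy."""
    V = [0.0] * len(C)
    while True:
        delta = 0.0
        for s in range(len(C)):
            v = q_value(P, C, V, s, pi1[s], pi2[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def improve(P, C, pi1, pi2, gamma, agent, margin=1e-8):
    """Policy improvement for one agent while the other agent's policy is frozen.

    The current action is kept unless another one is better by more than
    `margin`, so every change strictly lowers the shared cost.
    """
    V = evaluate(P, C, pi1, pi2, gamma)
    own = pi1 if agent == 1 else pi2
    changed = False
    for s in range(len(C)):
        n_own = len(C[s]) if agent == 1 else len(C[s][0])
        best_x = own[s]
        best_q = q_value(P, C, V, s, pi1[s], pi2[s], gamma)
        for x in range(n_own):
            a, b = (x, pi2[s]) if agent == 1 else (pi1[s], x)
            q = q_value(P, C, V, s, a, b, gamma)
            if q < best_q - margin:
                best_x, best_q = x, q
        if own[s] != best_x:
            own[s] = best_x
            changed = True
    return changed


def alternate(P, C, gamma=0.9, max_rounds=10_000):
    """Alternate the two agents' updates until neither changes its policy."""
    pi1, pi2 = [0] * len(C), [0] * len(C)
    for _ in range(max_rounds):
        changed1 = improve(P, C, pi1, pi2, gamma, agent=1)
        changed2 = improve(P, C, pi1, pi2, gamma, agent=2)
        if not (changed1 or changed2):
            break
    return pi1, pi2, evaluate(P, C, pi1, pi2, gamma)
```

When the loop stops, neither agent can lower the shared cost by changing its own action in any state, which for a frozen partner policy is the Bellman optimality condition, so the returned pair is a Nash equilibrium of this toy model. Nothing in the sketch guarantees that it is the globally minimal one; which equilibrium is reached can depend on the initial policies and on the update order.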
