Human Machine Co-Adaptation Model and Its Convergence Analysis

Steven W. Su, Yaqi Li, Kairui Guo, Rob Duffield

TL;DR

This paper develops a two-agent cooperative adaptive MDP (CaMDP) framework to model human-machine co-adaptation in robot-assisted rehabilitation. It provides sufficient conditions for convergence to a unique Nash equilibrium (NE), as well as strategies to handle multiple equilibria. It introduces structured results for policy improvement under alternating updates, analyses the uniqueness of the NE under dominance conditions, and proposes less-greedy and model-simplification techniques to improve convergence and reduce switching costs. Through extensive numerical experiments, the authors validate the convergence criteria, demonstrate how large discount factors and partial observability affect outcomes, and show that epsilon-greedy updates can reach the global optimum in challenging CaMDP settings. The work advances the theoretical foundations and practical algorithms for robust, cooperative reinforcement learning in rehabilitative human-machine interfaces, with implications for safety, explainability, and real-time adaptability.
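
To make the alternating-update idea concrete, here is a minimal sketch of alternating policy improvement in a toy two-agent cooperative MDP with a shared reward. It is not the paper's algorithm: the transition table `P`, reward table `R`, the helper names (`q_value`, `evaluate`, `improve`, `alternating_policy_iteration`, `make_toy`) and all numbers are hypothetical. On each turn one agent performs a single greedy improvement step while the other agent's policy is frozen, and it switches an action only on strict improvement. The loop stops once both agents, in consecutive turns, find nothing to improve, so each sub-policy is a best response to the other, i.e. a Nash equilibrium of the cooperative game.

```python
from typing import List

# Hypothetical toy CaMDP; all numbers are invented for illustration.
# P[s][a0][a1] is a next-state distribution, R[s][a0][a1] the shared reward.
# Agent 0 stands for the machine's sub-policy, agent 1 for the human's.
Policy = List[int]


def q_value(P, R, V, s: int, a0: int, a1: int, gamma: float) -> float:
    """One-step lookahead value of joint action (a0, a1) in state s."""
    return R[s][a0][a1] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a0][a1]))


def evaluate(P, R, pi0: Policy, pi1: Policy, gamma: float, tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation of the joint deterministic policy (pi0, pi1)."""
    V = [0.0] * len(P)
    while True:
        new_V = [q_value(P, R, V, s, pi0[s], pi1[s], gamma) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def improve(P, R, pi0: Policy, pi1: Policy, gamma: float, agent: int, eps: float = 1e-6) -> bool:
    """One greedy improvement step for `agent` with the other policy frozen.

    Mutates pi0 or pi1 in place and returns True if any action changed.
    """
    V = evaluate(P, R, pi0, pi1, gamma)
    changed = False
    for s in range(len(P)):
        if agent == 0:
            qs = [q_value(P, R, V, s, a, pi1[s], gamma) for a in range(len(P[s]))]
            current = pi0[s]
        else:
            qs = [q_value(P, R, V, s, pi0[s], a, gamma) for a in range(len(P[s][0]))]
            current = pi1[s]
        best = max(range(len(qs)), key=qs.__getitem__)
        if qs[best] > qs[current] + eps:  # switch only on strict improvement
            (pi0 if agent == 0 else pi1)[s] = best
            changed = True
    return changed


def alternating_policy_iteration(P, R, gamma: float, max_turns: int = 100):
    """Agents take turns improving; stop when two consecutive turns change nothing."""
    pi0, pi1 = [0] * len(P), [0] * len(P)
    quiet_turns, agent = 0, 0
    for _ in range(max_turns):
        quiet_turns = 0 if improve(P, R, pi0, pi1, gamma, agent) else quiet_turns + 1
        if quiet_turns == 2:
            return pi0, pi1  # each sub-policy is a best response to the other
        agent = 1 - agent
    raise RuntimeError("no equilibrium reached within max_turns")


def make_toy():
    """Two states, two actions per agent; joint action (1, 1) pays most and favours state 1."""
    base = [[0.0, 0.2], [0.3, 1.0]]
    R = [[[base[a0][a1] * (1 + s) for a1 in range(2)] for a0 in range(2)] for s in range(2)]
    P = [[[[0.2, 0.8] if (a0, a1) == (1, 1) else [0.8, 0.2] for a1 in range(2)]
          for a0 in range(2)] for s in range(2)]
    return P, R


if __name__ == "__main__":
    P, R = make_toy()
    print(alternating_policy_iteration(P, R, gamma=0.9))  # → ([1, 1], [1, 1])
```

While one agent is frozen, the other faces an ordinary MDP, so every accepted switch raises the shared value and the loop cannot cycle; with finitely many deterministic policies it must stop. Whether the equilibrium it stops at is unique or globally optimal is a separate question, which is what dominance-type conditions are meant to settle.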

Abstract

The key to robot-assisted rehabilitation lies in the design of the human-machine interface, which must accommodate the needs of both patients and machines. Current interface designs primarily focus on machine control algorithms, often requiring patients to spend considerable time adapting. In this paper, we introduce a novel approach based on the Cooperative Adaptive Markov Decision Process (CaMDP) model to address the fundamental aspects of the interactive learning process, offering theoretical insights and practical guidance. We establish sufficient conditions for the convergence of CaMDPs and for the uniqueness of the Nash equilibrium point. Leveraging these conditions, we guarantee the system's convergence to a unique Nash equilibrium point. Furthermore, we explore scenarios with multiple Nash equilibrium points, devising strategies to adjust both the Value Evaluation and Policy Improvement algorithms to increase the likelihood of converging to the globally minimal Nash equilibrium point. Through numerical experiments, we illustrate the effectiveness of the proposed conditions and algorithms, demonstrating their applicability and robustness in practical settings. The proposed convergence conditions and the identification of a unique optimal Nash equilibrium contribute to the development of more effective adaptive systems for human users in robot-assisted rehabilitation.
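
The multiple-equilibrium problem, and why a less-greedy update can help, already shows up in a one-state cooperative game. The sketch below is a simplification, not the paper's CaMDP procedure: the cost matrix `C` and the helpers `mixed`, `best_response` and `alternate` are hypothetical. The game has two Nash equilibria, (0, 0) with cost 0.8 and the global minimum (1, 1) with cost 0.0. Purely greedy alternating best responses started at (0, 0) stay there, while scoring each agent's options against the partner's epsilon-greedy mix (here `eps = 0.5`) lets the pair move to (1, 1).

```python
from typing import List, Tuple

# Hypothetical shared cost C[a0][a1] (lower is better) of a one-state cooperative game.
# Nash equilibria: (0, 0) with cost 0.8 and the global minimum (1, 1) with cost 0.0.
C = [[0.8, 1.0],
     [1.0, 0.0]]


def mixed(greedy: int, eps: float, n: int = 2) -> List[float]:
    """Epsilon-greedy distribution: eps spread uniformly, the rest on the greedy action."""
    probs = [eps / n] * n
    probs[greedy] += 1.0 - eps
    return probs


def best_response(agent: int, partner_greedy: int, eps: float) -> int:
    """Cost-minimising action of `agent` against the partner's epsilon-greedy mix."""
    partner = mixed(partner_greedy, eps)

    def expected_cost(a: int) -> float:
        if agent == 0:
            return sum(p * C[a][b] for b, p in enumerate(partner))
        return sum(p * C[b][a] for b, p in enumerate(partner))

    return min(range(2), key=expected_cost)


def alternate(eps: float, start: Tuple[int, int] = (0, 0), turns: int = 20) -> Tuple[int, ...]:
    """Alternating best responses; returns the final pair of greedy actions."""
    actions = list(start)
    for k in range(turns):
        agent = k % 2
        actions[agent] = best_response(agent, actions[1 - agent], eps)
    return tuple(actions)


print(alternate(eps=0.0))  # → (0, 0): plain greedy stays at the worse equilibrium
print(alternate(eps=0.5))  # → (1, 1): the less-greedy variant reaches the global minimum
```

The threshold depends on these invented numbers: for this matrix, agent 0 only leaves (0, 0) once eps exceeds 1/3.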

Paper Structure

This paper contains 14 sections, 5 theorems, 46 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

(guo2024cooperative) For the CaMDPs model, assume the augmented probability transition matrix $\bar{P}$ is quasi-positive (i.e., an irreducible and aperiodic stochastic matrix). Then, for any two sub-control policies $\pi_0$ and $\pi_1$ (with $\pi = \{\pi_0, \pi_1\}$), if $\gamma \leq \gamma_0 < 1$, the value f […] where $\bar{R}$ is the augmented reward function, and $\mathrm{diag}(A)$ represents the operation of extract […]
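
The quasi-positivity assumption in Lemma 1 can be checked mechanically. A minimal sketch, assuming "quasi-positive" means primitive (some power of the matrix is entrywise positive), which for a stochastic matrix is equivalent to irreducible plus aperiodic. By Wielandt's bound, an n x n nonnegative matrix is primitive exactly when its power (n-1)^2 + 1 is entrywise positive, so only the zero pattern needs to be propagated. How $\bar{P}$ is assembled from the two sub-policies is not recoverable from this summary, so the hypothetical function `is_quasi_positive` simply takes a row-stochastic matrix as input.

```python
from typing import List

Matrix = List[List[float]]


def _bool_mult(A: List[List[bool]], B: List[List[bool]]) -> List[List[bool]]:
    """Product of two zero/non-zero patterns (Boolean matrix multiplication)."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def is_quasi_positive(P: Matrix, atol: float = 1e-9) -> bool:
    """True if the row-stochastic matrix P is primitive (irreducible and aperiodic).

    Wielandt's bound: P is primitive iff P^((n-1)^2 + 1) is entrywise positive,
    so only the pattern of non-zero entries has to be propagated.
    """
    if any(abs(sum(row) - 1.0) > atol or min(row) < 0 for row in P):
        raise ValueError("P must be row-stochastic")
    n = len(P)
    pattern = [[p > 0 for p in row] for row in P]
    power = pattern
    for _ in range((n - 1) ** 2):  # power becomes pattern^((n-1)^2 + 1)
        power = _bool_mult(power, pattern)
    return all(all(row) for row in power)


print(is_quasi_positive([[0.5, 0.5], [1.0, 0.0]]))  # → True
print(is_quasi_positive([[0.0, 1.0], [1.0, 0.0]]))  # → False: irreducible but periodic
print(is_quasi_positive([[1.0, 0.0], [0.5, 0.5]]))  # → False: reducible
```

The swap matrix in the second call fails even though it is irreducible; that periodic case is exactly what the aperiodicity requirement excludes.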

Figures (4)

  • Figure 1: The block diagram of robot-assisted rehabilitation in a simulation setting.
  • Figure 2: State abstraction: compress the state into a compact representation $z(t)$ and model the transition in this latent space (See Figure 2.4 in moerland2023model).
  • Figure 3: Temporal/action abstraction, better known as hierarchical reinforcement learning, where we learn an abstract action $u(t)$ that brings $s(t)$ to $s(t+n)$. Temporal abstraction directly implies multi-step prediction ($n > 1$); otherwise, the abstract action $u(t)$ is simply the low-level action $a(t)$ (See Figure 2.5 in moerland2023model).
  • Figure 4: Value vs Discount Factor $\gamma$.
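
Figure 4 plots value against the discount factor $\gamma$. The following generic sketch, built on invented numbers, shows why that curve matters. For a fixed joint policy with a hypothetical transition matrix `P` and reward vector `r`, iterative evaluation of $V = r + \gamma P V$ yields larger values and needs more sweeps as $\gamma \to 1$: the error shrinks by a factor $\gamma$ per sweep, and a constant reward $r$ gives $V = r/(1-\gamma)$. It does not reproduce the figure's data.

```python
from typing import List, Tuple


def evaluate(P: List[List[float]], r: List[float], gamma: float,
             tol: float = 1e-8) -> Tuple[List[float], int]:
    """Iterate V <- r + gamma * P V until successive sweeps differ by less than tol."""
    n = len(r)
    V = [0.0] * n
    sweeps = 0
    while True:
        sweeps += 1
        new_V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V, sweeps
        V = new_V


# Hypothetical transitions and rewards induced by one fixed joint policy.
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, 0.0]

for gamma in (0.5, 0.9, 0.99):
    V, sweeps = evaluate(P, r, gamma)
    print(f"gamma={gamma}: V={[round(v, 3) for v in V]}, sweeps={sweeps}")
```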

Theorems & Definitions (11)

  • Definition 1
  • Lemma 1
  • Definition 2: Simultaneous Policy Update Rule
  • Definition 3: Alternating Policy Update Rule
  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • ...and 1 more
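
Definitions 2 and 3 distinguish simultaneous from alternating policy updates. The definitions themselves are not reproduced in this summary, so the sketch below assumes the common reading. Under a simultaneous rule, both agents best-respond to the same previous joint action. Under an alternating rule, one agent updates at a time against the other's latest action. In a hypothetical 2x2 coordination game with shared reward `REWARD`, started from a miscoordinated joint action, the simultaneous rule oscillates forever while the alternating rule settles after one step.

```python
from typing import List, Tuple

# Hypothetical 2x2 coordination game with a shared reward (higher is better).
REWARD = [[1.0, 0.0],
          [0.0, 1.0]]

Joint = Tuple[int, int]


def best_response(agent: int, partner_action: int) -> int:
    """Reward-maximising action of `agent` given the partner's action."""
    if agent == 0:
        return max(range(2), key=lambda a: REWARD[a][partner_action])
    return max(range(2), key=lambda a: REWARD[partner_action][a])


def simultaneous(start: Joint, steps: int) -> List[Joint]:
    """Both agents respond to the same previous joint action."""
    a0, a1 = start
    history = [(a0, a1)]
    for _ in range(steps):
        a0, a1 = best_response(0, a1), best_response(1, a0)  # right side uses old values
        history.append((a0, a1))
    return history


def alternating(start: Joint, steps: int) -> List[Joint]:
    """Agents update one at a time, each seeing the partner's latest action."""
    actions = list(start)
    history = [start]
    for k in range(steps):
        agent = k % 2
        actions[agent] = best_response(agent, actions[1 - agent])
        history.append((actions[0], actions[1]))
    return history


print(simultaneous((0, 1), 4))  # → [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]
print(alternating((0, 1), 4))   # → [(0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
```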