
Green Resilience of Cyber-Physical Systems: Doctoral Dissertation

Diaeddin Rimawi

TL;DR

This work addresses the challenge of maintaining performance in online collaborative AI systems (OL-CAIS) under exogenous disruptions while minimizing energy impact. It introduces the GResilience framework, which combines one-agent optimization, two-agent game theory, and reinforcement learning to balance resilience and greenness, and validates these policies on the CORAL collaborative robot in real-world and simulated experiments. A resilience model based on the Autonomous Classification Ratio (ACR) tracks how performance evolves across steady, disruptive, and final states. It guides automatic recovery together with a measurement framework that quantifies recovery speed, steadiness, green efficiency, and autonomy. Containerization is shown to halve CO$_2$ emissions and improve resilience, while RL-agent policies offer the strongest performance at a higher computational and environmental cost. The work also analyzes catastrophic forgetting and proposes runtime policies to sustain steady performance over repeated disruptions, culminating in a practical toolkit (CAIS-DMA) for green, resilient OL-CAIS deployment.

Abstract

Cyber-physical systems (CPS) combine computational and physical components. An Online Collaborative AI System (OL-CAIS) is a type of CPS that learns online in collaboration with humans to achieve a common goal, which makes it vulnerable to disruptive events that degrade performance. Decision-makers must therefore restore performance while limiting energy impact, creating a trade-off between resilience and greenness. This research addresses how to balance these two properties in OL-CAIS. It aims to model resilience for automatic state detection, develop agent-based policies that optimize the greenness-resilience trade-off, and understand catastrophic forgetting to maintain performance consistency. We model OL-CAIS behavior through three operational states: steady, disruptive, and final. To support recovery during disruptions, we introduce the GResilience framework, which provides recovery strategies through multi-objective optimization (one-agent), game-theoretic decision-making (two-agent), and reinforcement learning (RL-agent). We also design a measurement framework to quantify resilience and greenness. Empirical evaluation uses real and simulated experiments with a collaborative robot learning object classification from human demonstrations. Results show that the resilience model captures performance transitions during disruptions, and that GResilience policies improve green recovery by shortening recovery time, stabilizing performance, and reducing human dependency. RL-agent policies achieve the strongest results, although with a marginal increase in CO$_2$ emissions. We also observe catastrophic forgetting after repeated disruptions, while our policies help maintain steadiness. A comparison with containerized execution shows that containerization cuts CO$_2$ emissions by half. Overall, this research provides models, metrics, and policies that ensure the green recovery of OL-CAIS.
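
The abstract does not state how the one-agent policy combines its two objectives. As a rough illustration only, the sketch below uses weighted-sum scalarization, one common way to handle such a trade-off, which may differ from the dissertation's actual formulation. The `Action` type, the `select_action` function, the score fields, and all numbers are hypothetical and not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate recovery action (hypothetical structure)."""
    name: str
    resilience: float  # assumed: estimated resilience benefit, normalized to [0, 1]
    greenness: float   # assumed: normalized to [0, 1], higher means lower emissions


def select_action(actions, w_resilience=0.5):
    """Return the action with the highest weighted sum of the two scores.

    w_resilience is the weight on resilience; greenness gets 1 - w_resilience.
    """
    if not actions:
        raise ValueError("at least one candidate action is required")
    if not 0.0 <= w_resilience <= 1.0:
        raise ValueError("w_resilience must lie in [0, 1]")
    w_green = 1.0 - w_resilience
    return max(
        actions,
        key=lambda a: w_resilience * a.resilience + w_green * a.greenness,
    )


# Illustrative candidates with made-up scores.
candidates = [
    Action("fast_costly_recovery", resilience=0.9, greenness=0.2),
    Action("slow_cheap_recovery", resilience=0.4, greenness=0.8),
]

balanced = select_action(candidates, w_resilience=0.5)
resilience_first = select_action(candidates, w_resilience=0.9)
```

With equal weights the cheaper action scores 0.60 against 0.55 and is chosen. Raising the resilience weight to 0.9 flips the choice to the faster action, which shows how a decision-maker's preference shifts the outcome.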

Paper Structure

This paper contains 118 sections, 10 equations, 50 figures, 11 tables, 3 algorithms.

Figures (50)

  • Figure 1: Research challenges that face industrial OL-CAIS: (A) OL-CAIS learns from human collaboration to achieve steady performance. (B) Upon an exogenous disruptive event, its performance may degrade. Thus, decision-makers must balance resilience and greenness while addressing performance degradation.
  • Figure 1: Research methodology.
  • Figure 1: Collaborative robot learning by demonstration.
  • Figure 2: Decision-making flow to learn new arriving data instances in an online learning fashion.
  • Figure 2: The detailed representation of our decision-making components.
  • ...and 45 more figures