A Novel Bifurcation Method for Observation Perturbation Attacks on Reinforcement Learning Agents: Load Altering Attacks on a Cyber Physical Power System

Kiernan Broda-Milian; Ranwa Al-Mallah; Hanane Dagdougui

TL;DR

The paper tackles adversarial observation perturbations in DRL controllers for cyber-physical power systems and introduces a novel Grouped Difference Logit (GDL) loss implemented with a bifurcation layer to boost adversarial regret under a constrained budget. It demonstrates that the bifurcation method can outperform untargeted attacks and approach or exceed the impact of optimally targeted attacks while maintaining perturbations that are harder to detect, especially in continuous-action settings. Detection via MMD and time-series analysis reveals limits to single-sample stealth detection, though aggregate statistics can reveal weaknesses, guiding budget choices. Robustness improvements through discrete action spaces and ATLA are shown to reduce attack impact, though trade-offs with clean performance depend on the threat model and application, with black-box snooping attacks still posing practical risks.
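The Maximum Mean Discrepancy (MMD) statistic mentioned above compares two sets of samples, here clean and perturbed observations, in aggregate rather than one sample at a time. The following is a minimal sketch only: the RBF kernel, the bandwidth `sigma`, the biased estimator, and the synthetic data are all illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Pairwise RBF kernel between rows of a and rows of b (sigma is an assumed bandwidth)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Synthetic stand-ins for batches of observations (hypothetical data).
rng = np.random.default_rng(0)
clean_obs = rng.normal(size=(200, 4))
small_shift = clean_obs + 0.03   # perturbation within a small budget
large_shift = clean_obs + 3.0    # perturbation far outside any stealthy budget

mmd_small = mmd_squared(clean_obs, small_shift)
mmd_large = mmd_squared(clean_obs, large_shift)
print(f"small shift: {mmd_small:.6f}, large shift: {mmd_large:.6f}")
```

In this toy setting the statistic grows with the size of the shift, which is the intuition behind using aggregate statistics to guide the choice of perturbation budget.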

Abstract

Components of cyber physical systems, which affect real-world processes, are often exposed to the internet. Replacing conventional control methods with Deep Reinforcement Learning (DRL) in energy systems is an active area of research, as these systems become increasingly complex with the advent of renewable energy sources and the desire to improve their efficiency. Artificial Neural Networks (ANNs) are vulnerable to specific perturbations of their inputs or features, called adversarial examples. These perturbations are difficult to detect when properly regularized, but have significant effects on the ANN's output. Because DRL uses ANNs to map observations to optimal actions, DRL agents are similarly vulnerable to adversarial examples. This work proposes a novel attack technique for continuous control using a Grouped Difference Logit (GDL) loss with a bifurcation layer. By combining aspects of targeted and untargeted attacks, the attack significantly increases the impact compared to an untargeted attack, with drastically smaller distortions than an optimally targeted attack. We demonstrate the impacts of powerful gradient-based attacks in a realistic smart energy environment, show how the impacts change with different DRL agents and training procedures, and use statistical and time-series analysis to evaluate the attacks' stealth. The results show that adversarial attacks can have significant impacts on DRL controllers, and that constraining an attack's perturbations makes it difficult to detect. However, certain DRL architectures are far more robust, and robust training methods can further reduce the impact.
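To make the idea of a budget-constrained, gradient-based observation perturbation concrete, the sketch below applies a single sign-gradient step to a toy policy. Everything here is an assumption for illustration: the policy is a random linear softmax layer (`W`, `b`) rather than a trained DRL actor, the loss is the cross-entropy against the agent's original action, and `eps` is an arbitrary L-infinity budget.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(W, b, x, label):
    """Cross-entropy of the toy policy's output against a fixed action label."""
    return -np.log(softmax(W @ x + b)[label])

def sign_gradient_attack(W, b, x, eps):
    """One untargeted sign-gradient step that raises the loss of the original action.

    For a linear softmax policy the input gradient of the cross-entropy
    is W^T (p - onehot(label)), so no autodiff library is needed.
    """
    logits = W @ x + b
    label = int(np.argmax(logits))   # the action the agent would take unperturbed
    p = softmax(logits)
    p[label] -= 1.0                  # p - onehot(label)
    grad_x = W.T @ p
    return x + eps * np.sign(grad_x)

# Hypothetical toy policy: 6 observation features, 10 discrete actions.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 6))
b = rng.normal(size=10)
x = rng.uniform(0.0, 1.0, size=6)

eps = 0.03
x_adv = sign_gradient_attack(W, b, x, eps)
original_action = int(np.argmax(W @ x + b))
print("max |perturbation|:", np.max(np.abs(x_adv - x)))
print("loss before:", cross_entropy(W, b, x, original_action))
print("loss after: ", cross_entropy(W, b, x_adv, original_action))
```

Each feature moves by at most `eps`, which is what keeps the perturbation small, while the loss on the original action increases. Whether the chosen action actually changes depends on the budget and the policy.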

Paper Structure (24 sections, 2 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Threat Model
Methodology
Gym Environment
Untargeted and Targeted Attacks
Bifurcation and Target Group
Grouped Difference Logit Loss
Bifurcation Layer
Continuous Action Spaces
Detection
Robust Training
Black Box Attack
Results
White Box Attacks
...and 9 more sections

Figures (5)

Figure 1: Example of an adversarial attack on a discrete actor network trained in CityLearn. The original observations and actions are shown in blue, and the adversarial ones in orange. The bars on the left represent features in an observation, and the curves on the right represent the value of each logit (each of which corresponds to an action). The upper five logits represent different levels of charge actions, and the bottom five represent discharge actions. In this example, small changes to the original observation result in an adversarial action that differs from the original. However, the adversarial action charges only slightly more than is optimal, so the impact on power consumption is limited.
Figure 2: Example of an adversarial attack on a discrete actor network trained in CityLearn, using the bifurcation method. The original observations and actions are shown in blue, and the adversarial ones in orange. The bars on the left represent features in an observation, and the curves on the right represent the value of each logit (each of which corresponds to an action). The upper five logits represent different levels of charge actions, and the bottom five represent discharge actions. The output of the bifurcation layer is the maximum logit value for each of these two groups of logits. In this example, small changes to the original observation result in a discharge action instead of the original charge action. Inducing the victim agent to reverse its (dis)charge decisions increases electricity consumption.
Figure 3: Histogram comparing the adversarial regrets of agents trained conventionally and with ATLA under various attacks. The black bar represents the difference in clean performance between the ATLA and non-ATLA agents; it indicates instances where the regret is smaller for the ATLA agent, but its reduced clean performance means the non-ATLA agent still performs better for that KPI. While the ATLA agent's adversarial regret is smallest in all cases, it still consumes more energy than the conventionally trained agent when attacked with untargeted adversarial examples. The stealthy attack is the bifurcated PGD attack with $\epsilon=0.03$, masked temporal and solar generation features, and scaled $\epsilon$ for net electricity consumption.
Figure 4: Comparison of the electricity consumption KPI for direct and bifurcated FGM snooping attacks with a range of $\epsilon$, for discrete PPO agents trained conventionally and with the ATLA method. The line symbols indicate the agent, while the colour indicates the attack. This plot shows the trend between adversarial budget and regret. Solid lines represent ATLA training. While the robustness offered by ATLA is insignificant compared to its reduction in clean performance for energy consumption, the corresponding adversarial regret for bifurcated attacks is significantly reduced.
Figure 5: Comparison of the electricity consumption KPI for bifurcated FGM snooping attacks with a range of $\epsilon$, for discrete PPO, continuous PPO, and SAC agents. This figure compares the trend between adversarial budget and regret across DRL algorithms and action spaces. It demonstrates that the discrete PPO agent is significantly more robust than either agent with a continuous action space, even without ATLA training.
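The bifurcation layer described in the Figure 2 caption, which outputs the maximum logit of the charge group and of the discharge group, can be sketched as follows. The group layout (first five logits charge, last five discharge) follows the caption; the `group_margin` objective and the example logit values are hypothetical illustrations and should not be read as the paper's exact Grouped Difference Logit loss.

```python
import numpy as np

def bifurcate(logits, n_charge=5):
    """Collapse action logits into two group scores: [max charge logit, max discharge logit]."""
    logits = np.asarray(logits, dtype=float)
    charge = logits[..., :n_charge].max(axis=-1)
    discharge = logits[..., n_charge:].max(axis=-1)
    return np.stack([charge, discharge], axis=-1)

def group_margin(logits, target_group, n_charge=5):
    """Hypothetical objective: target group's score minus the other group's score.

    A positive margin means the highest logit overall lies in the target group.
    """
    groups = bifurcate(logits, n_charge)
    return groups[..., target_group] - groups[..., 1 - target_group]

# Made-up logits: the clean agent prefers a charge action (index 2),
# the perturbed agent prefers a discharge action (index 7).
clean_logits = np.array([0.1, 0.4, 2.0, 0.3, 0.0, -1.0, -0.5, 0.2, -0.3, -2.0])
adv_logits = np.array([0.1, 0.4, 0.6, 0.3, 0.0, -1.0, -0.5, 0.9, -0.3, -2.0])

DISCHARGE = 1
print("clean groups:", bifurcate(clean_logits))
print("adversarial groups:", bifurcate(adv_logits))
print("clean margin toward discharge:", group_margin(clean_logits, DISCHARGE))
print("adversarial margin toward discharge:", group_margin(adv_logits, DISCHARGE))
```

Working on group scores rather than individual logits lets an attacker aim for any action in the opposite group, instead of committing to one specific target action or merely changing the action to a neighbouring one.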
