Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

Chung-En Sun; Sicun Gao; Tsui-Wei Weng

TL;DR

This study introduces S-DQN and S-PPO, two algorithms for training smoothed robust DRL agents that improve clean reward, empirical robustness, and robustness guarantees across standard RL benchmarks.

Abstract

Robustness remains a paramount concern in deep reinforcement learning (DRL), with randomized smoothing emerging as a key technique for enhancing this attribute. However, a notable gap exists in the performance of current smoothed DRL agents, often characterized by significantly low clean rewards and weak robustness. In response to this challenge, our study introduces innovative algorithms aimed at training effective smoothed robust DRL agents. We propose S-DQN and S-PPO, novel approaches that demonstrate remarkable improvements in clean rewards, empirical robustness, and robustness guarantee across standard RL benchmarks. Notably, our S-DQN and S-PPO agents not only significantly outperform existing smoothed agents by an average factor of $2.16\times$ under the strongest attack, but also surpass previous robustly-trained agents by an average factor of $2.13\times$. This represents a significant leap forward in the field. Furthermore, we introduce Smoothed Attack, which is $1.89\times$ more effective in decreasing the rewards of smoothed agents than existing adversarial attacks.
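As a rough illustration of what a randomized-smoothing agent does at test time, the sketch below picks an action by majority vote over Gaussian-perturbed copies of the observation. It is a generic sketch, not the authors' S-DQN implementation: the function name `smoothed_action`, the toy linear Q-function `toy_q`, and the values of `sigma` and `n_samples` are all assumptions made for this example.

```python
import numpy as np


def smoothed_action(q_fn, obs, sigma=0.1, n_samples=100, rng=None):
    """Return the majority-vote greedy action over noisy copies of `obs`.

    q_fn maps one observation to a 1-D array of Q-values (hypothetical interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw n_samples Gaussian perturbations with the same shape as the observation.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + obs.shape)
    noisy_obs = obs[None, ...] + noise
    # Evaluate the Q-function on every perturbed observation.
    q_values = np.stack([q_fn(o) for o in noisy_obs])  # (n_samples, n_actions)
    # Each perturbed observation votes for its own greedy action.
    greedy = q_values.argmax(axis=1)
    votes = np.bincount(greedy, minlength=q_values.shape[1])
    return int(votes.argmax()), votes


# Toy linear Q-function with 3 actions over a 2-D observation (illustrative only).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def toy_q(obs):
    return W @ obs


obs = np.array([2.0, 0.5])
action, votes = smoothed_action(
    toy_q, obs, sigma=0.1, n_samples=200, rng=np.random.default_rng(0)
)
print(action, votes)
```

The vote counts are what a certification procedure would typically build on: the wider the margin between the top action and the runner-up, the larger the perturbation the smoothed decision can plausibly tolerate. How the paper turns this into a certified radius or action bound is covered in its robustness certification sections, not here.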

Paper Structure (57 sections, 66 equations, 6 figures, 17 tables, 5 algorithms)

Introduction
Failure in existing Smoothed DRL Agents
Learning Robust DRL Agents with Randomized Smoothing
S-DQN (Smoothed - Deep Q Network)
Training and loss function.
Testing with hard randomized smoothing.
New attack framework: Smoothed attack.
S-PPO (Smoothed - Proximal Policy Optimization)
Training and loss function.
Adversary training for S-PPO.
Testing.
Attack.
Robustness certification
Certified Radius for S-DQN.
Action Bound for S-PPO.
...and 42 more sections

Figures (6)

Figure 1: The clean reward and reward under attack for DQN and PPO agents. The presented reward is normalized and averaged across environments. Our S-DQN and S-PPO agents (in the red boxes) exhibit significantly improved clean reward and robustness in comparison to the previous smoothed agents (in the brown boxes) and the non-smoothed robust agents (in the gray boxes).
Figure 2: The overview of our framework. We propose new DRL training algorithms leveraging Randomized Smoothing, achieving strong certifiable robustness, high clean reward, and high robust reward simultaneously.
Figure 3: Flow charts of (a) the training process of S-DQN, (b) the testing process of S-DQN, and (c) our Smoothed Attack pipeline for smoothed agents, which is much more effective than non-smoothed attacks.
Figure 4: The training process of S-PPO.
Figure 5: The certified reward lower bound of smoothed DQN agents. Our S-DQNs achieve a much higher lower bound than all the previous smoothed agents.
...and 1 more figure
