
Adaptive Rational Activations to Boost Deep Reinforcement Learning

Quentin Delfosse, Patrick Schramowski, Martin Mundt, Alejandro Molina, Kristian Kersting

TL;DR

We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games, in particular turning simple DQN into a solid approach that is competitive with DDQN and Rainbow.

Abstract

Recent insights from biology show not only that intelligence emerges from the connections between neurons, but also that individual neurons shoulder more computational responsibility than previously anticipated. This perspective should be critical for reinforcement learning, where agents face constantly changing, distinct environments, yet current approaches still primarily employ static activation functions. In this work, we motivate why rational functions are suitable as adaptable activation functions and why their inclusion into neural networks is crucial. Inspired by recurrence in residual networks, we derive a condition under which rational units are closed under residual connections and formulate a naturally regularised version: the recurrent-rational. We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games, in particular turning simple DQN into a solid approach that is competitive with DDQN and Rainbow.
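The abstract's two technical claims, rational functions as learnable activations and their closure under residual connections, can be illustrated with a small NumPy sketch. It assumes the plain form R(x) = P(x) / Q(x), with P of degree m, Q of degree n and Q normalised to constant term 1; the paper's actual parameterisation (for instance a pole-free "safe" denominator) and its exact closure condition may differ. Under this assumption, R(x) + x = (P(x) + x·Q(x)) / Q(x), and since x·Q(x) has degree n + 1, the sum is again a rational of degree (m, n) whenever m > n. The names `rational` and `add_identity` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

# Hypothetical sketch: plain rational R(x) = P(x) / Q(x), Q(x) = 1 + b_1 x + ... + b_n x^n.
# The paper's exact parameterisation (e.g. a pole-free "safe" denominator) may differ.


def rational(x, num, den):
    """Evaluate R(x) elementwise.

    num: coefficients a_0..a_m of P in ascending powers (degree m = len(num) - 1).
    den: coefficients b_1..b_n of Q (degree n = len(den)); the constant term is fixed to 1.
    """
    x = np.asarray(x, dtype=float)
    p = npoly.polyval(x, num)
    q = npoly.polyval(x, np.concatenate(([1.0], den)))
    return p / q


def add_identity(num, den):
    """Numerator of R(x) + x over the unchanged denominator Q(x).

    R(x) + x = (P(x) + x * Q(x)) / Q(x). x * Q(x) has degree n + 1, so the
    numerator keeps degree m (and the same parameter count) only if m > n.
    """
    m, n = len(num) - 1, len(den)
    if m <= n:
        raise ValueError("closure under x -> R(x) + x requires m > n")
    # Ascending coefficients of x * Q(x): [0, 1, b_1, ..., b_n].
    x_times_q = np.concatenate(([0.0, 1.0], den))
    new_num = np.array(num, dtype=float)  # copy so the caller's list is not modified
    new_num[: n + 2] += x_times_q
    return new_num


# Example with degrees (m, n) = (5, 4); Q(x) = 1 + 0.5 x^2 + 0.25 x^4 > 0, so no poles.
num = [0.02, 0.5, 1.0, 0.3, 0.05, 0.01]
den = [0.0, 0.5, 0.0, 0.25]
xs = np.linspace(-3.0, 3.0, 13)
residual = rational(xs, num, den) + xs
merged = rational(xs, add_identity(num, den), den)
print(np.allclose(residual, merged))  # True
```

In a network, `num` and `den` would be trainable parameters; this sketch only checks the algebra numerically, showing that a residual connection around a rational unit can be folded back into a single rational of the same degrees.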

Paper Structure

This paper contains 30 sections, 5 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Neural plasticity due to trainable activation functions allows RL agents to adapt to environments of increasing complexity. Rational activations (bottom), with shared parameters in each of the last two layers, evolve together with their input distributions (shaded blue) when learning with DQN on Time Pilot. Each column corresponds to a training state where a new, more challenging part of the environment (top, e.g. increasing enemy speed and movement complexity) has been uncovered and is additionally used for training.
  • Figure 2: Neural plasticity is essential for reinforcement learning. Human-normalised mean scores, with standard deviations across 5 randomly seeded experimental repetitions, are shown for rigid DQN agents (LReLU and CReLU) and for agents with non-rational, rational, tempered, and regularised plasticity. Higher scores are better. Tempered plasticity, which allows initial adaptation to the environments but not to their transformations in experimental repetitions, performs better on stationary environments. Regularised plasticity performs well across all environment types. Best viewed in colour. A description of the environment types is provided in the appendix on environment classification.
  • Figure 3: Learnable functions' plasticity boosts RL agents. For reliable evaluation, we report performance profiles (top left) and superhuman probabilities (with CIs, bottom left) of the baselines (i.e. DQN and DDQN with Leaky ReLU, DQN with SiLU, and DQN with SiLU + dSiLU) and of DQN with plasticity, using PELU, rational and joint-rational activations (5 random seeds). While learnable PELU activations already improve their agents' performance, rational and joint-rational activations lift it above human level in more than 70% of our runs. Detailed score tables are provided in the appendix.
  • Figure 4: Networks with rational (Rat.) and regularised (Reg.) rational plasticity compared to rigid baselines (DQN, DDQN and Rainbow) over five randomly seeded runs on eight Atari 2600 games. Mean scores (lines) and standard deviations (shaded area) during training are shown. DDQN does not resolve performance drops but only delays them (particularly pronounced on Seaquest). A figure showing the evolution of every agent on all Atari 2600 games is provided in the appendix. Figure best viewed in colour.
  • Figure 5: Plasticity naturally reduces overestimation. Relative overestimation values (lower is better, log scale) of rigid DQN and DDQN, as well as DQN with rational and regularised rational plasticity. Each trained agent is evaluated on 100 completed games (5 seeds per game per agent). Agents with rational plasticity reduce overestimation as much as, or more than, rigid DDQN, which was specifically introduced for this purpose. Figure best viewed in colour.
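Figures 2 and 3 report human-normalised scores, performance profiles and superhuman probabilities. A minimal sketch of these metrics, assuming the standard definitions: the human-normalised score (agent - random) / (human - random), and a performance profile giving the fraction of runs whose normalised score exceeds a threshold τ (as popularised by the rliable evaluation library). The function names are hypothetical, and the confidence intervals shown in Figure 3 (typically obtained by bootstrapping over runs) are not reproduced.

```python
import numpy as np

# Hypothetical helpers for the metrics named in the figure captions; standard
# definitions are assumed, and bootstrap confidence intervals are omitted.


def human_normalised(score, random_score, human_score):
    """Human-normalised score: 0 matches random play, 1 matches the human reference."""
    return (np.asarray(score, dtype=float) - random_score) / (human_score - random_score)


def performance_profile(normalised_scores, taus):
    """For each threshold tau, the fraction of runs with a normalised score above tau."""
    scores = np.asarray(normalised_scores, dtype=float).ravel()
    return np.array([(scores > tau).mean() for tau in taus])


def superhuman_fraction(normalised_scores):
    """Point estimate of the share of runs that exceed human level (score > 1)."""
    return float(performance_profile(normalised_scores, [1.0])[0])


# Illustrative numbers only (not results from the paper): random = 0, human = 100.
hns = human_normalised([50.0, 150.0, 200.0, 80.0], random_score=0.0, human_score=100.0)
print(hns)                                        # values 0.5, 1.5, 2.0, 0.8
print(performance_profile(hns, [0.0, 1.0, 2.0]))  # values 1.0, 0.5, 0.0
print(superhuman_fraction(hns))                   # 0.5
```

Sweeping `taus` over a grid and plotting the returned fractions gives a performance-profile curve; under these assumptions, the superhuman probability is simply the profile's value at τ = 1.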