Hadamax Encoding: Elevating Performance in Model-Free Atari

Jacob E. Kooi, Zhao Yang, Vincent François-Lavet

TL;DR

This work tackles limitations in pixel-based model-free reinforcement learning by introducing the Hadamax encoder, which combines max-pooling for down-sampling, Hadamard products between parallel hidden layers, and GELU activations within the PQN framework. The approach yields state-of-the-art model-free Atari-57 results without any algorithmic hyperparameter changes while remaining more than an order of magnitude faster than Rainbow in GPU hours, and its improvements transfer to other algorithms such as C51 and DQN/Rainbow. The authors demonstrate that Hadamax increases the effective rank of deeper layers and maintains a low proportion of dead neurons, supporting stable, high-capacity representations. Overall, Hadamax provides a strong architectural default for model-free Atari agents and motivates future exploration of scalable, Hadamard-based encoders and MoE extensions.

Abstract

Neural network architectures have a large impact in machine learning. In reinforcement learning, network architectures have remained notably simple, as changes often lead to small gains in performance. This work introduces a novel encoder architecture for pixel-based model-free reinforcement learning. The Hadamax (Hadamard max-pooling) encoder achieves state-of-the-art performance by max-pooling Hadamard products between GELU-activated parallel hidden layers. Based on the recent PQN algorithm, the Hadamax encoder achieves state-of-the-art model-free performance in the Atari-57 benchmark. Specifically, without applying any algorithmic hyperparameter modifications, Hadamax-PQN achieves an 80% performance gain over vanilla PQN and significantly surpasses Rainbow-DQN. For reproducibility, the full code is available on GitHub: https://github.com/Jacobkooi/Hadamax
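
To make the core operation in the abstract concrete, here is a minimal sketch in plain Python of one "Hadamax-style" step: two parallel pre-activation feature maps are passed through GELU, multiplied element-wise (the Hadamard product), and then max-pooled. It is not the authors' implementation (see the GitHub repository for that). The function names, the 2x2 pooling window with stride 2, the single-channel 2D maps, and the omission of the convolutions that would produce the two inputs are all assumptions made for illustration.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def hadamard(a, b):
    """Element-wise product of two equally shaped 2D feature maps."""
    return [[u * v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max-pooling (stride 2); assumes even height and width."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


def hadamax_block(pre_a, pre_b):
    """Hypothetical block: GELU on both parallel branches, Hadamard product, max-pool.

    pre_a and pre_b stand in for the outputs of two parallel layers applied
    to the same input; those layers are not modelled here.
    """
    act_a = [[gelu(v) for v in row] for row in pre_a]
    act_b = [[gelu(v) for v in row] for row in pre_b]
    return max_pool_2x2(hadamard(act_a, act_b))


# Usage: two made-up 4x4 branch outputs are reduced to one 2x2 map.
branch_a = [[float(i + j) for j in range(4)] for i in range(4)]
branch_b = [[1.0 for _ in range(4)] for _ in range(4)]
pooled = hadamax_block(branch_a, branch_b)
```

In this sketch the pooling halves each spatial dimension, which is the down-sampling role the TL;DR attributes to max-pooling; the window size and stride used in the paper may differ.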

Paper Structure

This paper contains 32 sections, 8 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Performance versus GPU hours in the full Atari-57 domain at 200M environment frames. The application of our Hadamard max-pooling encoder on PQN yields significant performance improvements over a current state-of-the-art model-free method, Rainbow, while remaining more than an order of magnitude faster.
  • Figure 2: Encoder architectures of DQN, PQN, the proposed Hadamard max-pooling (Hadamax) encoder and the Impala ResNet-15 encoder (from left to right). In the Hadamax encoder, down-sampling is performed by max-pooling operators. Furthermore, we apply a Hadamard product between parallel representation layers. The implementation is straightforward and can be found in the appendix. These changes allow for a substantial increase in algorithm performance, while keeping the general encoder structure, convolutional depth and algorithmic hyperparameters unchanged.
  • Figure 3: ReLU and GELU.
  • Figure 4: The Atari-57 domain.
  • Figure 5: Median human-normalized performance of PQN, PQN (ResNet-15) and Hadamax-PQN trained in the Atari domain over 57 games, 200M frames and 5 seeds (left), and the Atari-57 score profile (right). The Atari-57 score profile shows the percentage of games whose normalized score exceeds the threshold on the x-axis.
  • ...and 9 more figures
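
The two quantities named in the Figure 5 caption can be sketched as follows. Human-normalized score is computed with the standard Atari convention, (agent - random) / (human - random), and the score profile is the percentage of games whose normalized score exceeds each threshold. The function names and the example numbers are invented for illustration and are not results from the paper; averaging over seeds is left out.

```python
from statistics import median


def human_normalized(agent, random, human):
    """Standard human-normalized score: 0.0 is random play, 1.0 is human level."""
    return (agent - random) / (human - random)


def score_profile(normalized_scores, thresholds):
    """Percentage of games whose normalized score strictly exceeds each threshold."""
    n = len(normalized_scores)
    return [100.0 * sum(s > t for s in normalized_scores) / n for t in thresholds]


# Made-up normalized scores for four hypothetical games (not from the paper).
scores = [0.5, 1.0, 2.0, 4.0]
median_score = median(scores)                     # 1.5
profile = score_profile(scores, [0.0, 1.0, 3.0])  # [100.0, 50.0, 25.0]
```

A curve that stays high as the threshold grows means the agent is strong across many games rather than on a few outliers, which is why the profile complements the median.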