A Method of Selective Attention for Reservoir Based Agents
Kevin McKee
TL;DR
In reservoir-based reinforcement learning, training efficiency is hindered by high-dimensional, uninformative inputs. The paper compares Layer Normalization, a simple vector filter, and a novel Excessively Parameterized Input Concealment (EPIC) module that uses reward-driven gradients to suppress inputs; EPIC achieves the largest speedups. Empirically, input masking speeds training substantially: up to about four-fold versus a no-masking baseline and around two-fold over LayerNorm, with gains sustained across different distraction sizes. The work supports reward-driven selective attention as a practical, low-cost technique for improving the training efficiency of memory-augmented RL agents, with implications for designing attention mechanisms in sequential decision tasks.
Abstract
Training of deep reinforcement learning agents is slowed considerably by the presence of input dimensions that do not usefully condition the reward function. Existing modules such as layer normalization can be trained with weight decay to act as a form of selective attention, i.e. an input mask, that shrinks the scale of unnecessary inputs, which in turn accelerates training of the policy. Surprisingly, however, we find that adding numerous parameters to the computation of the input mask results in much faster training. A simple, high-dimensional masking module is compared with layer normalization and with a null model that has no input suppression. The high-dimensional mask yielded a four-fold speedup in training over the null model and a two-fold speedup over the layer normalization method.
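The idea that a gain vector trained with weight decay can act as an input mask can be illustrated with a minimal sketch. It is neither the layer normalization setup nor the high-dimensional mask evaluated in the paper. It assumes a toy supervised regression in place of a reinforcement learning task, and the function name `train_input_gain` and every hyperparameter are hypothetical choices made only for illustration.

```python
import numpy as np

def train_input_gain(n_signal=2, n_distract=8, steps=500,
                     lr=0.05, decay=0.05, seed=0):
    """Toy illustration (assumed setup, not the paper's model).

    A per-input gain vector multiplies the inputs before a linear readout.
    Both the gain and the readout weights receive weight decay. Only the
    first `n_signal` inputs influence the target; the rest are distractors.
    """
    rng = np.random.default_rng(seed)
    d = n_signal + n_distract
    true_w = np.zeros(d)
    true_w[:n_signal] = 1.0           # target depends on signal inputs only

    gain = np.ones(d)                 # input mask, starts fully open
    w = np.zeros(d)                   # linear readout weights

    for _ in range(steps):
        x = rng.normal(size=(64, d))  # fresh minibatch
        y = x @ true_w
        err = (x * gain) @ w - y      # prediction error, shape (64,)

        # Gradients of the mean of 0.5 * err**2
        grad_w = (x * gain).T @ err / len(err)
        grad_gain = (x * w).T @ err / len(err)

        # Gradient step plus weight decay on both parameter vectors
        w -= lr * (grad_w + decay * w)
        gain -= lr * (grad_gain + decay * gain)

    return gain

if __name__ == "__main__":
    gain = train_input_gain()
    print("signal gains:    ", np.round(gain[:2], 3))
    print("distractor gains:", np.round(gain[2:], 3))
```

In this sketch the error gradient holds up the gains on the informative inputs, while the gains on the distractors receive almost no error signal and are shrunk by weight decay, so the gain vector comes to behave like a soft input mask.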
