Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures

Devdhar Patel; Terrence Sejnowski; Hava Siegelmann

TL;DR

This work proposes a decision-bounded Markov decision process (DB-MDP), enabling agents to manage computational costs through two layers with distinct timescales and energy requirements, and introduces a biologically inspired, temporally layered architecture (TLA).

Abstract

The current reinforcement learning framework focuses exclusively on performance, often at the expense of efficiency. In contrast, biological control achieves remarkable performance while also optimizing computational energy expenditure and decision frequency. We propose a Decision Bounded Markov Decision Process (DB-MDP) that constrains the number of decisions and the computational energy available to agents in reinforcement learning environments. Our experiments demonstrate that existing reinforcement learning algorithms struggle within this framework, leading to either failure or suboptimal performance. To address this, we introduce a biologically inspired Temporally Layered Architecture (TLA), which enables agents to manage computational costs through two layers with distinct time scales and energy requirements. TLA achieves optimal performance in decision-bounded environments; in continuous control environments, it matches state-of-the-art performance while using a fraction of the compute cost. Compared to current reinforcement learning algorithms that solely prioritize performance, our approach significantly lowers computational energy expenditure while maintaining performance. These findings establish a benchmark and pave the way for future research on energy- and time-aware control.
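To make the decision constraint concrete, the following is a minimal sketch of a decision-bounded chain environment. It is an illustration only, not the authors' implementation: the class name `DecisionBoundedChain`, the `decide` method, the `repeat` parameter, the reward of 1.0 at the goal, and the choice of 5 states with a budget of 2 decisions are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DecisionBoundedChain:
    """Toy chain MDP with a cap on decisions (hypothetical illustration)."""

    num_states: int = 5     # S: states 0..S-1; the last state is the goal
    max_decisions: int = 2  # D: decision budget per episode (assumed value)

    def reset(self) -> int:
        self.state = 0
        self.decisions_used = 0
        return self.state

    def decide(self, action: int, repeat: int = 1):
        """Spend one decision to apply `action` for `repeat` consecutive steps.

        Assumed semantics: action 1 moves to the next state, action 0 stays put.
        Returns (state, reward, done).
        """
        if self.decisions_used >= self.max_decisions:
            raise RuntimeError("decision budget exhausted")
        if repeat < 1:
            raise ValueError("repeat must be at least 1")
        self.decisions_used += 1
        for _ in range(repeat):
            if action == 1:
                self.state = min(self.state + 1, self.num_states - 1)
        reached_goal = self.state == self.num_states - 1
        out_of_budget = self.decisions_used >= self.max_decisions
        reward = 1.0 if reached_goal else 0.0
        return self.state, reward, reached_goal or out_of_budget


def run(repeat: int):
    """Always move forward, committing to `repeat` steps per decision."""
    env = DecisionBoundedChain()
    env.reset()
    done, reward = False, 0.0
    while not done:
        _, reward, done = env.decide(action=1, repeat=repeat)
    return reward, env.decisions_used


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, run(k))
```

In this toy setting, an agent that commits to one step per decision exhausts its budget before reaching the goal, while an agent that plans several steps per decision reaches it within the same budget. This mirrors the abstract's point that algorithms which decide at every step can fail once decisions are bounded.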

Paper Structure

This paper contains 17 sections, 7 equations, 8 figures, and 6 tables.

Introduction
Background
Continuous Control
Action repetition and frame skipping
Residual and Layered RL
Options Framework
Multi-Agent Reinforcement Learning and Non-Stationarity
Decision Bounded Markov Decision Process
Temporally Layered Architecture
Temporal Adaptivity
Temporally Adaptive Reinforcement Learning
Temporally Layered Architecture (TLA)
Experiments
Decision Bounded Environments
Decision Unbounded Continuous Control Environments
...and 2 more sections

Figures (8)

Figure 1: The Temporally Layered Architecture (TLA) comprises two layers: the Slow policy (blue) and the Fast policy (red). The switch policy can activate or deactivate the Fast policy, thus switching between the two layers. The reward given to each network is augmented differently with the energy and consistency penalty, which forces the overall policy to learn temporal abstractions from performance and energy-based contexts.
Figure 2: (a): A simple MDP with $S=5$ states. Each state has two actions, one that leads to the next state and one that results in the same state change. (b): Time-limited MDP: In the time-limited MDP setting, there is an additional limit on the amount of time available ($T$). The MDP is thus expanded to $S \times T$ states. (c): Decision-Bounded MDP: In the Decision-Bounded MDP, the number of decisions is limited. However, a single decision can result in multiple planned actions. Similar to the time-limited MDP, there are $S \times D$ states, where $D$ is the number of available decisions. However, a larger part of the MDP is reachable if the agent is able to take multiple actions per decision, resulting in cognitive cost reduction.
Figure 3: Gridworld Environments. The grey box represents the starting state and the blue box is the goal state.
Figure 4: Decision Bounded Gridworld environments. TLA (blue) achieves the optimal performance with the fewest required decisions. All results are averaged over 20 trials. Top: Average reward vs. Episodes during training. Bottom: Decisions vs. Episodes during training.
Figure 5: Decision Bounded Continuous Control environments. Top: average reward vs. training episodes. Bottom: Decisions vs. training episodes. All results are averaged over 10 trials. The shaded region represents standard error. Left: In the Lunar Lander environment, which is robust to action repetition, TD3 extended action (TD3-EA) shows superior performance. In this environment, it takes longer for TLA to converge towards optimal average reward, and thus decisions are not properly optimized during the training period as TLA prioritizes reward over decisions. Yet, TLA outperforms other algorithms and is able to successfully solve the environment under the decision constraint. Right: Due to the longer step-size, TLA, TD3-EA and TempoRL are able to successfully solve the Mountain Car task. However, TLA achieves better performance than TempoRL and TD3-EA while achieving the lower bound on decisions represented by TD3-EA.
...and 3 more figures
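The two-layer arrangement described in the Figure 1 caption can be sketched as a gated two-timescale control loop. This is a rough illustration under stated assumptions, not the paper's algorithm: the function name `tla_rollout`, the `slow_period` parameter, the rule that the Slow policy's action is held between its updates, the scalar observation, and the toy policies in the usage section are all hypothetical. The energy and consistency penalties from the caption are not modeled here.

```python
from typing import Callable, Tuple


def tla_rollout(
    slow_policy: Callable[[float], float],
    fast_policy: Callable[[float], float],
    switch_policy: Callable[[float, float], bool],
    dynamics: Callable[[float, float], float],
    obs: float,
    horizon: int,
    slow_period: int,
) -> Tuple[float, int, int]:
    """Roll out a gated two-layer controller (hypothetical sketch).

    Returns (final observation, slow policy calls, fast policy calls).
    """
    slow_calls = 0
    fast_calls = 0
    held_action = 0.0
    for t in range(horizon):
        # Assumption: the Slow policy updates once every `slow_period` steps
        # and its action is held in between.
        if t % slow_period == 0:
            held_action = slow_policy(obs)
            slow_calls += 1
        # Assumption: the switch is evaluated every step and decides whether
        # the Fast policy overrides the held action on this step.
        if switch_policy(obs, held_action):
            action = fast_policy(obs)
            fast_calls += 1
        else:
            action = held_action
        obs = dynamics(obs, action)
    return obs, slow_calls, fast_calls


# Toy usage: drive a 1-D point toward zero (all choices are illustrative).
def toy_slow(x: float) -> float:
    return -x / 4.0  # coarse correction spread over one slow period


def toy_fast(x: float) -> float:
    return -x  # full correction in a single step


def toy_switch(x: float, held: float) -> bool:
    return abs(x) > 1.0  # activate the Fast policy only far from the target


def toy_dynamics(x: float, a: float) -> float:
    return x + a


if __name__ == "__main__":
    print(tla_rollout(toy_slow, toy_fast, toy_switch, toy_dynamics, 0.8, 8, 4))
    print(tla_rollout(toy_slow, toy_fast, toy_switch, toy_dynamics, 4.0, 8, 4))
```

The call counts give a crude proxy for compute: when the point starts close to the target, the Fast policy is never activated and only the Slow policy is evaluated, whereas a start far from the target triggers a few Fast policy evaluations. How the switch is trained, and how each network's reward is augmented, is left to the paper.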
