SwitchMT: An Adaptive Context Switching Methodology for Scalable Multi-Task Learning in Intelligent Autonomous Agents

Avaneesh Devkota, Rachmad Vidya Wicaksana Putra, Muhammad Shafique

TL;DR

SwitchMT tackles the problem of scalable multi-task reinforcement learning for autonomous agents operating in dynamic environments with data streams, where fixed task-switching schedules cause inefficiency and interference. It introduces an adaptive task-switching policy coupled with a Deep Spiking Q-Network that features active dendrites and a dueling architecture, selecting the MTSpark_ADD base network for enhanced context specialization. The switching policy leverages both rewards and real-time internal dynamics, using a parameter-change metric $\Delta \theta$ to decide when to switch tasks, which reduces wasted training time and penalties from overfitting. Empirical results on Atari benchmarks show SwitchMT achieving competitive performance against state-of-the-art methods and human baselines, while offering improved learning efficiency and generalization for multi-task RL in autonomous agents.

Abstract

The ability to train intelligent autonomous agents (such as mobile robots) on multiple tasks is crucial for adapting to dynamic real-world environments. However, state-of-the-art reinforcement learning (RL) methods excel only in single-task settings and still struggle to generalize across multiple tasks due to task interference. Moreover, real-world environments also require agents to process data streams. To this end, a state-of-the-art work employs Spiking Neural Networks (SNNs) to improve multi-task learning by exploiting temporal information in data streams, while enabling low-power/low-energy event-based operations. However, it relies on fixed context/task-switching intervals during training, which limits the scalability and effectiveness of multi-task learning. To address these limitations, we propose SwitchMT, a novel adaptive task-switching methodology for RL-based multi-task learning in autonomous agents. Specifically, SwitchMT employs the following key ideas: (1) a Deep Spiking Q-Network with active dendrites and a dueling structure that uses task-specific context signals to create specialized sub-networks; and (2) an adaptive task-switching policy that leverages both rewards and the internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves superior performance in multi-task learning compared to state-of-the-art methods. It achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) compared to the state-of-the-art, showing its better generalized learning capability. These results highlight the effectiveness of our SwitchMT methodology in addressing task interference while automating multi-task learning through adaptive task switching, thereby paving the way for more efficient generalist agents with scalable multi-task learning capabilities.
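To make the second key idea concrete, the following is a minimal sketch of how a switching decision could combine a reward signal with a parameter-change metric $\Delta \theta$. It is not the authors' algorithm: the class name `SwitchMonitor`, the helper `param_change`, the use of an L2 norm for $\Delta \theta$, the reward-plateau test, the thresholds, and the rule that both conditions must hold are all assumptions chosen for illustration.

```python
import math
from collections import deque


def param_change(prev_params, curr_params):
    """Assumed form of delta-theta: L2 norm of the parameter difference."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_params, curr_params)))


class SwitchMonitor:
    """Hypothetical helper that decides when to leave the current task."""

    def __init__(self, reward_window=5, reward_tol=0.5, delta_tol=1e-3):
        # Recent episode rewards for the task currently being trained.
        self.rewards = deque(maxlen=reward_window)
        self.reward_tol = reward_tol  # assumed plateau tolerance
        self.delta_tol = delta_tol    # assumed threshold on delta-theta

    def should_switch(self, episode_reward, prev_params, curr_params):
        self.rewards.append(episode_reward)
        if len(self.rewards) < self.rewards.maxlen:
            return False  # not enough reward history yet
        # Rewards have stopped moving within the window.
        reward_plateau = max(self.rewards) - min(self.rewards) <= self.reward_tol
        # Parameters have barely changed since the previous check.
        params_settled = param_change(prev_params, curr_params) <= self.delta_tol
        # Assumption: switch only when both signals agree.
        return reward_plateau and params_settled


# Usage: flat lists of floats stand in for the network parameters.
monitor = SwitchMonitor(reward_window=3, reward_tol=1.0, delta_tol=0.01)
for reward in (10.0, 10.5, 10.2):
    decision = monitor.should_switch(reward, [1.0, 2.0], [1.0, 2.001])
```

In contrast to a fixed switching interval, a rule of this kind lets training move on as soon as the current task stops yielding progress, which is the inefficiency the abstract attributes to fixed schedules.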

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Multi-task learning performance of state-of-the-art RL-based methods for ANNs (i.e., DQN [Playing_Atari]) and SNNs (i.e., DSQN [DSQN]) on three different Atari games (i.e., Pong, Breakout, and Enduro). These results show that both methods suffer from unstable learning progress across multiple tasks.
  • Figure 2: The overview of our SwitchMT methodology, showing its key steps: network architecture selection and adaptive task-switching policy.
  • Figure 3: The network architecture employed in the SwitchMT methodology; adapted from MTSpark_ADD [MTSpark].
  • Figure 4: Performance of different models (i.e., DQN, DSQN, DQN_D, DSQN_D, MTSpark_ADD and SwitchMT) when trained for 250 episodes across three environments: Pong, Breakout, and Enduro.
  • Figure 5: Performance evaluation of DQN_D, MTSpark_ADD, and SwitchMT on Pong game. Here, a higher game point means better performance.
  • ...and 2 more figures