A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer

Azka Nasir, Fatima Dossa, Muhammad Ahmed Atif, Mohammad Ahmed Atif

TL;DR

The paper examines how architectural inductive biases in value-based deep reinforcement learning affect cross-environment transfer. It presents a controlled empirical comparison of Double Deep Q-Network (DDQN) and Dueling DQN when transferring from CartPole to LunarLander under a fixed layer-wise representation transfer protocol with early freezing. DDQN consistently avoids negative transfer and keeps learning dynamics comparable to training from scratch, while Dueling DQN shows strong negative transfer under the same conditions; statistical analysis across random seeds indicates that architecture is strongly associated with transfer robustness. These findings suggest that architectural choices optimized for single-task performance may hinder transfer, and they motivate exploring hybrid or more robust transfer mechanisms for tasks with heterogeneous dynamics.
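
As an illustration of what a layer-wise transfer with early freezing can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the helper names (`init_mlp`, `transfer_layers`, `sgd_step`), the hidden width of 64, the choice of which layers to copy, and the freeze length are all assumptions made for the example. The only environment facts used are the standard dimensions: CartPole has 4 observations and 2 actions, LunarLander (discrete) has 8 observations and 4 actions, so only the hidden-to-hidden layer is shape-compatible.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Return a dict of weight/bias arrays for a fully connected network."""
    params = {}
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        params[f"W{i}"] = rng.normal(0.0, 0.1, size=(n_in, n_out))
        params[f"b{i}"] = np.zeros(n_out)
    return params

def transfer_layers(source, target, layer_ids):
    """Copy whole layers from source to target when weight and bias shapes match.

    Returns the new target parameters and the set of transferred parameter names.
    """
    new_target = {k: v.copy() for k, v in target.items()}
    transferred = set()
    for i in layer_ids:
        names = (f"W{i}", f"b{i}")
        compatible = all(
            n in source and n in target and source[n].shape == target[n].shape
            for n in names
        )
        if compatible:
            for n in names:
                new_target[n] = source[n].copy()
                transferred.add(n)
    return new_target, transferred

def sgd_step(params, grads, frozen, step, freeze_steps, lr=1e-3):
    """Plain SGD that leaves frozen parameters untouched for the first freeze_steps updates."""
    updated = {}
    for name, value in params.items():
        if name in frozen and step < freeze_steps:
            updated[name] = value  # still frozen: keep the transferred weights
        else:
            updated[name] = value - lr * grads[name]
    return updated

rng = np.random.default_rng(0)
source = init_mlp([4, 64, 64, 2], rng)  # CartPole-sized network (hidden width assumed)
target = init_mlp([8, 64, 64, 4], rng)  # LunarLander-sized network (hidden width assumed)
target, frozen = transfer_layers(source, target, layer_ids=[0, 1, 2])
```

In this sketch the input and output layers are skipped automatically because their shapes differ between the two environments, and the copied hidden layer only starts receiving gradient updates once `step` reaches `freeze_steps`.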

Abstract

Transfer learning in deep reinforcement learning is often motivated by improved stability and reduced training cost, but it can also fail under substantial domain shift. This paper presents a controlled empirical study examining how architectural differences between Double Deep Q-Networks (DDQN) and Dueling DQN influence transfer behavior across environments. Using CartPole as a source task and LunarLander as a structurally distinct target task, we evaluate a fixed layer-wise representation transfer protocol under identical hyperparameters and training conditions, with baseline agents trained from scratch used to contextualize transfer effects. Empirical results show that DDQN consistently avoids negative transfer under the examined setup and maintains learning dynamics comparable to baseline performance in the target environment. In contrast, Dueling DQN consistently exhibits negative transfer under identical conditions, characterized by degraded rewards and unstable optimization behavior. Statistical analysis across multiple random seeds confirms a significant performance gap under transfer. These findings suggest that architectural inductive bias is strongly associated with robustness to cross-environment transfer in value-based deep reinforcement learning under the examined transfer protocol.
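
For readers unfamiliar with the two architectures being compared, the following sketch shows the standard definitions that distinguish them: Double DQN changes how the bootstrap target is computed (the online network selects the next action, the target network evaluates it), while Dueling DQN changes how Q-values are produced (a state-value stream plus mean-centered advantages). The function names and the batch layout are assumptions made for illustration; the paper's own code is not shown in this summary.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets: select the next action with the online network,
    evaluate that action with the target network."""
    best_actions = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * evaluated

def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)
```

Because the dueling head splits the output into value and advantage streams, its final layers are tied to the target task's action set, which is one reason the same transfer protocol can interact differently with the two architectures.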

Paper Structure

This paper contains 16 sections, 1 equation, and 3 figures.

Figures (3)

  • Figure 1: Validation rewards. Transfer DDQN remains stable (-60), while Transfer Dueling DQN degrades (-370).
  • Figure 2: Episode rewards. Transfer DDQN succeeds (+200), while Transfer Dueling DQN fails (-150).
  • Figure 3: Training loss. DDQN loss is smooth ($<2$), while Dueling DQN loss oscillates (8-12).