Full-Gradient Successor Feature Representations

Ritish Shrirao, Aditya Priyadarshi, Raghuram Bharadwaj Diddigi

Abstract

Successor Features (SF) combined with Generalized Policy Improvement (GPI) provide a robust framework for transfer learning in Reinforcement Learning (RL) by decoupling environment dynamics from reward functions. However, standard SF learning methods typically rely on semi-gradient Temporal Difference (TD) updates. When combined with non-linear function approximation, semi-gradient methods lack robust convergence guarantees and can lead to instability, particularly in the multi-task setting where accurate feature estimation is critical for effective GPI. Inspired by Full Gradient DQN, we propose Full-Gradient Successor Feature Representations Q-Learning (FG-SFRQL), an algorithm that optimizes the successor features by minimizing the full Mean Squared Bellman Error. Unlike standard approaches, our method computes gradients with respect to parameters in both the online and target networks. We provide a theoretical proof of almost-sure convergence for FG-SFRQL and demonstrate empirically that minimizing the full residual leads to superior sample efficiency and transfer performance compared to semi-gradient baselines in both discrete and continuous domains.
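
As a minimal sketch of the distinction the abstract draws, the PyTorch-style snippet below contrasts a semi-gradient successor-feature TD loss, which detaches the bootstrap target, with a full-gradient loss that differentiates the Mean Squared Bellman Error through both the online and bootstrap estimates. The network, the names (`SFNet`, `sf_td_losses`), and the tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SFNet(nn.Module):
    """Hypothetical successor-feature network: maps a state to a
    (num_actions, feat_dim) block of SF vectors psi(s, a)."""

    def __init__(self, state_dim: int, num_actions: int, feat_dim: int):
        super().__init__()
        self.num_actions, self.feat_dim = num_actions, feat_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions * feat_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.body(s).view(-1, self.num_actions, self.feat_dim)

def sf_td_losses(net, s, a, phi, s_next, a_next, gamma=0.99):
    """Losses for the SF Bellman equation psi(s,a) = phi + gamma * psi(s',a').

    semi_grad: the bootstrap term is detached, so gradients flow only
        through the online estimate psi(s, a) (standard semi-gradient TD).
    full_grad: nothing is detached, so the squared Bellman residual is
        differentiated through both occurrences of the parameters, as in
        the full-gradient update the abstract describes.
    """
    idx = torch.arange(s.shape[0])
    psi = net(s)[idx, a]                  # online estimate psi(s, a)
    psi_next = net(s_next)[idx, a_next]   # bootstrap estimate psi(s', a')

    semi_grad = ((phi + gamma * psi_next.detach() - psi) ** 2).mean()
    full_grad = ((phi + gamma * psi_next - psi) ** 2).mean()
    return semi_grad, full_grad
```

The only difference between the two losses is the `.detach()` call: removing it makes the optimizer follow the gradient of the full squared Bellman residual on the sampled transitions, rather than treating the bootstrap target as a constant.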

Paper Structure

This paper contains 29 sections, 1 theorem, 22 equations, 11 figures, 2 tables, and 3 algorithms.

Key Result

Theorem 1

Under Assumptions 1--5, the joint iterate sequence $\{\Theta_k\}$ produced by the FG-SFRQL algorithm converges almost surely to a stationary point $\Theta^\ast$ of the aggregate Bellman error $E(\Theta) = \sum_{j=1}^m E_j(\theta^{(j)})$, i.e., a point satisfying $\nabla E(\Theta^\ast)=0$.
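
The per-task objective $E_j$ is not reproduced on this page. A plausible form, consistent with the abstract's description (and assuming $\psi_{\theta^{(j)}}$ denotes the successor-feature network for task $j$, $\phi$ the base features, and $\pi_j$ the task's policy), is the full Mean Squared Bellman Error

$$E_j\big(\theta^{(j)}\big) = \mathbb{E}_{(s,a,s')}\Big[\big\|\phi(s,a,s') + \gamma\,\psi_{\theta^{(j)}}\big(s',\pi_j(s')\big) - \psi_{\theta^{(j)}}(s,a)\big\|^2\Big],$$

where, unlike in semi-gradient TD, $\nabla E_j$ is taken through both occurrences of $\theta^{(j)}$: the online estimate and the bootstrap term.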

Figures (11)

  • Figure 3: Cumulative training reward across tasks. FG-SFDQN consistently achieves faster learning and higher cumulative returns than baselines across all environments. Resets to zero indicate task transitions.
  • Figure 4: Ablation of averaging for FG-SFDQN. Curves plot cumulative reward collected on the active task over training steps. FG-SFDQN (Alg. 1) achieves much higher cumulative returns compared to its averaging variants with $N=5,10,20$.
  • Figure 5: Final evaluation performance across environments. FG-SFDQN (Alg. 1) consistently achieves higher returns compared to all baselines. Color scheme: DQN (blue), SFDQN (orange), FG-SFDQN (green), FG-SFDQN Alg. 2 (red), FG-SFDQN Alg. 3 (purple), FGDQN (brown).
  • ...and 8 more figures

Theorems & Definitions (2)

  • Theorem 1: Joint-iterate convergence
  • Proof of Theorem 1