Actor-Critic or Critic-Actor? A Tale of Two Time Scales

Shalabh Bhatnagar; Vivek S. Borkar; Soumyajit Guin

TL;DR

This paper introduces the critic-actor (CA) algorithm, which reverses the conventional two-time-scale updates of tabular actor-critic (AC): the policy is updated on the faster time scale and the value function on the slower one. It proves convergence of the CA scheme using two-time-scale stochastic approximation and ODE techniques, and observes that, under the reversed time scales, CA emulates value iteration, whereas AC emulates policy iteration. Empirically, CA performs on par with standard actor-critic in both accuracy and computational effort, in tabular and function-approximation settings, the latter with linear and neural-network architectures. The work thus broadens the RL algorithmic landscape with a theoretically grounded alternative to actor-critic.

Abstract

We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower time scale. This emulates policy iteration. We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We provide a proof of convergence and compare the two empirically, with and without function approximation (using both linear and nonlinear function approximators), and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
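
To make the time-scale reversal concrete, the following is a minimal sketch and not the paper's exact recursions. It assumes a generic tabular TD(0) critic and a softmax actor on a randomly generated MDP. The names `random_mdp`, `step_sizes`, and `run`, as well as the step-size exponents 0.55 and 0.95, are illustrative assumptions. The only point it illustrates is that the same two updates can be paired with the fast and slow step sizes in either order, where the slow step size divided by the fast one tends to zero.

```python
import numpy as np


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical helper: random transition kernel P[s, a, s'] and rewards R[s, a] in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.random((n_states, n_actions))
    return P, R


def step_sizes(n, fast_exp=0.55, slow_exp=0.95):
    """Return (fast, slow) step sizes at iteration n >= 1; slow / fast -> 0 as n grows.

    The exponents are assumptions chosen for illustration.
    """
    return n ** -fast_exp, n ** -slow_exp


def softmax(prefs):
    z = np.exp(prefs - prefs.max())  # subtract the max for numerical stability
    return z / z.sum()


def run(mode, P, R, gamma=0.9, n_iters=20000, seed=0):
    """mode='actor-critic': critic on the fast scale; mode='critic-actor': actor on the fast scale."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                    # critic: tabular value estimates
    theta = np.zeros((n_states, n_actions))   # actor: softmax preferences
    s = 0
    for n in range(1, n_iters + 1):
        fast, slow = step_sizes(n)
        if mode == "actor-critic":
            critic_lr, actor_lr = fast, slow
        else:  # "critic-actor": the two time scales are swapped
            critic_lr, actor_lr = slow, fast
        pi = softmax(theta[s])
        a = rng.choice(n_actions, p=pi)
        s_next = rng.choice(n_states, p=P[s, a])
        delta = R[s, a] + gamma * V[s_next] - V[s]  # TD error
        V[s] += critic_lr * delta
        grad = -pi                                   # gradient of log pi(a|s) w.r.t. theta[s]
        grad[a] += 1.0
        theta[s] += actor_lr * delta * grad
        s = s_next
    return V, theta


if __name__ == "__main__":
    P, R = random_mdp(n_states=5, n_actions=3, seed=1)
    for mode in ("actor-critic", "critic-actor"):
        V, _ = run(mode, P, R)
        print(mode, np.round(V, 3))
```

Only the assignment of `critic_lr` and `actor_lr` differs between the two modes; everything else in the loop is shared, which mirrors the observation that the reversal is a change of time scales rather than of update rules.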

Paper Structure (8 sections, 5 theorems, 40 equations, 11 figures, 1 table)

Introduction
The Basic Framework
The Proposed Critic-Actor Algorithm
Convergence of Critic-Actor Scheme
Convergence of the Faster Recursion
Convergence of the Slower Recursion
Numerical Results
Conclusions

Key Result

Lemma 1

The sequences converge almost surely as $m\rightarrow\infty$.

Figures (11)

Figure 1: $|S|=1000,|U|=6,\alpha_1=1,\beta_1=0.55,\alpha_2=1,\beta_2=0.55$
Figure 2: $|S|=1000,|U|=6,\alpha_1=0.95,\beta_1=0.75,\alpha_2=0.75,\beta_2=0.55$
Figure 3: $|S|=1000,|U|=6,\alpha_1=0.75,\beta_1=0.55,\alpha_2=0.95,\beta_2=0.75$
Figure 4: $|S|=400,|U|=4,\alpha_1=0.95,\beta_1=0.75,\alpha_2=0.75,\beta_2=0.55$
Figure 5: $|S|=10000,|U|=8,\alpha_1=0.75,\beta_1=0.55,\alpha_2=0.95,\beta_2=0.75$
...and 6 more figures

Theorems & Definitions (10)

Lemma 1
proof
Proposition 2
proof
Theorem 3
proof
Lemma 4
proof
Theorem 5
Remark 6
