Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity

Prashansa Panda, Shalabh Bhatnagar

TL;DR

This work introduces the first natural critic-actor algorithm with function approximation for the long-run average cost setting under inequality constraints, and provides non-asymptotic convergence guarantees for it.

Abstract

Recent studies have increasingly focused on non-asymptotic convergence analyses of actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting with a tabular representation, in which the usual roles of the actor and critic are reversed. However, only asymptotic convergence was established there. Subsequently, both asymptotic and non-asymptotic analyses of the critic-actor algorithm with linear function approximation were conducted. In our work, we introduce the first natural critic-actor algorithm with function approximation for the long-run average cost setting under inequality constraints, and we provide non-asymptotic convergence guarantees for it. Our analysis establishes optimal learning rates, and we also propose a modification that improves sample complexity. We further report experiments on three Safety-Gym environments, where our algorithm is competitive with other well-known algorithms.
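To make the "reversed roles" idea concrete, the following is a minimal sketch of one critic-actor update with linear function approximation in the average cost setting. It is not the paper's algorithm: the function names, the step-size exponents `nu` and `sigma`, the Lagrangian form of the single-stage cost, and the sign convention of the actor step are all assumptions made for illustration, and the natural-gradient preconditioning and the Lagrange multiplier update are omitted. The only point shown is that the actor uses the more slowly decaying step size (faster timescale), while the critic and the average cost estimate use the faster-decaying one.

```python
import numpy as np


def step_sizes(t, nu=0.5, sigma=0.6):
    """Hypothetical step-size schedules (the exponents are assumptions).

    With nu < sigma the actor step decays more slowly than the critic step,
    so the actor runs on the faster timescale and the critic on the slower one.
    """
    actor = 1.0 / (1.0 + t) ** nu
    critic = 1.0 / (1.0 + t) ** sigma
    return actor, critic


def lagrangian_cost(cost, constraint_cost, threshold, gamma):
    """Assumed relaxed single-stage cost: cost plus gamma times the constraint excess."""
    return cost + gamma * (constraint_cost - threshold)


def td_error(r, avg_cost, phi_s, phi_next, v):
    """Average-cost TD error with a linear critic: r - L + phi(s')^T v - phi(s)^T v."""
    return r - avg_cost + phi_next @ v - phi_s @ v


def critic_actor_step(theta, v, avg_cost, gamma, t,
                      phi_s, phi_next, score,
                      cost, constraint_cost, threshold):
    """One illustrative update; `score` stands for grad_theta log pi(a|s)."""
    alpha, beta = step_sizes(t)
    r = lagrangian_cost(cost, constraint_cost, threshold, gamma)
    delta = td_error(r, avg_cost, phi_s, phi_next, v)

    new_avg_cost = avg_cost + beta * (r - avg_cost)  # average cost estimate
    new_v = v + beta * delta * phi_s                 # critic, slower timescale
    new_theta = theta - alpha * delta * score        # actor, faster timescale (cost descent)
    return new_theta, new_v, new_avg_cost
```

In this sketch the multiplier `gamma` is held fixed by the caller; a constrained method would also adapt it over time, which is left out here to keep the timescale separation between actor and critic easy to see.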

Paper Structure

This paper contains 20 sections, 6 theorems, 150 equations, 1 figure, 3 tables, 1 algorithm.

Key Result

Theorem 1

Under assumptions assum:bounded_feature_norm, epsilon_bound, assum:ergodicity, assum:policy-lipschitz-bounded, V_lipschitz_theta, V_lipschitz_gamma, the following holds: where $y_t = (L_t - L(\theta_t,\gamma(t)))$, $M(\theta_t,v_t,\gamma(t)) = E_{s_t \sim \mu_{\theta_t},a_t \sim \pi_{\theta_t},s_{t+1} \sim p}[( r(s_t,a_t,\gamma(t))- L(\theta_t,\gamma(t)) + \phi(s_{t+1})^{\top} v_{t} - \phi(s_

Figures (1)

  • Figure 1: Comparison of C-AC, C-NAC, C-CA, C-NCA, C-CA Modified and C-NCA Modified.

Theorems & Definitions (7)

  • Theorem 1: Convergence of average cost estimate
  • proof
  • Theorem 2: Convergence of actor
  • Theorem 3: Convergence of critic
  • Theorem 4: Convergence of average cost estimate
  • Theorem 5: Convergence of actor
  • Theorem 6: Convergence of critic