Boosting the Actor with Dual Critic
Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song
TL;DR
The paper reframes policy optimization as a two-player game between an actor and a dual critic by deriving a Lagrangian dual form of the Bellman optimality equation. It introduces Dual-AC, a multi-step, path-regularized, stochastic dual ascent algorithm that updates the value function, dual weights, and policy cooperatively so that all three optimize a common objective. The approach targets the instability that arises under function approximation, uses path regularization to establish local duality, and achieves state-of-the-art or competitive results on MuJoCo continuous-control benchmarks. The resulting framework offers a unified, theoretically grounded route to stable and efficient actor-critic learning with principled use of off-policy data.
Abstract
This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic, or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between the actor and a critic-like function that we call the dual critic. Compared to its actor-critic relatives, Dual-AC has the desirable property that the actor and dual critic are updated cooperatively to optimize the same objective function, providing a more transparent way of learning a critic that is directly related to the actor's objective. We then provide a concrete algorithm that effectively solves the resulting minimax optimization problem, using multi-step bootstrapping, path regularization, and stochastic dual ascent. We demonstrate that the proposed algorithm achieves state-of-the-art performance across several benchmarks.
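To make the "Lagrangian dual form of the Bellman optimality equation" concrete, the following is a minimal sketch on a tiny tabular MDP. It is not the Dual-AC algorithm itself. It assumes the standard textbook Lagrangian L(V, rho) = (1 - gamma) E_mu[V(s)] + sum_{s,a} rho(s,a) (R(s,a) + gamma E[V(s')] - V(s)), which may differ in detail from the multi-step, path-regularized form used in the paper. The MDP numbers and all function names are hypothetical and chosen only for illustration. The sketch checks that at the saddle point (optimal values V* paired with the occupancy measure rho* of the greedy policy) the Lagrangian equals the primal objective (1 - gamma) E_mu[V*(s)].

```python
# Illustrative only: a made-up 2-state, 2-action MDP.
GAMMA = 0.9
N_S, N_A = 2, 2
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
# R[s][a] = immediate reward.
R = [[1.0, 0.0],
     [0.0, 2.0]]
MU = [0.5, 0.5]  # initial state distribution


def backup(V, s, a):
    """One-step Bellman backup R(s,a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(N_S))


def value_iteration(iters=2000):
    """Approximate V* by repeatedly applying the Bellman optimality operator."""
    V = [0.0] * N_S
    for _ in range(iters):
        V = [max(backup(V, s, a) for a in range(N_A)) for s in range(N_S)]
    return V


def greedy_policy(V):
    """Deterministic policy that is greedy with respect to V."""
    return [max(range(N_A), key=lambda a: backup(V, s, a)) for s in range(N_S)]


def occupancy(policy, iters=2000):
    """Normalized discounted occupancy rho(s,a) = (1-gamma) sum_t gamma^t Pr(s_t=s, a_t=a)."""
    d = MU[:]                      # state distribution at time t
    w = 1.0 - GAMMA                # weight (1 - gamma) * gamma^t
    rho = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        for s in range(N_S):
            rho[s][policy[s]] += w * d[s]
        d = [sum(d[s] * P[s][policy[s]][t] for s in range(N_S)) for t in range(N_S)]
        w *= GAMMA
    return rho


def lagrangian(V, rho):
    """L(V, rho) = (1-gamma) E_mu[V] + sum_{s,a} rho(s,a) * (backup(V,s,a) - V(s))."""
    primal = (1.0 - GAMMA) * sum(MU[s] * V[s] for s in range(N_S))
    slack = sum(rho[s][a] * (backup(V, s, a) - V[s])
                for s in range(N_S) for a in range(N_A))
    return primal + slack


if __name__ == "__main__":
    V_star = value_iteration()
    rho_star = occupancy(greedy_policy(V_star))
    print("V*:", V_star)
    print("rho*:", rho_star)
    print("L(V*, rho*):", lagrangian(V_star, rho_star))
```

In this sketch rho plays the role of the dual variable. Factoring it as rho(s,a) = alpha(s) * pi(a|s) separates a state weighting from a policy, which is one way to read the "dual critic" and "actor" roles in the two-player game described above.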
