Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles
Ruoqi Zhang, Ziwei Luo, Jens Sjölund, Per Mattsson, Linus Gisslén, Alessandro Sestini
TL;DR
This paper addresses the slow inference of diffusion-based game policies by introducing CPQE, a Consistency Policy with Q-Ensembles that enables fast, one-step action generation while maintaining strong performance. By combining a consistency model with an ensemble of Q-functions and a pessimistic lower-confidence bound, CPQE achieves up to 60 Hz inference and improved training stability compared to prior diffusion and single-Q baselines. The method demonstrates competitive rewards against multi-step diffusion policies across two Unity-based tasks, with clear gains in speed and stability, and shows that Q-ensembles yield more reliable value estimates than single Q-networks. The practical impact lies in enabling real-time, multi-modal policy learning for NPCs in games and other real-time applications where rapid inference and robust decision-making are critical.
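To make the "pessimistic lower-confidence bound" concrete, the sketch below computes one common form of it over a Q-ensemble: the ensemble mean minus a multiple of the ensemble standard deviation, so that state-action pairs on which the members disagree are valued more conservatively. This is a minimal illustration under assumptions, not the paper's implementation: the mean-minus-beta-times-std form, the function name `lcb_q_value`, and the pessimism coefficient `beta` are all hypothetical choices made here for illustration.

```python
import numpy as np

def lcb_q_value(q_values: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Lower-confidence bound over an ensemble of Q estimates (assumed form).

    q_values: shape (ensemble_size, batch), row i holds Q_i(s, a) for ensemble member i.
    beta: hypothetical pessimism coefficient; larger values penalize disagreement more.
    """
    mean = q_values.mean(axis=0)  # ensemble mean for each (s, a) in the batch
    std = q_values.std(axis=0)    # ensemble spread, used as an uncertainty proxy
    return mean - beta * std      # pessimistic estimate: lower when members disagree

# Two ensemble members, two (s, a) pairs: the members disagree on the first pair only.
q = np.array([[1.0, 2.0],
              [3.0, 2.0]])
print(lcb_q_value(q, beta=1.0))  # the first pair is penalized, the second is not
```

With `beta = 0` the bound reduces to the plain ensemble mean; in an actor-critic setup such a bound would typically replace a single Q-network's output both in the policy objective and in the critic's target.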
Abstract
Diffusion models have shown impressive performance in capturing complex and multi-modal action distributions for game agents, but their slow inference speed prevents practical deployment in real-time game environments. While consistency models offer a promising approach for one-step generation, they often suffer from training instability and performance degradation when applied to policy learning. In this paper, we present CPQE (Consistency Policy with Q-Ensembles), which combines consistency models with Q-ensembles to address these challenges. CPQE leverages uncertainty estimation through Q-ensembles to provide more reliable value-function approximations, resulting in better training stability and improved performance compared to classic double Q-network methods. Our extensive experiments across multiple game scenarios demonstrate that CPQE achieves inference speeds of up to 60 Hz, a threefold improvement over state-of-the-art diffusion policies that operate at only 20 Hz, while maintaining performance comparable to multi-step diffusion approaches. CPQE consistently outperforms state-of-the-art consistency model approaches, showing both higher rewards and enhanced training stability throughout the learning process. These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications where both multi-modal behavior modeling and rapid inference are critical requirements.
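The speed advantage of a consistency policy comes from mapping noise to an action with a single network evaluation, whereas a multi-step diffusion policy calls its denoiser once per sampling step. The sketch below illustrates that one-step sampling using the skip/output parameterization commonly used for consistency models, which enforces f(x, sigma_min) = x at the smallest noise level. It is an illustration under assumptions, not CPQE's actual code: the constants `SIGMA_DATA`, `SIGMA_MIN`, `SIGMA_MAX`, the helper names, the `net(noisy_action, sigma, state)` signature, and the action clipping are all hypothetical.

```python
import numpy as np

# Assumed noise-schedule constants (EDM-style defaults), not taken from the paper.
SIGMA_DATA = 0.5
SIGMA_MIN = 0.002
SIGMA_MAX = 80.0

def c_skip(sigma):
    # Equals 1 at SIGMA_MIN, so the consistency function is the identity there.
    return SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)

def c_out(sigma):
    # Equals 0 at SIGMA_MIN, so the network output is ignored at the boundary.
    return SIGMA_DATA * (sigma - SIGMA_MIN) / np.sqrt(SIGMA_DATA**2 + sigma**2)

def consistency_fn(net, noisy_action, sigma, state):
    # f(x, sigma | s) = c_skip(sigma) * x + c_out(sigma) * F(x, sigma, s)
    return c_skip(sigma) * noisy_action + c_out(sigma) * net(noisy_action, sigma, state)

def sample_action(net, state, action_dim, rng, max_action=1.0):
    # One network call: draw pure noise at SIGMA_MAX and map it straight to an action.
    x = rng.standard_normal(action_dim) * SIGMA_MAX
    action = consistency_fn(net, x, SIGMA_MAX, state)
    return np.clip(action, -max_action, max_action)  # assumed bounded action space
```

Because `sample_action` evaluates `net` exactly once, its per-action cost is a single forward pass, in contrast to a multi-step sampler that loops over a sequence of noise levels.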
