Action-Driven Processes for Continuous-Time Control
Ruimin He, Shaowei Lin
TL;DR
The paper addresses the challenge of unifying continuous-time state dynamics with discrete decision actions by introducing Action-Driven Processes (ADPs). It develops two equivalent formulations of ADPs and relates them to Markov decision processes (MDPs), showing how reinforcement learning can be cast as variational inference in continuous time. The key contribution is showing that maximum-entropy reinforcement learning arises from minimizing a Kullback-Leibler (KL) divergence between a policy-driven and a reward-driven distribution over an ADP, with spiking neural networks as the running example. The framework offers a principled continuous-time approach to learning in action-driven systems and points toward future work on algorithm design and on diagrammatic, category-theoretic foundations for ADPs.
Abstract
At the heart of reinforcement learning are actions -- decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, as they trigger discontinuous state transitions and enable the flow of information through large, complex systems. In this paper, we unify the perspectives of stochastic processes and reinforcement learning through action-driven processes, and illustrate their application to spiking neural networks. Leveraging ideas from control-as-inference, we show that minimizing the Kullback-Leibler divergence between a policy-driven true distribution and a reward-driven model distribution for a suitably defined action-driven process is equivalent to maximum entropy reinforcement learning.
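The equivalence stated in the abstract extends a familiar control-as-inference identity. The sketch below shows that identity only in its simplest discrete, single-step form. It is not the paper's continuous-time ADP construction. It assumes a finite action set, a reward-driven model distribution p(a) ∝ exp(r(a)) with temperature fixed at 1, and a policy π. Under these assumptions, KL(π ‖ p) = log Z − (E_π[r] + H(π)), so minimizing the KL divergence is the same as maximizing expected reward plus entropy. The reward values and helper names are illustrative.

```python
import math
import random

def log_partition(rewards):
    """log Z = log sum_a exp(r(a)), computed stably."""
    m = max(rewards)
    return m + math.log(sum(math.exp(r - m) for r in rewards))

def model_distribution(rewards):
    """Reward-driven model distribution p(a) = exp(r(a)) / Z (a softmax)."""
    log_z = log_partition(rewards)
    return [math.exp(r - log_z) for r in rewards]

def kl(q, p):
    """KL(q || p) for discrete distributions; terms with q(a) = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def max_ent_objective(policy, rewards):
    """Maximum-entropy RL objective for one step: E_pi[r] + H(pi)."""
    return sum(pa * r for pa, r in zip(policy, rewards)) + entropy(policy)

# Hypothetical rewards for three actions.
rewards = [1.0, 0.5, -0.2]
model = model_distribution(rewards)
log_z = log_partition(rewards)

# An arbitrary (random) policy to test the identity against.
random.seed(0)
weights = [random.random() for _ in rewards]
policy = [w / sum(weights) for w in weights]

# KL(pi || p) and log Z minus the max-entropy objective agree up to rounding.
print(kl(policy, model), log_z - max_ent_objective(policy, rewards))
```

Because log Z does not depend on π, the KL divergence is minimized (at zero) exactly where E_π[r] + H(π) is maximized, namely at π = p. The paper's result lifts this trade-off to trajectories of a continuous-time action-driven process.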
