Action-Driven Processes for Continuous-Time Control

Ruimin He, Shaowei Lin

TL;DR

The paper addresses the challenge of unifying continuous-time state dynamics with discrete decision actions by introducing Action-Driven Processes (ADPs). It develops two equivalent formulations of ADPs and situates them relative to MDPs, illustrating how reinforcement learning can be viewed through a variational-inference lens in continuous time. The key contribution is showing that maximum-entropy reinforcement learning emerges from KL-regularized inference on ADPs, with spiking neural networks used as representative examples. This framework offers a principled, time-continuous approach to learning in action-driven systems and points toward future work in algorithm design and diagrammatic, category-theoretic foundations for ADPs.

Abstract

At the heart of reinforcement learning are actions -- decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, as they trigger discontinuous state transitions and enable the flow of information through large, complex systems. In this paper, we unify the perspectives of stochastic processes and reinforcement learning through action-driven processes, and illustrate their application to spiking neural networks. Leveraging ideas from control-as-inference, we show that minimizing the Kullback-Leibler divergence between a policy-driven true distribution and a reward-driven model distribution for a suitably defined action-driven process is equivalent to maximum entropy reinforcement learning.
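
The equivalence stated in the abstract can be checked numerically in the simplest possible setting. The sketch below is not the paper's continuous-time action-driven process construction: it assumes a single decision step with three discrete actions, a reward-driven model distribution proportional to exp(reward), and a hypothetical policy and reward vector chosen only for illustration. Under those assumptions, KL(policy || model) = log Z - (expected reward + policy entropy), so minimizing the divergence is the same as maximizing the maximum-entropy objective.

```python
import math


def softmax(rewards):
    """Reward-driven model distribution: p(a) proportional to exp(r(a))."""
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(r - m) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def kl(q, p):
    """Kullback-Leibler divergence KL(q || p) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def maxent_objective(policy, rewards):
    """Expected reward plus policy entropy (the maximum-entropy RL objective)."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    entropy = -sum(p * math.log(p) for p in policy if p > 0)
    return expected_reward + entropy


# Hypothetical one-step problem (illustrative values, not from the paper).
rewards = [1.0, 0.0, -0.5]
policy = [0.5, 0.3, 0.2]

model = softmax(rewards)
log_z = math.log(sum(math.exp(r) for r in rewards))

# Identity: KL(policy || model) = log Z - (E[reward] + entropy).
lhs = kl(policy, model)
rhs = log_z - maxent_objective(policy, rewards)
print(abs(lhs - rhs) < 1e-12)

# The softmax policy attains KL = 0, i.e. the maximum of the objective (log Z).
print(abs(maxent_objective(model, rewards) - log_z) < 1e-12)
```

Because log Z does not depend on the policy, the two objectives differ only by a constant. In this toy case the optimum is the softmax of the rewards.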

Paper Structure

This paper contains 14 sections, 45 equations, and 1 figure.

Figures (1)

  • Figure 1: A circuit model of an integrate-and-fire network

Theorems & Definitions (3)

  • Example 1: Boltzmann Machines
  • Example 2: Integrate-and-fire networks
  • Example 3: Spiking network