Controlling dynamics of stochastic systems with deep reinforcement learning

Ruslan Mukhamadiarov

TL;DR

The paper addresses the control of stochastic multi-agent dynamics when explicit dynamical equations are incomplete. It proposes an agent-based simulation framework in which a neural-network controller learns local transition rules via deep reinforcement learning. The policy is learned with $Q$-learning; in a separate post-training phase, actions are sampled from $P(a|s)$ using the Softmax function: $P(a|s)=\frac{\exp(-Q(s,a))}{\sum_{a'} \exp(-Q(s,a'))}$. The objective is to maximize the discounted cumulative reward $R=\sum_{t=0}^T \gamma^t r_t$ using the TD update $Q_{t+1}(s,a)-Q_t(s,a)=\alpha\left(r+\gamma\max_{a'} Q_t(s',a')-Q_t(s,a)\right)$, where $\alpha$ is the learning rate and $\gamma$ the discount factor, with exploration via an $\varepsilon$-greedy policy. The authors test the framework on particle coalescence and a heterogeneous TASEP, showing that reward shaping can steer microscopic interactions to alter macroscopic transport, and that the choice of control policy matters, depending on the degree of system heterogeneity. The work demonstrates a viable path to bridging control theory and DRL for smart particle systems, with implications for active matter and autonomous micro-transport networks.
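The learning rules named in the TL;DR can be illustrated with a minimal Python sketch. It is not the authors' implementation: a small tabular array `Q` stands in for the neural network, and the function names, the `sign` parameter, and the default values of `alpha` and `gamma` are assumptions made for illustration. The Softmax sign convention follows the formula as written above.

```python
import numpy as np

def softmax_policy(q_values, sign=-1.0):
    """Map Q-values to action probabilities P(a|s).

    sign=-1.0 reproduces the formula as written in the TL;DR,
    P(a|s) = exp(-Q(s,a)) / sum_a' exp(-Q(s,a')).
    The `sign` argument itself is an illustrative assumption.
    """
    z = sign * np.asarray(q_values, dtype=float)
    z = z - z.max()  # shift for numerical stability; probabilities are unchanged
    w = np.exp(z)
    return w / w.sum()

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) in place.

    Q is a (n_states, n_actions) array used here instead of a network.
    Returns the TD error.
    """
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

def discounted_return(rewards, gamma):
    """Cumulative reward R = sum_t gamma^t r_t for a finite reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))
```

During training one would call `epsilon_greedy` to choose actions and `td_update` after each transition; once training is finished, `softmax_policy` turns the frozen $Q$-values into sampling probabilities.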

Abstract

A properly designed controller can help improve the quality of experimental measurements or force a dynamical system to follow a completely new time-evolution path. Recent developments in deep reinforcement learning have made steep advances toward designing effective control schemes for fairly complex systems. However, a general simulation scheme that employs deep reinforcement learning for exerting control in stochastic systems is yet to be established. In this paper, we attempt to further bridge a gap between control theory and deep reinforcement learning by proposing a simulation algorithm that allows achieving control of the dynamics of stochastic systems through the use of trained artificial neural networks. Specifically, we use agent-based simulations where the neural network plays the role of the controller that drives local state-to-state transitions. We demonstrate the workflow and the effectiveness of the proposed control methods by considering the following two stochastic processes: particle coalescence on a lattice and a totally asymmetric exclusion process.

Paper Structure

This paper contains 6 sections, 4 equations, 3 figures.

Figures (3)

  • Figure 1: Schematic illustration of how a neural network drives local state-to-state transitions in the agent-based simulations. The observation state $s$ of the agent is provided to the artificial neural network as an input vector, and the network in turn produces a $Q$-value for each action $a$. As such, the size of the input layer of our neural network is set by the size of the observation state vector $s$, and the output layer size corresponds to the size of the action space $a$. The temporal-difference (TD) error is computed with Eq. \ref{eqn:Bellman} and backpropagated to update the network parameters. After the neural network training is done, the network parameters become fixed, and the $Q$-values are mapped to transition rates $P(a|s)$ using the Softmax function in Eq. \ref{eqn:Softmax}.
  • Figure 2: (a) Schematic illustration of how a neural network controls dynamics of the coalescence process in one dimension. The selected agent is marked in red, and the blue brackets show the size of the agent's observation state that is being provided to the network. The network computes $Q$-values, which, after training, can be mapped to hopping probabilities $q$ and $p$ with the Softmax function. (b) Time evolution of the particle number fraction for different control scenarios on $d=2$ and $d=1$ (inset) lattices. The simple reward choices of $-1$ or $+1$ have been used to train the networks to discourage or encourage coalescence events, respectively. The data was obtained by running simulations after the training was finished. The simulations were run on $L=1000$ and $100\times 100$ lattices with periodic boundary conditions. Each data point was obtained by averaging over $10$ independent simulation runs.
  • Figure 3: (a) Simulation snapshots of the heterogeneous totally asymmetric exclusion process (TASEP), where green (light grey) particles have a higher overall jumping rate and red (dark grey) particles a lower one. The jump direction of each particle is selected by a neural network that makes a decision based on the particle's observation state. Two scenarios are considered: a simple control scenario, where only successful forward jumps are rewarded, and a speed gradient scenario, where, in addition to rewarding forward jumps, particles are also encouraged to separate into lanes based on their "speed". The lattice sizes are $128\times 24$ and $128\times 32$, respectively. (b) The global current in the heterogeneous TASEP obtained for different control policies in the steady state, plotted against the standard deviation $\sigma$ of the particles' overall jumping rate $\nu \sim \mathcal{N}(0.5,\sigma^2)$, which serves as a measure of heterogeneity. The simulations were performed on half-filled lattices with periodic boundary conditions. Each data point has been obtained after averaging over 800 independent realizations. Adapted from Jonas Märtens' Bachelor thesis \cite{jonas_maertens}.
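
The one-dimensional coalescence setup described in the Figure 2 caption can be sketched as a short Python simulation. This is a rough illustration under stated assumptions, not the authors' code: a fixed hopping probability `p_right` stands in for the trained network's Softmax output, the ring starts fully occupied, time is counted in single-particle updates, and the names `coalescence_step` and `simulate` are hypothetical.

```python
import numpy as np

def coalescence_step(lattice, p_right, rng):
    """One update of A + A -> A on a ring (periodic boundary conditions).

    A randomly chosen particle hops right with probability p_right and
    left otherwise. In the paper the hop direction comes from the trained
    network; the fixed p_right here is an illustrative stand-in.
    """
    occupied = np.flatnonzero(lattice)
    i = int(rng.choice(occupied))
    step = 1 if rng.random() < p_right else -1
    j = (i + step) % lattice.size  # wrap around the ring
    lattice[i] = 0
    lattice[j] = 1  # if site j was already occupied, the two particles merge

def simulate(L=200, n_updates=5000, p_right=0.5, seed=0):
    """Return the particle number fraction after each update."""
    rng = np.random.default_rng(seed)
    lattice = np.ones(L, dtype=np.int8)  # assumed initial condition: full ring
    fraction = [lattice.mean()]
    for _ in range(n_updates):
        coalescence_step(lattice, p_right, rng)
        fraction.append(lattice.mean())
    return np.array(fraction)

if __name__ == "__main__":
    frac = simulate()
    print("final particle fraction:", frac[-1])
```

Replacing the fixed `p_right` with probabilities computed from a particle's local observation state would give the controlled version that the caption describes.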