Controlling dynamics of stochastic systems with deep reinforcement learning
Ruslan Mukhamadiarov
TL;DR
The paper addresses the control of stochastic multi-agent dynamics when explicit dynamical equations are incomplete. It proposes an agent-based simulation framework in which a neural-network controller learns local transition rules via deep reinforcement learning (DRL). The policy is learned with $Q$-learning: the objective is to maximize the cumulative discounted reward $R=\sum_{t=0}^T \gamma^t r_t$ using the TD update $Q_{t+1}(s,a)-Q_t(s,a)=\alpha\left(r+\gamma\max_{a'} Q_t(s',a')-Q_t(s,a)\right)$, with $\varepsilon$-greedy exploration during training. In a separate post-training phase, actions are sampled from a Softmax distribution, $P(a|s)=\frac{\exp(-Q(s,a))}{\sum_{a'} \exp(-Q(s,a'))}$. The authors test the approach on particle coalescence and on a heterogeneous totally asymmetric exclusion process (TASEP), showing that reward shaping can steer microscopic interactions to alter macroscopic transport, and that the choice of policy matters depending on the heterogeneity of the system. The work demonstrates a viable path to bridging control theory and DRL for smart particle systems, with implications for active matter and autonomous micro-transport networks.
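The discounted return, the $\varepsilon$-greedy rule, and the TD update quoted above can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the paper uses a neural network to represent $Q$, whereas here a plain list-of-lists table stands in for it, and the function names (`discounted_return`, `epsilon_greedy`, `td_update`) and the toy numbers are assumptions chosen for illustration.

```python
import random

def discounted_return(rewards, gamma):
    """R = sum_t gamma^t * r_t for a finite sequence of rewards."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def td_update(q, s, a, r, s_next, alpha, gamma):
    """One in-place Q-learning step on a tabular Q (hypothetical stand-in for the network):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

# Toy example: 2 states, 2 actions, arbitrary illustrative values.
q = [[0.0, 0.0], [0.0, 1.0]]
td_update(q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
action = epsilon_greedy(q[0], epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the rule is purely greedy, so `action` is the index of the largest entry in `q[0]` after the update. In a deep variant, the table lookup and in-place update would be replaced by a network forward pass and a gradient step on the squared TD error.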
Abstract
A properly designed controller can improve the quality of experimental measurements or force a dynamical system to follow a completely new time-evolution path. Recent developments in deep reinforcement learning have brought rapid progress in designing effective control schemes for fairly complex systems. However, a general simulation scheme that employs deep reinforcement learning to exert control over stochastic systems has yet to be established. In this paper, we attempt to further bridge the gap between control theory and deep reinforcement learning by proposing a simulation algorithm that controls the dynamics of stochastic systems through trained artificial neural networks. Specifically, we use agent-based simulations in which the neural network plays the role of the controller that drives local state-to-state transitions. We demonstrate the workflow and the effectiveness of the proposed control method on two stochastic processes: particle coalescence on a lattice and a totally asymmetric exclusion process.
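To make the idea of a controller driving local state-to-state transitions concrete, the following is a rough sketch of an agent-based exclusion process in which a pluggable `policy` callable decides each attempted move. Everything specific here is an assumption for illustration and not taken from the paper: the periodic ring, the random-sequential update order, the local state reduced to the occupancy of the site ahead, and the names `tasep_sweep` and `always_hop`. A trained network would take the place of `policy`.

```python
import random

def tasep_sweep(lattice, policy, rng):
    """One sweep of random-sequential updates on a ring of 0/1 occupancies.

    `policy` maps the local state (here: occupancy of the site ahead) to an
    action: 1 = attempt a hop to the right, 0 = stay. Returns the hop count.
    """
    n = len(lattice)
    hops = 0
    for _ in range(n):
        i = rng.randrange(n)
        if lattice[i] == 0:
            continue  # no particle at the selected site
        j = (i + 1) % n  # right neighbor with periodic boundaries
        action = policy(lattice[j])
        # Exclusion rule: a hop succeeds only if the target site is empty.
        if action == 1 and lattice[j] == 0:
            lattice[i], lattice[j] = 0, 1
            hops += 1
    return hops

def always_hop(site_ahead):
    """Trivial hypothetical controller: always attempt to hop."""
    return 1

rng = random.Random(0)
lattice = [1, 0, 1, 0, 0, 1, 0, 0]
n_particles = sum(lattice)
hops = tasep_sweep(lattice, always_hop, rng)
```

Because particles only move forward and never overlap, the particle number is conserved across a sweep regardless of the policy. The controller can change only how often hops are attempted, which is the lever through which a learned policy could shape the macroscopic current.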
