Applying reinforcement learning to optical cavity locking tasks: considerations on actor-critic architectures and real-time hardware implementation

Mateusz Bawaj, Andrea Svizzeretto

TL;DR

The paper addresses autonomous control of Fabry–Perot cavity locking in nonlinear regimes using deep reinforcement learning. It implements a custom Gymnasium environment with a time-domain cavity simulator to train agents like DDPG to acquire and maintain resonance for both low- and high-finesse cavities, including Virgo-like configurations. It discusses improvements with TD3, SAC, and meta-reinforcement learning, and proposes low-latency hardware strategies such as FPGA-based inference and offline policy updates to bridge simulation and real optical setups. The work lays a foundation for RL-driven control in gravitational-wave detectors by identifying practical challenges and concrete paths toward real-time deployment.

Abstract

This proceedings contribution contains our considerations made during and after fruitful discussions held at EuCAIFCon 2025. We explore the use of deep reinforcement learning for autonomous locking of Fabry–Perot optical cavities in nonlinear regimes, with relevance to gravitational-wave detectors. A custom Gymnasium environment with a time-domain simulator enabled the training of agents such as deep deterministic policy gradient (DDPG), which achieved reliable lock acquisition for both low- and high-finesse cavities, including cavities with Virgo-like parameters. We also discuss possible improvements with Twin Delayed DDPG (TD3), Soft Actor-Critic (SAC) and meta-reinforcement learning, as well as strategies for low-latency execution and offline policy updates to address hardware limitations. These studies lay the groundwork for future deployment of reinforcement learning-based control in real optical setups.
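To make the idea of a cavity-locking environment more concrete, the following is a minimal sketch and not the authors' simulator. It assumes a quasi-static cavity described by the Airy transmission function, whereas the paper uses a time-domain simulation. It mirrors the Gymnasium `reset`/`step` return conventions without importing the library, so it runs with the standard library alone. The class name `ToyCavityEnv`, the finesse value, the step size, and the choice of transmitted power as both observation and reward are all hypothetical.

```python
import math
import random


class ToyCavityEnv:
    """Quasi-static Fabry-Perot toy model with a Gymnasium-style interface.

    Hypothetical illustration only: the real environment is time-domain.
    """

    def __init__(self, finesse=50.0, max_step=0.01, max_episode_steps=500):
        self.finesse = finesse                  # assumed value, not from the paper
        self.max_step = max_step                # largest mirror move per step, in wavelengths
        self.max_episode_steps = max_episode_steps
        self._rng = random.Random()
        self._detuning = 0.0                    # cavity length offset, in wavelengths
        self._steps = 0

    def transmitted_power(self, detuning):
        """Airy function, normalised to 1 on resonance."""
        coeff = (2.0 * self.finesse / math.pi) ** 2
        return 1.0 / (1.0 + coeff * math.sin(2.0 * math.pi * detuning) ** 2)

    def reset(self, seed=None):
        if seed is not None:
            self._rng.seed(seed)
        # Start anywhere within one free spectral range around resonance.
        self._detuning = self._rng.uniform(-0.25, 0.25)
        self._steps = 0
        return [self.transmitted_power(self._detuning)], {}

    def step(self, action):
        # Clip the action to [-1, 1] and scale it to a mirror displacement.
        move = max(-1.0, min(1.0, float(action))) * self.max_step
        self._detuning += move
        self._steps += 1
        power = self.transmitted_power(self._detuning)
        reward = power                          # assumed reward: stay on resonance
        terminated = False
        truncated = self._steps >= self.max_episode_steps
        return [power], reward, terminated, truncated, {"detuning": self._detuning}


# Usage: drive the toy cavity with a random policy for a few steps.
env = ToyCavityEnv()
obs, info = env.reset(seed=0)
policy_rng = random.Random(1)
for _ in range(5):
    obs, reward, terminated, truncated, info = env.step(policy_rng.uniform(-1.0, 1.0))
    print(f"power={obs[0]:.4f} detuning={info['detuning']:+.4f}")
```

An actor-critic agent such as DDPG would replace the random policy in the loop. The narrow resonance peak at high finesse means the transmitted-power reward is close to zero over most of the detuning range, which is one reason lock acquisition is a hard exploration problem.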

Paper Structure

This paper contains 6 sections and 1 figure.

Figures (1)

  • Figure 1: Performance of a standard DDPG agent, trained for 100k time-steps on a simulated cavity with parameters equivalent to those of the Virgo arm cavity. In this run, the agent takes approximately 200 steps to acquire the lock, which corresponds to around 10 ms.
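
The figures quoted in the caption imply a per-step time budget, which matters for the low-latency hardware discussion. The short calculation below is a sketch that assumes the steps are evenly spaced and treats both quoted numbers as approximate; the variable names are illustrative.

```python
# Approximate values quoted in the Figure 1 caption.
steps_to_lock = 200        # control steps until lock is acquired
lock_time_s = 10e-3        # about 10 ms

# Assuming evenly spaced steps, derive the time available per step
# and the corresponding control-loop rate.
step_period_s = lock_time_s / steps_to_lock
control_rate_hz = 1.0 / step_period_s

print(f"step period  ~ {step_period_s * 1e6:.0f} us")
print(f"control rate ~ {control_rate_hz / 1e3:.0f} kHz")
```

Under these assumptions each step lasts roughly 50 µs, so observation, policy inference and actuation would all have to fit within a loop running at about 20 kHz.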