Focal plane wavefront control with model-based reinforcement learning

Jalo Nousiainen, Iremsu Taskin, Markus Kasper, Gilles Orban De Xivry, Olivier Absil

Abstract

The direct imaging of potentially habitable exoplanets is a prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPAs). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, a vector vortex coronagraph, and under photon and background noise. PO4NCPA requires no prior model of the optical system and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.

Paper Structure

This paper contains 21 sections, 16 equations, 11 figures, 1 table, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of the preprocessing step applied to the focal plane data, i.e., the observation. For a non-coronagraphic system, the Airy pattern (perfect PSF) is subtracted from the PSF; for the perfect coronagraph, the perfect PSF is simply a dark image (background). The images are then flattened with the cube root.
  • Figure 2: Dynamics model NN design. Trained on closed-loop data (science camera images and residual commands), the dynamics model learns to simulate the optical path of the light. Highlighted in green are the inputs to the NN.
  • Figure 3: Policy model NN design. In the control loop, the image inputs ($\bm o_t$ and $\bm o_{t-1}$) are focal plane images from the science camera, whereas during training the future inputs (Algorithm 1 line 11, $t = 2,3, \cdots H$) are predicted/simulated by the dynamics model.
  • Figure 4: Training plots of PO4NCPA on circular pupil with SI and PC. Here we plot the negative cumulative reward (loss) after each episode in the training cycle. The light blue curve shows the raw negative reward, and the dark blue curve shows the smoothed value (moving average). For the dynamic case (c, d), the DM is flattened only at the start of the episode for the first 40k episodes to prevent saturation. Afterward, each episode starts from the previous endpoint, thereby mimicking a continuously updated closed-loop control system, which explains the larger episode reward from that point on.
  • Figure 5: PO4NCPA convergence on circular pupil with SI (left) and PC (right) in the case of static NCPA. Top row: reward per time step. PO4NCPA episodes are shown in green (best), blue (median), and red (worst), while black and gray lines correspond to fitting error rewards. Bottom row: PO4NCPA residual wavefront RMS (inside the control region) over the same episodes (fitting error is always zero).
  • ...and 6 more figures
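
The preprocessing described in the Figure 1 caption (subtract the perfect PSF, then flatten with the cube root) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `preprocess_observation`, the use of a signed cube root for negative residuals, and the absence of any normalization are assumptions made here.

```python
import numpy as np


def preprocess_observation(psf, reference_psf):
    """Hypothetical helper: subtract the perfect PSF, then apply a cube root.

    psf           -- measured focal plane image (2-D array)
    reference_psf -- perfect PSF: the Airy pattern for a non-coronagraphic
                     system, or a dark (background) image for a perfect
                     coronagraph, as described in the Figure 1 caption
    """
    psf = np.asarray(psf, dtype=float)
    reference_psf = np.asarray(reference_psf, dtype=float)
    if psf.shape != reference_psf.shape:
        raise ValueError("psf and reference_psf must have the same shape")

    # Residual intensity relative to the perfect PSF; it can be negative
    # where the aberrated PSF is fainter than the reference.
    residual = psf - reference_psf

    # Assumption: a signed cube root is used, so negative residuals stay
    # negative. np.cbrt is defined for negative inputs.
    return np.cbrt(residual)


# Toy usage with made-up numbers (not data from the paper).
example = preprocess_observation([[8.0, 1.0]], [[0.0, 2.0]])
```

The cube root compresses the dynamic range, so faint speckles are not swamped by the bright core when the image is fed to a neural network. For the perfect-coronagraph case, `reference_psf` would simply be the background image.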