Reinforcement learning

Sarod Yatawatta

TL;DR

This paper surveys modern deep reinforcement learning (RL) methods and their applicability to astronomy, grounding the discussion in the Markov decision process $(\mathcal{S},\mathcal{A},\mathcal{R},\mathcal{P})$ and key quantities such as the Q function $Q(s,a)$, the value function $V(s)$, and the policy $\pi$. It covers model-free algorithms for discrete and continuous action spaces (for the latter, DDPG, TD3, and SAC), together with components such as experience replay and target networks, and extends to model-based RL using probabilistic ensembles with trajectory sampling (PETS). A notable contribution is hint-assisted RL, which injects domain knowledge into the agent; the paper also gives practical guidance for applying RL to astronomical tasks and a calibration example that uses AIC as the reward. The work emphasizes data efficiency and planning under uncertainty, and provides public code to facilitate adoption in data-intensive astronomical workflows.
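
For orientation, the quantities named above can be written using the standard textbook definitions. This is a sketch only: the discount factor $\gamma \in [0,1)$, the time index $t$, and the per-step reward $r_t$ are notation assumed here and may differ from the paper's own. For a policy $\pi$:

$$
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s,\ a_{0}=a\right],
\qquad
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[Q^{\pi}(s,a)\right].
$$

With a discrete action space, a greedy policy with respect to $Q$ selects $\pi(s) = \arg\max_{a \in \mathcal{A}} Q(s,a)$.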

Abstract

Observing celestial objects and advancing our scientific knowledge about them involves tedious planning, scheduling, data collection and data post-processing. Many of these operational aspects of astronomy are guided and executed by expert astronomers. Reinforcement learning is a mechanism by which we (as humans and astronomers) can teach artificial intelligence agents to perform some of these tedious tasks. In this paper, we present a state-of-the-art overview of reinforcement learning and how it can benefit astronomy.

Paper Structure

This paper contains 22 sections, 30 equations, 12 figures, 4 tables, and 4 algorithms.

Introduction
Reinforcement learning theory
The state, action and reward
Markov decision processes
Q function, value function and policy
Deep reinforcement learning algorithms
Experience replay
Discrete action RL
Continuous action RL
Deep deterministic policy gradient (DDPG)
Twin delayed DDPG (TD3)
Soft actor critic (SAC)
Model based reinforcement learning
Probabilistic ensemble models
Probabilistic ensemble with trajectory sampling
...and 7 more sections

Figures (12)

Figure 1: An agent interacting with its environment. The agent receives an observation, performs an action, and receives a reward corresponding to that action.
Figure 2: The maze environment with $5$ valid states $0,1,\ldots,4$. The agent can move (act) $\leftarrow$, $\rightarrow$, $\uparrow$, or $\downarrow$. The state space $\mathcal{S}$ is discrete with $5$ states, and the action space $\mathcal{A}$ is also discrete with $4$ actions.
Figure 3: An RL agent composed of an actor and a critic.
Figure 4: Model-based RL. A dynamics model representing the environment is created and used by the agent.
Figure 5: Hint-assisted RL. An external hint is directly provided to the actor in the RL agent.
...and 7 more figures
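
To make the discrete setting of Figure 2 concrete, the following is a minimal sketch of tabular Q-learning on a maze with $5$ states and $4$ actions. The maze layout is not given in the caption, so everything about the dynamics is an assumption made for illustration: the states are arranged as a corridor $0,1,\ldots,4$, state $4$ is a hypothetical goal with reward $1$, and moving up or down bumps into a wall and leaves the state unchanged. The function names (`step`, `q_learning`, `greedy_policy`) and the hyperparameters are likewise illustrative and are not taken from the paper or its public code.

```python
import random

N_STATES = 5                                # states 0..4, as in the Figure 2 caption
ACTIONS = ["left", "right", "up", "down"]   # the 4 discrete actions
GOAL = 4                                    # hypothetical goal state (assumption)


def step(state, action):
    """Assumed dynamics: a 1-D corridor; up/down hit a wall and do nothing."""
    if action == 0:                         # left
        nxt = max(state - 1, 0)
    elif action == 1:                       # right
        nxt = min(state + 1, N_STATES - 1)
    else:                                   # up or down
        nxt = state
    done = nxt == GOAL
    reward = 1.0 if done else 0.0           # assumed sparse reward at the goal
    return nxt, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0                               # every episode starts in state 0
        for _ in range(100):                # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[s])                         # exploit, random tie-break
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # Bootstrapped target; no bootstrap from the terminal state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_policy(q):
    """Name of the highest-valued action in each state (first one on ties)."""
    return [ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)] for row in q]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

Under these assumed dynamics the learned greedy policy should move right in every non-goal state. A table of this kind is only practical for small discrete spaces; the deep RL algorithms listed in the outline replace it with neural network approximators.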
