Solving Online Resource-Constrained Scheduling for Follow-Up Observation in Astronomy: a Reinforcement Learning Approach
Yajie Zhang, Ce Yu, Chao Sun, Jizeng Wei, Junhan Ju, Shanjiang Tang
TL;DR
This paper tackles online, resource-constrained scheduling of follow-up astronomical observations on a telescope array, formulating the problem as a Markov decision process (MDP) whose objective is to minimize average task slowdown. It introduces ROARS, a reinforcement learning framework that encodes schedules as directed acyclic graphs (DAGs) and improves them by iterative local rewriting, guided by a Child-Sum Tree-LSTM-based graph encoder with region-selection and rule-selection policies trained end-to-end. In simulations on data modeled on real-world observation scenarios, ROARS consistently outperforms classical online heuristics and approaches offline performance, while generalizing to unseen task distributions and extending to distributed arrays. The work advances practical, scalable decision-making for time-critical target-of-opportunity (ToO) observations and lays groundwork for multi-objective extensions and deeper integration with global telescope networks.
Abstract
In astronomical observation, allocating the observation resources of a telescope array and planning follow-up observations for targets of opportunity (ToOs) are indispensable to scientific discovery. The problem is computationally challenging because observations must be scheduled online and many time-varying factors affect whether an observation can be conducted. This paper presents ROARS, a reinforcement learning approach for online astronomical resource-constrained scheduling. To capture the structure of an observation schedule, we represent each schedule as a directed acyclic graph (DAG) whose edges express the timing dependencies between the observation tasks in the schedule. Deep reinforcement learning is used to learn a policy that improves a feasible solution through iterative local rewriting until convergence. This avoids the difficulty of constructing a complete solution from scratch, which in astronomical observation scenarios is computationally expensive because of the numerous spatial and temporal constraints. We develop a simulation environment based on real-world scenarios to evaluate the effectiveness of the proposed approach. The experimental results show that ROARS surpasses five popular heuristics, adapts to various observation scenarios, and learns effective strategies with hindsight.
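The idea of improving a feasible schedule by local rewriting until convergence can be illustrated with a toy sketch. It is not the ROARS implementation. It assumes a single telescope, a slowdown defined as (completion time - arrival time) / duration, and a greedy adjacent-swap rule standing in for the learned region-selection and rule-selection policies. The names `Task`, `average_slowdown`, `timing_edges`, and `rewrite_until_converged` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    arrival: float   # time the observation request arrives
    duration: float  # required observation time


def average_slowdown(order):
    """Mean of (completion - arrival) / duration over all tasks.

    Assumption: one telescope runs the tasks back to back in the given
    order and never starts a task before it has arrived.
    """
    clock, total = 0.0, 0.0
    for task in order:
        clock = max(clock, task.arrival) + task.duration
        total += (clock - task.arrival) / task.duration
    return total / len(order)


def timing_edges(order):
    """Edges of a chain-shaped DAG: each task precedes its successor."""
    return [(a.name, b.name) for a, b in zip(order, order[1:])]


def rewrite_until_converged(order):
    """Greedy stand-in for a learned policy.

    Repeatedly applies any adjacent swap (a local rewrite) that lowers the
    average slowdown, and stops when no swap helps.
    """
    order = list(order)
    best = average_slowdown(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            candidate = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            score = average_slowdown(candidate)
            if score < best - 1e-12:
                order, best, improved = candidate, score, True
    return order, best


# A long task that arrived first blocks two short ones.
tasks = [Task("A", 0.0, 10.0), Task("B", 0.0, 2.0), Task("C", 1.0, 1.0)]
final_order, final_score = rewrite_until_converged(tasks)
print([t.name for t in final_order], round(final_score, 3))
print(timing_edges(final_order))
```

In this toy case the rewrites move the two short tasks ahead of the long one, which lowers the average slowdown. A learned policy would replace the exhaustive swap search, and a real telescope array would add visibility and resource constraints that this sketch ignores.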
