Identifying Optimal Launch Sites of High-Altitude Latex-Balloons using Bayesian Optimisation for the Task of Station-Keeping

Jack Saunders; Sajad Saeedi; Adam Hartshorne; Binbin Xu; Özgur Şimşek; Alan Hunter; Wenbin Li

TL;DR

This paper addresses the problem of identifying the optimal launch configuration for station-keeping with high-altitude latex balloons under complex wind and environmental dynamics. It trains a Soft Actor-Critic controller under two reward designs, a Step function and a Tanh-shaped function, and shows that the Tanh-shaped reward mitigates reward hacking and generalises better to unseen wind patterns. The authors then employ Bayesian Optimisation with a spatial-temporal Gaussian Process surrogate to search efficiently over launch longitude, latitude, and time offsets, demonstrating faster convergence than Particle Swarm Optimisation (PSO) or uniform sampling and revealing that optimal launches can occur outside the target region. The work provides a foundation for data-efficient launch optimisation in balloon-based applications and highlights wind-field variability as a critical factor in performance, with implications for ecological surveys, atmospheric analysis, and communication relays.

Abstract

Station-keeping tasks for high-altitude balloons show promise in areas such as ecological surveys, atmospheric analysis, and communication relays. However, identifying the optimal time and position to launch a latex high-altitude balloon is still a challenging and multifaceted problem. For example, tasks such as forest fire tracking place geometric constraints on the launch location of the balloon. Furthermore, the optimal location also depends heavily on atmospheric conditions. We first illustrate how reinforcement learning-based controllers, frequently used for station-keeping tasks, can exploit the environment. This exploitation can degrade performance on unseen weather patterns and affect station-keeping performance when identifying an optimal launch configuration. When all states in the region are valued equally, the agent exploits the region's geometry by flying near the edge, leading to risky behaviour. We propose a modification which compensates for this exploitation and find that it leads to, on average, more steps within the target region on unseen data. Then, we illustrate how Bayesian Optimisation (BO) can identify the optimal launch location to perform station-keeping tasks, maximising the expected undiscounted return from a given rollout. We show that BO can find this launch location in fewer steps than other optimisation methods. Results indicate that, surprisingly, the optimal launch location is often not within the target region. Please find further information about our project at https://sites.google.com/view/bo-lauch-balloon/.
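The difference between valuing all in-region states equally and shaping the reward by distance can be illustrated with a minimal sketch. The functional forms below, the 50 km radius, and the names `step_reward` and `tanh_reward` are assumptions chosen for illustration; they are not the paper's exact reward definitions.

```python
import math

# Hypothetical target-region radius; not a value taken from the paper.
TARGET_RADIUS_KM = 50.0


def step_reward(distance_km, radius_km=TARGET_RADIUS_KM):
    """Flat reward: every position inside the region is worth the same."""
    return 1.0 if distance_km <= radius_km else 0.0


def tanh_reward(distance_km, radius_km=TARGET_RADIUS_KM):
    """Illustrative smooth reward that decays with distance from the centre.

    This is one possible Tanh-shaped form, assumed for the example only.
    """
    return 1.0 - math.tanh(distance_km / radius_km)


# Under the flat reward, hugging the edge costs nothing...
edge_vs_centre_step = step_reward(49.0) - step_reward(0.0)
# ...whereas the shaped reward prefers positions closer to the centre.
edge_vs_centre_tanh = tanh_reward(49.0) - tanh_reward(0.0)
```

With the flat reward the difference between edge and centre is zero, so nothing discourages the agent from flying along the boundary; with the shaped reward the difference is negative, which gives the agent a reason to stay nearer the centre.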

Paper Structure (17 sections, 11 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Literature Review
Background
Equations of Motion
Reinforcement Learning
Spatial-Temporal Gaussian Process Modelling
Bayesian Optimisation
Problem Statement
Method
Soft Actor-Critic Controller
Wind Data
Reward Function Evaluation
Launch Location Optimisation
Results
Reward Function Evaluation
...and 2 more sections

Figures (7)

Figure 1: Our objective is to identify a launch configuration that maximises the duration within the target region, despite the challenges of under-actuated navigation, the influence of wind and atmospheric conditions, and environment exploitation.
Figure 2: Block diagram of our method to identify the optimal launch configuration. Initially, we randomly sample a launch configuration, which forms part of the initial state $x$ of the RL policy. After the episode has terminated, the expected undiscounted return is calculated and stored along with the launch configuration $\mathbf{x}$. Then, a GP, whose hyperparameters are fitted by maximising the log marginal likelihood, models the performance of the policy. The next launch configuration is chosen by maximising the expected improvement.
Figure 3: Training curve of the two policies, indicating performance converging to approximately 45 steps within the region and illustrating similar performance to the training data.
Figure 4: Kernel density estimate visualising the distribution of positions for both policies between 0 and 400 k, indicating worse generalisation of the Step function to unseen data.
Figure 5: Projected view of both policies' trajectories given the same environment state, with wind fields chosen from the test dataset. Initial positions share the same longitude and vary in latitude. The trajectories indicate that the Tanh reward function incentivises the agent to fly closer to the target region. The policy trained on the Step function attempts to traverse the circumference of the region, which leads to fewer steps within the region.
...and 2 more figures
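The loop described in the Figure 2 caption (sample a launch configuration, roll out the policy, store the return, fit a GP, pick the next configuration by expected improvement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rollout_return` is a made-up toy objective standing in for a policy rollout, the launch configuration is scaled to the unit cube, the kernel is a plain squared-exponential rather than a spatial-temporal one, the GP hyperparameters are fixed rather than fitted by maximising the log marginal likelihood, and expected improvement is maximised over random candidates.

```python
import math
import numpy as np


def rollout_return(x):
    """Hypothetical stand-in for one rollout of the trained policy.

    x is (longitude offset, latitude offset, time offset), each scaled to
    [0, 1]. A real run would simulate the balloon and count steps in region.
    """
    best = np.array([0.7, 0.3, 0.5])  # made-up optimum for the toy objective
    return float(np.exp(-8.0 * np.sum((x - best) ** 2)))


def rbf(a, b, length=0.3):
    """Squared-exponential kernel with a fixed, assumed length scale."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_posterior(X, y, Xs, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at candidates Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected improvement over the best return seen so far (maximisation)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def optimise_launch(n_init=5, n_iter=20, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial launch configurations and their returns.
    X = rng.random((n_init, 3))
    y = np.array([rollout_return(x) for x in X])
    for _ in range(n_iter):
        # Score random candidates with the GP surrogate and pick the best EI.
        candidates = rng.random((n_candidates, 3))
        mean, std = gp_posterior(X, y, candidates)
        ei = expected_improvement(mean, std, y.max())
        x_next = candidates[np.argmax(ei)]
        # Evaluate the chosen configuration and add it to the data set.
        X = np.vstack([X, x_next])
        y = np.append(y, rollout_return(x_next))
    return X[np.argmax(y)], float(y.max()), y


best_x, best_y, all_returns = optimise_launch()
```

In practice each call to the objective is an expensive simulated flight, which is why a surrogate that chooses the next launch configuration carefully can be preferable to uniform sampling.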
