Optimality Principles in Spacecraft Neural Guidance and Control

Dario Izzo, Emmanuel Blazquez, Robin Ferede, Sebastien Origer, Christophe De Wagter, Guido C. H. E. de Croon

Abstract

Spacecraft and drones aimed at exploring our solar system are designed to operate in conditions where the smart use of onboard resources is vital to the success or failure of the mission. Sensorimotor actions are thus often derived from high-level, quantifiable optimality principles assigned to each task, using consolidated tools from optimal control theory. The planned actions are derived on the ground and transferred onboard, where controllers have the task of tracking the uploaded guidance profile. Here we argue that end-to-end neural guidance and control architectures (here called G&CNets) allow transferring onboard the burden of acting upon these optimality principles. In this way, the sensor information is transformed in real time into optimal plans, thus increasing mission autonomy and robustness. We discuss the main results obtained in training such neural architectures in simulation for interplanetary transfers, landings and close proximity operations, highlighting the successful learning of optimality principles by the neural model. We then suggest drone racing as an ideal gym environment to test these architectures on real robotic platforms, thus increasing confidence in their use on future space exploration missions. Drone racing shares with spacecraft missions both limited onboard computational capabilities and similar control structures induced by the optimality principle sought, but it also entails different levels of uncertainty and unmodelled effects. Furthermore, the success of G&CNets on extremely resource-restricted drones illustrates their potential to bring real-time optimal control within reach of a wider variety of robotic systems, both in space and on Earth.

Paper Structure (8 sections, 6 figures)


Figures (6)

  • Figure 1: Optimality principles determine the decision-making during different phases of exploration missions. a) During an interplanetary phase, the spacecraft dynamics are well identified. Uncertainties are limited, and departures from a theoretical mass-optimal guidance are of less importance, also thanks to the relatively slow dynamics involved. b) During a landing phase, depending on the specific mission profile, the adaptiveness and robustness of the planned actions have a larger impact on mission success, also considering that human operators are typically too far away to allow re-planning within an acceptable timeframe. c) During a planetary exploration phase (e.g. rovers or flying drones), uncertainties are larger and optimality principles such as careful use of available onboard energy need to be embedded into highly disturbed and fast dynamics.
  • Figure 2: G&CNets have a similar role to Model Predictive Control (MPC) in the architecture of an autonomous mission. A) MPC iteratively solves on-board optimal control problems, predicting state and actions over a defined time horizon, based on an existing system model. This results in possible optimality guarantees with full predictive information, at the expense of a significant on-board computational burden determined by the complexity of the system model and of the optimal control problem to be solved. B) A G&CNet inference directly transforms the system state into actions. B.1) When trained using RL, an agent learns the final probabilistic policy from experience, based on a critical reward-feedback loop with the environment. The resulting architecture can be resilient to stochastic disturbances but is often based on engineered reward functions that depart from the original optimality principle assigned. B.2) When trained via supervised learning, the network captures a clear optimality principle in its structure, directly inferring optimal actions from the state feedback at high frequency. Such a solution allows for fast direct inference with limited hardware requirements, but is possibly subject to instability and lack of robustness when the state falls outside the set used to train the network.
  • Figure 3: Challenges in approximating optimal feedback with a G&CNet. a) Optimal control tasks can have very high-dimensional solutions. Already in simple precession control, a complex structure emerges. In this case, the task is to bring a precessing satellite, in the shortest possible time, to a uniform rotation around its symmetry axis, thus canceling the components $\omega_x, \omega_y$ of its angular velocity. The resulting deterministic optimal control problem is one rare case where an analytical solution can be derived, allowing us to peek into the structure of the optimal policy over the entire state space. Depending on the values of $\omega_x, \omega_y$, the thrusters switch direction along a complex and discontinuous switching line. The resulting time-optimal trajectories are shown in color. b) The optimality principle pursued affects the resulting control profile and its gradient. Here the optimal control commands for quadcopter landings, ranging from energy-optimal to time-optimal, are shown [origer2023guidance]. c) Smooth control profiles result in smaller errors compared to non-smooth bang-bang profiles when approximated by a G&CNet.
  • Figure 4: The backward generation of optimal examples (BGOE) technique allows generating datasets that are orders of magnitude larger by perturbing one nominal solution. a) The nominal solution for the case of a time-optimal transfer from the asteroid belt to Earth (visualized in a rotating frame). b) Two bundles of 200,000 optimal trajectories found by applying BGOE to the nominal solution. Larger perturbations of the nominal trajectory result in better coverage of conditions close to the Earth (short bundle), which reduces the likelihood that the spacecraft lands outside of the training data [IZZO2022]. For comparison, generating all 400,000 trajectories uses the same numerical resources needed to generate one nominal optimal solution.
  • Figure 5: G&CNets robustness to model mismatch. Two examples of real autonomous flights of a Parrot™ AR drone 2.0 in the TU Delft Cyberzoo: a) Unmodeled moments are detected in real-time on-board and fed back to the G&CNet. b) Unexpected early saturation of the motor revolutions per minute is present. The saturation is estimated on-board and fed back to the G&CNet. In both cases, the improvements with respect to the non-parametric version are evident. The initial drone position is also shown with a white border.
  • ...and 1 more figures
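
The caption of Figure 4 describes BGOE only at a high level. As a rough illustration, and not the authors' implementation, the sketch below applies the idea to a toy problem chosen here for simplicity: an energy-optimal double integrator with a fixed horizon, rather than the time-optimal asteroid-belt transfer of the paper. It assumes that backward generation means perturbing the terminal costates of a nominal solution and integrating the state-costate equations from Pontryagin's conditions backward in time from the target. All function names, parameter values and the nominal costates are hypothetical.

```python
import numpy as np

def dynamics(y):
    """State-costate dynamics for min 0.5*int(u^2) dt on x' = v, v' = u.
    y = [x, v, lam_x, lam_v]; toy problem assumed for illustration only."""
    x, v, lam_x, lam_v = y
    u = -lam_v  # stationarity of the Hamiltonian: dH/du = u + lam_v = 0
    return np.array([v, u, 0.0, -lam_x])

def rk4_step(y, h):
    """One classical Runge-Kutta step of size h (h < 0 goes backward in time)."""
    k1 = dynamics(y)
    k2 = dynamics(y + 0.5 * h * k1)
    k3 = dynamics(y + 0.5 * h * k2)
    k4 = dynamics(y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def backward_example(lam_terminal, horizon=1.0, steps=50):
    """Integrate backward from the target (x, v) = (0, 0) with the given
    terminal costates. Returns features [x, v, time_to_go] and controls u."""
    y = np.array([0.0, 0.0, lam_terminal[0], lam_terminal[1]])
    h = -horizon / steps
    samples = []
    for _ in range(steps):
        y = rk4_step(y, h)
        samples.append(y)
    samples = np.array(samples)
    # The sample at the target itself (time_to_go = 0) is skipped on purpose.
    time_to_go = np.arange(1, steps + 1) * (horizon / steps)
    features = np.column_stack([samples[:, 0], samples[:, 1], time_to_go])
    controls = -samples[:, 3]
    return features, controls

def generate_dataset(nominal_lam, n_traj=100, sigma=0.1, seed=0):
    """Perturb the nominal terminal costates and collect one bundle of
    optimal (features, control) pairs, one backward integration per sample."""
    rng = np.random.default_rng(seed)
    feats, ctrls = [], []
    for _ in range(n_traj):
        lam = nominal_lam + sigma * rng.standard_normal(2)
        f, u = backward_example(lam)
        feats.append(f)
        ctrls.append(u)
    return np.concatenate(feats), np.concatenate(ctrls)

# Hypothetical nominal terminal costates, standing in for a solved nominal problem.
X, U = generate_dataset(np.array([1.0, -0.5]))
print(X.shape, U.shape)
```

Each backward integration costs one initial value problem rather than one boundary value problem, which is the kind of saving the caption alludes to. In this toy setting the time-to-go is included as an input because the optimal feedback of a fixed-horizon problem depends on it; pairs of this form could then serve as supervised training data for a network that maps state to action.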