Trust-Region Neural Moving Horizon Estimation for Robots

Bingheng Wang, Xuyang Chen, Lin Zhao

TL;DR

This paper proposes a trust-region policy optimization method for training NeuroMHE by providing the second-order derivatives of the MHE, referred to as the MHE Hessian, and shows enhanced robustness to network initialization compared with its gradient descent counterpart.
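The TL;DR turns on a trust-region method driven by second-order information. The sketch below illustrates only that generic idea. It runs a textbook trust-region Newton iteration on the Rosenbrock function: the subproblem is solved by Steihaug-Toint truncated conjugate gradients, and the radius is updated with the standard ratio test. It is not the paper's algorithm. The toy loss, all function names, and the constants (0.25, 0.75, `eta=0.1`, `delta_max=10`) are assumptions chosen for illustration. In NeuroMHE, `grad` and `hess` would instead be the training loss's gradient and Hessian with respect to the network parameters.

```python
import math


def rosenbrock(x):
    # Hypothetical toy loss: f(x) = (1 - x0)^2 + 100 (x1 - x0^2)^2, minimized at (1, 1).
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    return [-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
            200 * (x[1] - x[0] ** 2)]


def rosenbrock_hess(x):
    return [[2 - 400 * x[1] + 1200 * x[0] ** 2, -400 * x[0]],
            [-400 * x[0], 200.0]]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(H, v):
    return [dot(row, v) for row in H]


def to_boundary(p, d, delta):
    # Point p + tau * d with tau > 0 chosen so that ||p + tau * d|| = delta.
    a, b, c = dot(d, d), 2 * dot(p, d), dot(p, p) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return [pi + tau * di for pi, di in zip(p, d)]


def steihaug_cg(g, H, delta, rtol=1e-10, max_iter=50):
    """Approximately minimize g.p + 0.5 p.H.p subject to ||p|| <= delta."""
    p = [0.0] * len(g)
    g_norm = math.sqrt(dot(g, g))
    if g_norm == 0.0:
        return p
    r = list(g)                       # gradient of the quadratic model at p
    d = [-ri for ri in r]
    for _ in range(max_iter):
        Hd = matvec(H, d)
        dHd = dot(d, Hd)
        if dHd <= 0:                  # negative curvature: follow d to the boundary
            return to_boundary(p, d, delta)
        alpha = dot(r, r) / dHd
        p_next = [pi + alpha * di for pi, di in zip(p, d)]
        if math.sqrt(dot(p_next, p_next)) >= delta:
            return to_boundary(p, d, delta)   # step would leave the region: clip it
        r_next = [ri + alpha * hi for ri, hi in zip(r, Hd)]
        if math.sqrt(dot(r_next, r_next)) < rtol * g_norm:
            return p_next
        beta = dot(r_next, r_next) / dot(r, r)
        d = [-rn + beta * di for rn, di in zip(r_next, d)]
        p, r = p_next, r_next
    return p


def trust_region(f, grad, hess, x, delta=1.0, delta_max=10.0, eta=0.1,
                 max_iter=200, gtol=1e-8):
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if math.sqrt(dot(g, g)) < gtol:
            break
        p = steihaug_cg(g, H, delta)
        predicted = -(dot(g, p) + 0.5 * dot(p, matvec(H, p)))  # model decrease
        x_trial = [xi + pi for xi, pi in zip(x, p)]
        rho = (f(x) - f(x_trial)) / predicted if predicted > 0 else -1.0
        if rho < 0.25:
            delta *= 0.25                     # poor model agreement: shrink
        elif rho > 0.75 and math.sqrt(dot(p, p)) >= 0.99 * delta:
            delta = min(2 * delta, delta_max)  # good step on the boundary: expand
        if rho > eta:
            x = x_trial                       # accept the step
    return x


x_star = trust_region(rosenbrock, rosenbrock_grad, rosenbrock_hess, [-1.2, 1.0])
print(x_star)
```

The Hessian enters only through matrix-vector products inside `steihaug_cg`. The radius `delta` caps every step, so an unreliable second-order model (a low ratio `rho`) shrinks the region instead of producing a large step. A fixed learning rate in gradient descent has no such automatic adjustment.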

Abstract

Accurate disturbance estimation is essential for safe robot operations. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, has shown promise in further pushing the accuracy and efficiency boundary. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by providing the second-order derivatives of the MHE, referred to as the MHE Hessian. Remarkably, we show that much of the computation already used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. This yields computational complexity that is linear in the MHE horizon. As a case study, we evaluate the proposed trust-region NeuroMHE on real quadrotor flight data for disturbance estimation. Our approach demonstrates highly efficient training in under 5 min using only 100 data points. It outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy while using only 1.4% of its network parameters. Furthermore, our method shows enhanced robustness to network initialization compared with its gradient descent counterpart.
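The abstract says the recursion used for the gradient can be reused for the Hessian, at a cost linear in the horizon. The paper's recursion is not reproduced here. The toy below, an assumption-laden analogy, shows only the structural point: one O(N) forward pass, then O(N) per additional right-hand side.

The setup is a hypothetical scalar random-walk smoothing problem over a horizon of `n` steps. Its normal equations are tridiagonal. A single forward elimination (the Thomas algorithm) is stored. Each further solve then costs only O(n): here once for the estimate, and once for a sensitivity of that estimate.

The problem, the noise levels `q` and `r`, and every function name are illustrative. A Kalman filter followed by a Rauch-Tung-Striebel smoother is one well-known way to organize the same kind of forward-backward pass.

```python
import math


def smoothing_system(n, q, r):
    """Tridiagonal normal equations of
    min_x sum_k (y_k - x_k)^2 / r + sum_k (x_{k+1} - x_k)^2 / q
    (a hypothetical scalar random-walk smoothing problem over n steps)."""
    off = [-1.0 / q] * (n - 1)                   # symmetric sub/super-diagonal
    diag = [1.0 / r + ((k > 0) + (k < n - 1)) / q for k in range(n)]
    return off, diag


def factor(off, diag):
    """Forward elimination (Thomas algorithm), done once: O(n)."""
    n = len(diag)
    ratio = [0.0] * (n - 1)                      # modified super-diagonal
    pivot = [0.0] * n
    pivot[0] = diag[0]
    for k in range(n - 1):
        ratio[k] = off[k] / pivot[k]
        pivot[k + 1] = diag[k + 1] - off[k] * ratio[k]
    return ratio, pivot


def solve(off, factors, rhs):
    """Reuse the stored factors for any right-hand side: O(n) per solve."""
    ratio, pivot = factors
    n = len(rhs)
    z = [0.0] * n
    z[0] = rhs[0] / pivot[0]
    for k in range(1, n):                        # forward sweep
        z[k] = (rhs[k] - off[k - 1] * z[k - 1]) / pivot[k]
    x = [0.0] * n
    x[-1] = z[-1]
    for k in range(n - 2, -1, -1):               # backward sweep
        x[k] = z[k] - ratio[k] * x[k + 1]
    return x


n, q, r = 50, 0.1, 1.0                           # hypothetical horizon and noise levels
off, diag = smoothing_system(n, q, r)
factors = factor(off, diag)                      # computed once
y = [math.sin(0.2 * k) for k in range(n)]
x_hat = solve(off, factors, [yk / r for yk in y])          # smoothed estimate
unit = [1.0 if k == n - 1 else 0.0 for k in range(n)]
dx_dy_last = [v / r for v in solve(off, factors, unit)]   # d x_hat / d y_{n-1}
```

Both `x_hat` and `dx_dy_last` reuse the same `factors`. Doubling `n` therefore doubles the work of each solve rather than squaring or cubing it.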

Paper Structure (11 sections, 22 equations, 3 figures, 2 tables, 2 algorithms)

Figures (3)

  • Figure 1: Learning pipelines of the trust-region NeuroMHE. Currently, NeuroMHE is trained via gradient descent with its gradient computed recursively using a Kalman filter. This paper enhances NeuroMHE training with the second-order trust-region method. Interestingly, we show that the MHE Hessian can be obtained recursively using the same Kalman filter with just minor modifications to its inputs.
  • Figure 2: Comparison of the training performance between the gradient descent (GD) method and the proposed trust-region (TR) method. We randomly initialize the neural network using the Kaiming method he2015delving in 10 trials. For gradient descent, we set the learning rate to $1\times 10^{-4}$ to balance training stability and performance. We carefully select the gradient descent trials so that their untrained mean loss closely matches that of the trust-region method, enabling a fair comparison. The ultimate loss shown in the steady mean-loss panel corresponds to the loss value in the last episode.
  • Figure 3: Comparison of the force estimation performance between NeuroMHE and NeuroBEM on an aggressive Figure-8 flight test dataset as used in bauersfeld2021neurobem.
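Figure 2's protocol draws 10 random network initializations with the Kaiming method. The sketch below shows what one such draw could look like. It assumes the normal, fan-in variant from He et al., where each weight is sampled from N(0, 2 / fan_in). Whether the paper used this variant or the uniform one, and the layer widths below, are assumptions. `kaiming_normal` and `init_trials` are hypothetical helper names, not a library API.

```python
import math
import random


def kaiming_normal(fan_out, fan_in, rng):
    """fan_out x fan_in weight matrix with entries drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def init_trials(layer_sizes, n_trials=10, base_seed=0):
    """One independently seeded network initialization per trial."""
    trials = []
    for t in range(n_trials):
        rng = random.Random(base_seed + t)       # distinct, reproducible seed per trial
        trials.append([kaiming_normal(n_out, n_in, rng)
                       for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])])
    return trials


layer_sizes = [12, 32, 32, 8]                    # hypothetical network widths
trials = init_trials(layer_sizes)                # 10 trials, matching Figure 2's count
```

Seeding each trial separately makes the 10 initializations reproducible. The same seeds could then be handed to both optimizers, which is one simple way to compare GD and TR from matched starting points.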