Exploring the Generalizability of Geomagnetic Navigation: A Deep Reinforcement Learning approach with Policy Distillation

Wenqi Bai, Shiliang Zhang, Xiaohui Zhang, Xuehui Ma, Songnan Yang, Yushuai Li, Tingwen Huang

TL;DR

This work tackles the challenge of generalizing geomagnetic navigation policies beyond the region where they are trained. It introduces TD3-STEPD, a deep reinforcement learning framework that learns multiple teacher policies in different regions and distills them into a single generalized student policy, complemented by reward shaping that blends extrinsic and intrinsic signals. The approach demonstrates effective cross-domain transfer, achieving high success rates and smoother trajectories in unknown regions while outperforming evolutionary-method baselines. The results highlight the potential for robust, GPS-free navigation in large, unexplored areas and point to future work on handling geomagnetic anomalies and dynamic field variations.

Abstract

Advances in autonomous vehicles have enabled navigation and exploration in unknown environments. Geomagnetic navigation for autonomous vehicles has drawn increasing attention because it does not depend on GPS or inertial navigation devices. While geomagnetic navigation approaches have been extensively investigated, the generalizability of learned geomagnetic navigation strategies remains unexplored. The performance of a learned strategy can degrade outside the source domain where it was learned, owing to a lack of knowledge about the geomagnetic characteristics of newly entered areas. This paper explores the generalization of learned geomagnetic navigation strategies via deep reinforcement learning (DRL). In particular, we employ DRL agents to learn multiple teacher models from distributed domains that represent dispersed navigation strategies, and amalgamate the teacher models for generalizability across navigation areas. To train the teacher models, we design a reward shaping mechanism that integrates both potential-based and intrinsically motivated rewards. This reward shaping can enhance the exploration efficiency of the DRL agent and improve the representation of the teacher models. Given the trained teacher models, we employ multi-teacher policy distillation to merge the policies learned by individual teachers, yielding a navigation strategy that generalizes across navigation domains. We conduct numerical simulations, and the results demonstrate an effective transfer of the learned DRL model from a source domain to new navigation areas. Compared with existing evolutionary geomagnetic navigation methods, our approach provides superior performance in terms of navigation length, duration, heading deviation, and success rate in cross-domain navigation.
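
The reward shaping described in the abstract combines a potential-based term with an intrinsic bonus. The sketch below illustrates the general idea only; it is not the paper's formulation. The potential function (negative distance to the goal in a feature space), the count-based bonus, and all names and constants (`potential`, `CountBonus`, `GAMMA`, `beta`, `cell`) are assumptions introduced for illustration. The only standard ingredient is the potential-based form F(s, s') = gamma * Phi(s') - Phi(s).

```python
import math
from collections import defaultdict

GAMMA = 0.99  # discount factor (assumed value, not taken from the paper)


def potential(state, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


def shaping_term(state, next_state, goal, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state, goal) - potential(state, goal)


class CountBonus:
    """Illustrative intrinsic reward: beta / sqrt(visit count of a grid cell)."""

    def __init__(self, beta=0.1, cell=1.0):
        self.beta = beta          # bonus scale (assumed)
        self.cell = cell          # grid cell size used to discretize states (assumed)
        self.counts = defaultdict(int)

    def __call__(self, state):
        # Discretize the state, record the visit, and return a decaying bonus.
        key = tuple(math.floor(x / self.cell) for x in state)
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


def shaped_reward(extrinsic, state, next_state, goal, bonus, gamma=GAMMA):
    """Extrinsic reward plus the shaping term plus the intrinsic bonus."""
    return extrinsic + shaping_term(state, next_state, goal, gamma) + bonus(next_state)
```

In this sketch, a step that moves the agent closer to the goal receives a positive shaping term, while the count-based bonus shrinks as the same cell is revisited, which nudges the agent toward less-explored states.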

Paper Structure

This paper contains 19 sections, 22 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Description of the geomagnetic field and geomagnetic parameters.
  • Figure 2: This framework consists of teacher networks and a student network. The top part demonstrates the k teacher policy networks, independently trained in different regions. The orange boxes on the map indicate the training regions of the k teacher policy networks. The bottom part shows how policy distillation integrates the navigation strategies from the k teacher policy networks into the student policy network, thereby extending the navigation domain to a larger region as indicated by the green box on the map.
  • Figure 3: Total magnetic field intensity in the selected simulation region. The red boxes indicate Regions A, B, C, and D, which are used as training regions for the agent. The remaining regions are used as unknown regions to validate the generalization of the navigation methods.
  • Figure 4: Reward curves for the TD3-ST algorithm and two ablation methods, TD3-SR and TD3-ER, trained in four regions: (a) Region A: Longitude 90 to 95, Latitude -15 to -10, (b) Region B: Longitude 130 to 135, Latitude -15 to -10, (c) Region C: Longitude 90 to 95, Latitude -35 to -30, and (d) Region D: Longitude 130 to 135, Latitude -35 to -30. Each subfigure displays the convergence behavior of the algorithms, with both raw and averaged episode rewards.
  • Figure 5: Statistical results for the comparison of four evaluation metrics between basic TD3-ST and proposed TD3-STEPD algorithms in 1000 long-distance navigation tasks within unknown regions. Each subplot corresponds to a specific evaluation metric: (a) Absolute Mean Error of Heading Deviation, (b) Root Mean Square Error of Heading Deviation, (c) Navigation Error and (d) Total Navigation Time. In each subplot, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to show the range of the data excluding outliers, which are represented by individual points beyond the whiskers.
  • ...and 1 more figure
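
The teacher-student arrangement described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's distillation procedure: because TD3 learns deterministic continuous actions, the sketch regresses the student's action onto each teacher's action with a squared-error loss, and the actual loss used by TD3-STEPD is not given in this section. The linear student, the stand-in teachers, and the names `make_teacher`, `sample_region`, and `distill` are all hypothetical.

```python
import random


def make_teacher(gain, offset):
    """Hypothetical stand-in for a trained teacher: 2-D state -> scalar action."""
    return lambda s: gain * s[0] + offset * s[1]


def sample_region(rng, lo, hi):
    """Sample a state uniformly from the square region [lo, hi] x [lo, hi]."""
    return (rng.uniform(lo, hi), rng.uniform(lo, hi))


def student(params, s):
    """Linear student policy with weights w0, w1 and bias b (assumed form)."""
    w0, w1, b = params
    return w0 * s[0] + w1 * s[1] + b


def distill_loss(params, batch):
    """Mean squared error between student and teacher actions on a batch."""
    return sum((student(params, s) - a) ** 2 for s, a in batch) / len(batch)


def distill(teachers, steps=2000, lr=0.05, seed=0):
    """Fit one student to several teachers, each queried only in its own region."""
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    for _ in range(steps):
        policy, (lo, hi) = rng.choice(teachers)   # pick a teacher and its region
        s = sample_region(rng, lo, hi)            # state from that region
        err = student(params, s) - policy(s)      # student vs. teacher action
        grads = (2 * err * s[0], 2 * err * s[1], 2 * err)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

A usage example: `distill([(make_teacher(1.0, -0.5), (-1.0, 0.0)), (make_teacher(0.8, -0.3), (0.0, 1.0))])` returns one set of student parameters fitted to both regions. A real student would be a neural network trained on states collected in each teacher's region, but the structure of the loop (sample a region, query its teacher, regress the student) is the same.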