NeRF-based Visualization of 3D Cues Supporting Data-Driven Spacecraft Pose Estimation

Antoine Legrand, Renaud Detry, Christophe De Vleeschouwer

TL;DR

This work tackles the explainability challenge of data-driven spacecraft pose estimation for autonomous on-orbit proximity operations. It introduces a NeRF-based image generator $G_{\Phi}$ conditioned on the 6D pose $(q,t)$ and trained by backpropagating through the frozen pose estimator $P_{\Theta}$ to reveal the 3D cues the network relies on. The approach demonstrates that the recovered cues, such as edges and pose-relevant singularities like antennas, are sufficient for accurate pose inference, and it provides insights into how multi-task supervision affects robustness and generalization. Overall, the method enhances interpretability and trust in pose-estimation networks, facilitating safer deployment in space missions.

Abstract

On-orbit operations require the estimation of the relative 6D pose, i.e., position and orientation, between a chaser spacecraft and its target. While data-driven spacecraft pose estimation methods have been developed, their adoption in real missions is hampered by the lack of understanding of their decision process. This paper presents a method to visualize the 3D visual cues on which a given pose estimator relies. For this purpose, we train a NeRF-based image generator using the gradients back-propagated through the pose estimation network. This forces the generator to render the main 3D features exploited by the spacecraft pose estimation network. Experiments demonstrate that our method recovers the relevant 3D cues. Furthermore, they offer additional insights into the relationship between the supervision of the pose estimation network and its implicit representation of the target spacecraft.
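
The training principle described in the abstract can be illustrated with a deliberately tiny toy problem. The sketch below is not the paper's implementation: the NeRF generator and the pose estimation network are both replaced by small linear maps (`phi` and `theta`), the "pose" is a 2-vector rather than a 6D pose, and the names `matvec` and `train_generator` are hypothetical. It only shows the mechanism: the pose error is back-propagated through the frozen estimator, and only the generator's parameters are updated.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_generator(theta, steps=2000, lr=0.05, seed=0):
    """Fit generator weights phi so that theta(phi(pose)) ~= pose.

    theta: frozen linear 'pose estimator' (n_pose x n_pix), never modified.
    Returns phi, a linear 'image generator' (n_pix x n_pose).
    """
    rng = random.Random(seed)
    n_pose, n_pix = len(theta), len(theta[0])
    phi = [[0.0] * n_pose for _ in range(n_pix)]  # trainable parameters

    for _ in range(steps):
        # Sample a random input pose (toy stand-in for a 6D pose).
        pose = [rng.uniform(-1.0, 1.0) for _ in range(n_pose)]

        image = matvec(phi, pose)    # generator: pose -> image
        pred = matvec(theta, image)  # frozen estimator: image -> pose
        resid = [p - q for p, q in zip(pred, pose)]  # pose error

        # Back-propagate the squared error through the frozen estimator:
        # dL/d(image) = 2 * theta^T * resid
        grad_image = [
            2.0 * sum(theta[k][i] * resid[k] for k in range(n_pose))
            for i in range(n_pix)
        ]

        # Chain rule into the generator: dL/d(phi[i][j]) = grad_image[i] * pose[j]
        for i in range(n_pix):
            for j in range(n_pose):
                phi[i][j] -= lr * grad_image[i] * pose[j]

    return phi


# Arbitrary frozen estimator mapping 3 "pixels" to a 2D "pose".
theta = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]]
phi = train_generator(theta)
```

After training, feeding a pose through `phi` and then `theta` returns approximately the same pose, so the generated "image" contains exactly the content the estimator needs to recover it. In this toy, the third pixel carries less weight than the first because the estimator is less sensitive to it, a loose analogue of the generator rendering only the cues the estimator exploits.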

Paper Structure

This paper contains 10 sections, 1 equation, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: To visualize the 3D cues exploited by a spacecraft pose estimation network $P_{\Theta}$, our method relies on an image generator $G_{\Phi}$ which takes as input a 6D pose and outputs an image. The generator is trained by back-propagating the difference between the input pose and the pose predicted on that image by the frozen pose estimator. Once trained, the generator can synthesize images containing the 3D cues on which the pose estimation network primarily relies.
  • Figure 2: (Left) NeRF rendering pipeline. (i) Rays, denoted by their origin $c$ and unit direction $d$, are cast through every pixel ($i$,$j$) of a calibrated camera of relative pose ($q$,$t$). (ii) $N$ points, each defined by a 3D position ($x$,$y$,$z$) and two viewing angles ($\theta$, $\phi$), are sampled along each ray. (iii) For each point, its color ($r$,$g$,$b$) and density ($\sigma$) are predicted by the neural field $\mathcal{F}$. (iv) For each ray, the value of the corresponding pixel is determined through differentiable ray-tracing techniques. (Right) Architecture of the neural field $\mathcal{F}$. The input position and viewing angles of a point are mapped to position and direction features, $F_{\textnormal{pos}}$ and $F_{\textnormal{dir}}$, through a learnable 3D grid [fridovich2023k] and spherical harmonics, respectively. The position features are fed into an MLP that approximates the density field and outputs the density $\sigma$ of the point along with density features $F_\sigma$. Density and direction features are processed by a second MLP to predict the color of the point.
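
Step (iv) of Figure 2 can be sketched for a single ray. The caption only mentions "differentiable ray-tracing techniques", so the code below assumes the standard NeRF compositing rule (per-sample opacity $1 - e^{-\sigma_i \delta_i}$ weighted by the accumulated transmittance), which may differ in detail from what the authors use. The function name `composite_ray` and the sample spacings `deltas` are hypothetical.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Blend per-sample colors along one ray into a pixel value.

    sigmas: densities predicted at the N samples (step iii).
    colors: (r, g, b) tuples predicted at the same samples.
    deltas: distances between consecutive samples along the ray.
    Returns the pixel color and the per-sample blending weights.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        weights.append(weight)
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha            # light left for later samples
    return pixel, weights


# Empty space, then an opaque green surface that hides the blue sample behind it.
pixel, weights = composite_ray(
    sigmas=[0.0, 1e9, 5.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

Every operation in the loop is differentiable with respect to the densities and colors, which is what allows the pose error to flow back from the rendered pixel into the neural field $\mathcal{F}$.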