Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations
Antoine Legrand, Renaud Detry, Christophe De Vleeschouwer
TL;DR
This work tackles autonomous rendezvous with unknown spacecraft targets by enabling an off-the-shelf 6D pose estimation network to be trained without a CAD model of the target. It uses an in-the-wild Neural Radiance Field (NeRF) with learnable appearance embeddings to synthesize a large, illumination-diverse training set from a sparse on-ground image collection, which is then used to train a model-based spacecraft pose estimation (SPE) network. On SPEED+ Hardware-In-the-Loop data, a network trained on the NeRF-generated images outperforms baselines trained on a few real images and approaches or matches the pose accuracy of a network trained on CAD-based synthetic data. The method demonstrates a CAD-free, model-agnostic path to robust on-board pose estimation for proximity operations, and the illumination diversity provided by the appearance embeddings is a critical factor.
Abstract
We address the estimation of the 6D pose of an unknown target spacecraft relative to a monocular camera, a key step towards the autonomous rendezvous and proximity operations required by future Active Debris Removal missions. We present a novel method that enables an "off-the-shelf" spacecraft pose estimator, which normally assumes that the CAD model of the target is known, to be applied to an unknown target. Our method relies on an in-the-wild NeRF, i.e., a Neural Radiance Field that employs learnable appearance embeddings to represent the varying illumination conditions found in natural scenes. We train the NeRF model on a sparse collection of images that depict the target, and then use it to generate a large dataset that is diverse in terms of both viewpoint and illumination. This dataset is then used to train the pose estimation network. We validate our method on the Hardware-In-the-Loop images of SPEED+, which emulate lighting conditions close to those encountered in orbit. We demonstrate that our method successfully enables the training of an off-the-shelf spacecraft pose estimation network from a sparse set of images. Furthermore, we show that a network trained using our method performs similarly to a model trained on synthetic images generated using the CAD model of the target.
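
The dataset-generation step described above can be illustrated with a minimal sketch. It only shows how one might draw labelled render requests that vary in both viewpoint and illumination. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the distance range, the convention of placing the target on the camera boresight, and the choice to blend two learned appearance embeddings to obtain a new illumination code. The NeRF rendering itself is not shown.

```python
import numpy as np


def random_rotation(rng):
    """Draw a uniformly distributed rotation as a unit quaternion (w, x, y, z)."""
    q = rng.normal(size=4)
    return q / np.linalg.norm(q)


def sample_pose(rng, min_dist=3.0, max_dist=10.0):
    """Sample a hypothetical target pose relative to the camera.

    The distance bounds (in metres) are illustrative placeholders.
    """
    quaternion = random_rotation(rng)
    # Assumption: the target sits on the camera boresight (+z axis).
    translation = np.array([0.0, 0.0, rng.uniform(min_dist, max_dist)])
    return quaternion, translation


def sample_appearance(rng, embeddings):
    """Blend two learned appearance embeddings into a new illumination code.

    Linear blending is an assumed strategy, not the authors' stated one.
    """
    i, j = rng.choice(len(embeddings), size=2, replace=False)
    alpha = rng.uniform()
    return (1.0 - alpha) * embeddings[i] + alpha * embeddings[j]


def build_render_requests(embeddings, n_samples, seed=0):
    """Return a list of (pose label, appearance code) render requests."""
    rng = np.random.default_rng(seed)
    requests = []
    for _ in range(n_samples):
        quaternion, translation = sample_pose(rng)
        requests.append({
            "quaternion": quaternion,      # pose label for the estimator
            "translation": translation,    # pose label for the estimator
            "appearance": sample_appearance(rng, embeddings),
        })
    return requests


if __name__ == "__main__":
    # Stand-in for embeddings learned by the NeRF (5 images, 8 dimensions).
    learned = np.random.default_rng(1).normal(size=(5, 8))
    for request in build_render_requests(learned, n_samples=3):
        print(request["translation"], request["appearance"].shape)
```

Each request would be passed to the trained NeRF to render one image, and the sampled pose would serve as that image's training label for the pose estimation network.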
