Few-shot Scooping Under Domain Shift via Simulated Maximal Deployment Gaps

Yifan Zhu; Pranay Thangeda; Erica L Tevere; Ashish Goel; Erik Kramer; Hari D Nayar; Melkior Ornik; Kris Hauser

TL;DR

This work tackles autonomous extraterrestrial terrain sampling under large domain shifts by formulating a few-shot scooping problem and introducing a vision-based adaptive strategy built on a deep kernel Gaussian process (GP). The core contribution, Deep Kernel Calibration with Maximal Deployment Gaps (kCMD), trains the kernel to overcome simulated maximal deployment gaps, which are created by splitting the training tasks according to an optimal transport (OT) task distance, and integrates the resulting model with Bayesian optimization to adapt rapidly to novel terrains from limited experience. Validation includes 5,100 offline scoops across UIUC terrains and zero-shot transfer to the NASA OWLAT platform, where kCMD outperforms non-adaptive baselines and generalizes to out-of-distribution materials. These results demonstrate the potential of training high-capacity models with simulated deployment gaps for robust meta-learning in robotic sampling, and underline the method's applicability to autonomous lander missions facing Earth-to-space deployment gaps.

Abstract

Autonomous lander missions on extraterrestrial bodies need to sample granular materials while coping with domain shifts, even when sampling strategies are extensively tuned on Earth. To tackle this challenge, this paper studies the few-shot scooping problem and proposes a vision-based adaptive scooping strategy that uses a deep kernel Gaussian process model, trained with a novel meta-training strategy, to learn online from very limited experience on out-of-distribution target terrains. Our Deep Kernel Calibration with Maximal Deployment Gaps (kCMD) strategy explicitly trains a deep kernel model to adapt to large domain shifts by creating simulated maximal deployment gaps from an offline training dataset and training the model to overcome them. Employed in a Bayesian optimization sequential decision-making framework, the proposed method allows the robot to perform high-quality scooping actions on out-of-distribution terrains after a few attempts, significantly outperforming non-adaptive methods proposed in the excavation literature as well as other state-of-the-art meta-learning methods. The proposed method also demonstrates zero-shot transfer capability, successfully adapting to the NASA OWLAT platform, which serves as a state-of-the-art simulator for potential future planetary missions. These results demonstrate the potential of training deep models with simulated deployment gaps for more generalizable meta-learning in high-capacity models. Furthermore, they highlight the promise of our method in autonomous lander sampling missions by enabling landers to overcome the deployment gap between Earth and extraterrestrial bodies.
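As a rough illustration of the online adaptation loop described in the abstract, the sketch below fits a GP to the residuals of a fixed mean model and picks the next scoop location by maximizing an acquisition score. Everything concrete here is an assumption made for illustration: the squared-exponential kernel on a scalar action (standing in for the learned deep kernel on image patches), the upper-confidence-bound acquisition, the toy `true_volume` and `prior_mean` functions, and the threshold value. It is not the authors' implementation.

```python
import math

def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential kernel on scalar actions (stand-in for a deep kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_star, prior_mean, noise=1e-2):
    """Posterior mean and variance at x_star; the GP models residuals from prior_mean."""
    if not xs:
        return prior_mean(x_star), rbf(x_star, x_star)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    resid = [y - prior_mean(x) for x, y in zip(xs, ys)]
    alpha = solve_upper_t(L, solve_lower(L, resid))
    k_star = [rbf(x, x_star) for x in xs]
    mean = prior_mean(x_star) + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    return mean, max(rbf(x_star, x_star) - sum(t * t for t in v), 0.0)

def select_action(xs, ys, candidates, prior_mean, beta=2.0):
    """Pick the candidate with the highest upper confidence bound (assumed acquisition)."""
    def ucb(x):
        m, v = gp_posterior(xs, ys, x, prior_mean)
        return m + beta * math.sqrt(v)
    return max(candidates, key=ucb)

def true_volume(x):
    """Hypothetical target terrain: the best scoop location is near x = 0.8."""
    return math.exp(-((x - 0.8) / 0.15) ** 2)

def prior_mean(x):
    """Hypothetical mean model tuned on other terrains: it favors x = 0.2."""
    return 0.5 * math.exp(-((x - 0.2) / 0.15) ** 2)

candidates = [i / 20 for i in range(21)]

def run_trial(threshold=0.8, max_attempts=10):
    """Scoop until the volume threshold is met or the attempt budget runs out."""
    xs, ys = [], []
    for _ in range(max_attempts):
        x = select_action(xs, ys, candidates, prior_mean)
        xs.append(x)
        ys.append(true_volume(x))
        if ys[-1] >= threshold:
            break
    return xs, ys

xs, ys = run_trial()
print(list(zip(xs, [round(y, 3) for y in ys])))
```

The first attempt follows the mismatched mean model; each later attempt is chosen from a posterior that has been corrected by the residuals observed so far, which is the sense in which the policy adapts from a few shots.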

Paper Structure

This paper contains 10 sections, 6 equations, 8 figures, and 1 algorithm.

One-Sentence Summary:
Simulated Experiments
Physical Experiments on the UIUC testbed
Physical Experiments on OWLAT
Problem Formulation
Scooping Setups
Deep Kernel Gaussian Process Model
Deep Kernel Calibration with Maximal Deployment Gaps
Bayesian optimization decision-maker
Model Training

Figures (8)

Figure 1: Concept illustration. A lander whose sampling policy is trained and tuned on Earth may degrade or even completely fail when deployed on an extraterrestrial planet with drastically different terrain properties (pictures courtesy of NASA). To overcome this challenge, our scooping policy is trained on a large offline dataset collected on the UIUC testbed so that it adapts to novel terrains, and it is evaluated on novel terrains in the same testbed. The policy is then deployed, without retraining, on the NASA OWLAT platform with its novel terrains, where it quickly adapts, achieving high scooping volumes in just a few attempts.
Figure 2: Method overview. Our proposed deep kernel model is trained on a diverse offline database with kCMD, which repeatedly splits the training set into mean-training and kernel-training sets and learns kernel parameters to minimize the residuals from the mean models. Each split is obtained by randomly selecting a reference task, calculating the pairwise task distance between it and every other task (the gray-scale color represents the distance), and splitting based on the median distance. The task distance is the optimal transport distance [alvarez2020geometric] between the tasks, computed from each task's data samples. In deployment, the decision-maker uses the trained model and adapts it to the data acquired online (the support set).
Figure 3: All training and testing materials and compositions on the UIUC testbed, along with example terrains illustrating different compositions, materials, and topography. Note that a Partition composition is not necessarily split half and half. Blue labels indicate approximate grain sizes where applicable. A US quarter coin is provided for scale.
Figure 4: Quantitative results for simulated and physical experiments on both the UIUC testbed and OWLAT. (A) Simulated rollout results, reporting the average and maximum number of attempts needed to reach the success volume threshold. (B) Simulated prediction accuracy, reported as mean absolute error (MAE), for different numbers of shots on all testing terrains. (C) Physical rollout results. The allowed number of attempts is capped at 20 to control the experiment time. Experiments that failed at 20 attempts are denoted with $\times$. (D) On OWLAT, the average volume collected by each method over 5 attempts. For all experiments, the average across models trained with three random seeds is reported.
Figure 5: Example physical trials comparing our method and baselines. Terrain 1 is Layers with Packing Peanuts over Slates on the left and Shredded Cardboard on the right. Terrain 2 is Single with Rock. The scores predicted by each model are visualized, with the regions that would result in the robot colliding with the terrain tray masked. The action taken and the resulting volume are shown with arrows. The volume threshold $B$ and trial success are also labeled. For each trial, the first, final, and some intermediate (if they exist) attempts are visualized, with the attempt number shown in the top right corner. For terrain 2, RGB-D patches are also visualized for more detail (patches are oriented along the scooping direction, with the left edge corresponding to the edge near the scoop's starting location). kCMD and iMAML are able to quickly adapt to scoop the more scoopable shredded cardboard on terrain 1, while SL and CNP struggle. On terrain 2, however, iMAML and DKMT correlate samples incorrectly and very quickly predict low scores for promising locations. There, the ideal location lets the scoop stick into a gap between rock pieces to avoid jamming and contains a big piece of rock in the direction of the scoop motion.
...and 3 more figures
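The splitting step described in the Figure 2 caption (pick a reference task, compute its distance to every other task, split at the median) can be sketched as follows. This is a minimal illustration under stated assumptions: each task is reduced to an equal-size list of scalar samples so that the exact 1-D optimal transport cost can be computed by sorting, whereas the paper uses the dataset distance of [alvarez2020geometric]. The function names, the toy tasks, and the handling of the reference task are hypothetical, and the caption does not say which half becomes the mean-training set and which the kernel-training set.

```python
import random
import statistics

def wasserstein_1d(a, b):
    """Exact 1-D optimal transport cost (W1) between equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this simplified distance needs equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def split_by_reference(tasks, ref=None, rng=random):
    """Split tasks into halves that are near to and far from a reference task.

    tasks maps a task name to a list of scalar samples (a simplification).
    Returns (ref, near, far); near holds tasks at or below the median distance.
    """
    if ref is None:
        ref = rng.choice(sorted(tasks))  # randomly selected reference task
    dist = {name: wasserstein_1d(tasks[ref], samples)
            for name, samples in tasks.items() if name != ref}
    median = statistics.median(dist.values())
    near = [name for name, d in dist.items() if d <= median]
    far = [name for name, d in dist.items() if d > median]
    return ref, near, far

# Toy tasks: each list stands in for one terrain's data samples.
tasks = {
    "sand":   [0.0, 1.0, 2.0],
    "gravel": [0.1, 1.1, 2.1],
    "rock":   [5.0, 6.0, 7.0],
    "slate":  [10.0, 11.0, 12.0],
    "mulch":  [1.0, 2.0, 3.0],
}

ref, near, far = split_by_reference(tasks, ref="sand")
print(ref, near, far)
```

Repeating the split with a fresh random reference (omit `ref`) yields a different near/far partition each time, which matches the caption's description of the training set being split repeatedly.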
