Viewpoint-Agnostic Manipulation Policies with Strategic Vantage Selection
Sreevishakh Vasudevan, Som Sagar, Ransalu Senanayake
TL;DR
The paper tackles the brittleness of vision-guided manipulation policies to camera viewpoint changes. It introduces Vantage, a viewpoint-selection framework that uses Bayesian optimization with a Gaussian-process surrogate to pick a small set of informative training viewpoints for fine-tuning. The method comes with sublinear-regret and robustness guarantees and shows large empirical gains across simulated and real-world tasks and policy families, including diffusion policies. Real-robot experiments confirm sim-to-real viability and demonstrate substantial performance improvements with a limited fine-tuning budget.
Abstract
Since vision-based manipulation policies are typically trained on data gathered from a single viewpoint, their performance drops when the view changes during deployment. Naively aggregating demonstrations from numerous random views is not only costly but also known to destabilize learning, as excessive visual diversity acts as noise. We present Vantage, a viewpoint selection framework that fine-tunes any pre-trained policy on a small, strategically selected set of camera poses to induce viewpoint-agnostic behavior. Instead of relying on costly brute-force search over viewpoints, Vantage formulates camera placement as an information gain optimization problem in a continuous space. This approach balances exploration of novel poses with exploitation of promising ones, while also providing theoretical guarantees about convergence and robustness. With only a handful of fine-tuning steps, Vantage consistently improves success under viewpoint shifts across manipulation tasks and policy families compared to fixed, grid, or random data selection strategies. Experiments conducted on simulated and real-world setups show that Vantage increases the task success rate by 25% for diffusion policies and yields robust gains in dynamic-camera settings.
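
To make the exploration-exploitation idea concrete, the following is a minimal sketch of Gaussian-process-based viewpoint selection with an upper-confidence-bound rule. It is not the authors' implementation: the acquisition rule (UCB), the RBF kernel, the discretized candidate grid (the paper works in a continuous space), the two-parameter pose (azimuth, elevation), and the function `toy_score` standing in for a real policy evaluation are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between two sets of poses (one pose per row).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3):
    # Zero-mean GP regression: posterior mean and std at the query poses.
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k, k_s)            # K^{-1} k_s
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_s * solved, axis=0)    # prior variance is 1 for RBF
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_score(pose):
    # Hypothetical stand-in for "how informative is this viewpoint";
    # a real system would evaluate the policy from this camera pose.
    az, el = pose
    return float(np.exp(-((az - 1.0) ** 2 + (el - 0.5) ** 2)))

def select_viewpoints(score_fn, candidates, n_rounds=8, beta=2.0):
    # Start from the first candidate, then repeatedly pick the pose that
    # maximizes mean + beta * std (high mean = exploit, high std = explore).
    poses = [candidates[0]]
    scores = [score_fn(candidates[0])]
    for _ in range(n_rounds):
        mean, std = gp_posterior(np.array(poses), np.array(scores), candidates)
        nxt = candidates[int(np.argmax(mean + beta * std))]
        poses.append(nxt)
        scores.append(score_fn(nxt))
    return np.array(poses), np.array(scores)

# Candidate camera poses: (azimuth, elevation) in radians on a coarse grid.
azimuths = np.linspace(-1.5, 1.5, 13)
elevations = np.linspace(0.0, 1.0, 6)
candidates = np.array([(a, e) for a in azimuths for e in elevations])

selected_poses, selected_scores = select_viewpoints(toy_score, candidates)
```

In this sketch, a larger `beta` pushes the selection toward unexplored poses, while a smaller one concentrates on poses the surrogate already rates highly. The selected poses would then serve as the small set of viewpoints from which fine-tuning data is collected.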
