Invisible Servoing: a Visual Servoing Approach with Return-Conditioned Latent Diffusion

Bishoy Gerges, Barbara Bazzana, Nicolò Botteghi, Youssef Aboudorra, Antonio Franchi

TL;DR

This work addresses UAV visual servoing under target invisibility, where conventional VS methods fail because the target is occluded or out of view. It proposes a latent-diffusion framework that operates in a compact latent space learned by a Cross-Modal Variational Autoencoder (CM-VAE) and uses return-conditioned latent DDPMs to generate trajectories toward a target view; a dedicated return-estimation heuristic ties planning to feasible, smooth motion. The approach is validated in Gazebo simulations with a quadrotor and a hexarotor, demonstrating recovery of the target view and successful visuospatial alignment despite initial invisibility, and showing improved tracking in closed-loop receding-horizon experiments. The combination of CM-VAE, latent DDPM planning, and return-conditioned control offers a robust alternative to feature-based VS, with potential for end-to-end real-world deployment and MPC integration.

Abstract

In this paper, we present a novel visual servoing (VS) approach based on latent Denoising Diffusion Probabilistic Models (DDPMs) that explores the application of generative models to vision-based navigation of UAVs (Uncrewed Aerial Vehicles). Unlike classical VS methods, the proposed approach can reach the desired target view even when the target is initially not visible. This is made possible by learning a latent representation that the DDPM uses for planning, and by a dataset of trajectories that includes target-invisible initial views. A compact representation is learned from raw images using a Cross-Modal Variational Autoencoder. Given the current image, the DDPM generates trajectories in the latent space that drive the robotic platform to the desired visual target. The approach has been validated in simulation using two generic multi-rotor UAVs (a quadrotor and a hexarotor). The results show that the visual target can be reached successfully even when it is not visible in the initial view.

Paper Structure (13 sections, 14 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: An overview of the maneuver generated by the proposed approach to steer the UAV to the desired visual target, using the images of the onboard camera.
  • Figure 2: Architecture of the latent diffusion VS approach.
  • Figure 3: VS with latent DDPMs. We encode the initial frame $\mathbf{x}^0$ and the target frame $\mathbf{x}^N$ using the CM-VAE encoder. Their respective latent representations $\mathbf{z}^0$ and $\mathbf{z}^N$ are inpainted as initial and target latent states of the trajectory generated by the latent DDPM.
  • Figure 4: Linear regression model prediction of the velocity.
  • Figure 5: Testing environments.
  • ...and 4 more figures
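
The endpoint inpainting described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard DDPM ancestral sampler with a linear noise schedule, omits return conditioning entirely, and uses hypothetical names (`inpaint`, `sample_trajectory`, `zero_eps_model`). The placeholder `zero_eps_model` stands in for a trained noise-prediction network, and the latent vectors stand in for CM-VAE encodings of the initial and target frames.

```python
import numpy as np

def inpaint(traj, z_start, z_goal):
    """Overwrite the first and last latent states with the known endpoints."""
    traj = traj.copy()
    traj[0] = z_start
    traj[-1] = z_goal
    return traj

def sample_trajectory(eps_model, z_start, z_goal, horizon, n_steps=50, seed=0):
    """Reverse-diffuse a latent trajectory while pinning both endpoints.

    eps_model(traj, k) is assumed to return the predicted noise, with the
    same shape as traj, at diffusion step k.
    """
    rng = np.random.default_rng(seed)
    dim = z_start.shape[0]

    # Assumed linear beta schedule; the source does not specify one.
    betas = np.linspace(1e-4, 2e-2, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from Gaussian noise, then pin the endpoints.
    traj = inpaint(rng.standard_normal((horizon, dim)), z_start, z_goal)

    for k in reversed(range(n_steps)):
        eps = eps_model(traj, k)
        # Standard DDPM posterior mean for the previous diffusion step.
        mean = (traj - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        # No noise is added at the final step.
        noise = rng.standard_normal(traj.shape) if k > 0 else np.zeros_like(traj)
        # Re-apply the endpoint constraints after every denoising step.
        traj = inpaint(mean + np.sqrt(betas[k]) * noise, z_start, z_goal)
    return traj

def zero_eps_model(traj, k):
    """Placeholder for a trained noise-prediction network (hypothetical)."""
    return np.zeros_like(traj)

if __name__ == "__main__":
    z0 = np.zeros(4)   # stands in for the encoded initial frame
    zN = np.ones(4)    # stands in for the encoded target frame
    plan = sample_trajectory(zero_eps_model, z0, zN, horizon=8)
    print(plan.shape)
```

Re-applying the constraint after every step, rather than only once at the end, keeps the intermediate states consistent with the fixed endpoints throughout denoising. With the placeholder model the interior states are not meaningful; only the pinned endpoints are.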