Diffusion Stabilizer Policy for Automated Surgical Robot Manipulations

Chonlam Ho, Jianshu Hu, Hesheng Wang, Qi Dou, Yutong Ban

TL;DR

This work addresses learning surgical robot manipulation policies from imperfect demonstrations by introducing the Diffusion Stabilizer Policy (DSP). DSP uses a two-stage approach: first training a diffusion stabilizer on clean data, then updating it with a mixture of clean and perturbed data filtered by action-prediction error, with a threshold gamma set as the mean data error. The key contributions include a diffusion-based filter that can discard perturbed samples during training and online/offline filtering strategies, leading to strong performance on the SurRoL platform and robustness to data perturbations, including long-horizon tasks like BiPegTransfer. The proposed method has practical significance in enabling data-intensive diffusion policies in surgical robotics by leveraging imperfect demonstrations to scale data collection while maintaining high reliability.

Abstract

Intelligent surgical robots have the potential to revolutionize clinical practice by enabling more precise and automated surgical procedures. However, the automation of such robots for surgical tasks remains under-explored compared to recent advancements in solving household manipulation tasks. These successes have been largely driven by (1) advanced models, such as transformers and diffusion models, and (2) large-scale data utilization. Aiming to extend these successes to the domain of surgical robotics, we propose a diffusion-based policy learning framework, called Diffusion Stabilizer Policy (DSP), which enables training with imperfect or even failed trajectories. Our approach consists of two stages: first, we train the diffusion stabilizer policy using only clean data. Then, the policy is continuously updated using a mixture of clean and perturbed data, with filtering based on the prediction error on actions. Comprehensive experiments conducted in various surgical environments demonstrate the superior performance of our method in perturbation-free settings and its robustness when handling perturbed demonstrations.
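
The second-stage filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `filter_by_action_error`, the per-sample mean-squared error, and the choice to compute the threshold over the current batch are all hypothetical. In DSP the predicted actions would come from the stage-one diffusion stabilizer; here a fixed array stands in for them.

```python
import numpy as np

def filter_by_action_error(pred_actions, data_actions, gamma=None):
    """Keep samples whose action-prediction error does not exceed gamma.

    pred_actions: actions predicted by the stabilizer (stand-in array here).
    data_actions: actions stored in the mixed clean/perturbed batch.
    gamma: threshold; if None, use the mean error of the batch (assumption).
    """
    pred = np.asarray(pred_actions, dtype=float)
    data = np.asarray(data_actions, dtype=float)
    # Per-sample mean-squared error over the action dimension (assumed metric).
    errors = np.mean((pred - data) ** 2, axis=-1)
    if gamma is None:
        gamma = errors.mean()
    keep = errors <= gamma
    return keep, errors, float(gamma)

# Toy batch: three samples agree with the prediction, one deviates strongly.
data_actions = np.zeros((4, 2))
pred_actions = np.array([[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, 1.0]])
keep, errors, gamma = filter_by_action_error(pred_actions, data_actions)
filtered_batch = data_actions[keep]  # samples that would be used for the update
```

With a mean-based threshold, a single large error raises the threshold for the whole batch, so how the errors are pooled (per batch or over the whole dataset) affects which samples survive.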

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview. Our diffusion-based policy learning framework learns a diffusion stabilizer which can filter the perturbed data.
  • Figure 2: The overall training framework of Diffusion Stabilizer Policy. Our diffusion-based policy learning framework first trains a diffusion stabilizer policy with only clean data. The mixed batch of clean and perturbed data is filtered by the diffusion policy according to the error between predicted actions and the actions from the mixed dataset. The diffusion policy is continuously updated with the filtered data.
  • Figure 3: The perturbed samples generated during data collection and their filtering results are recorded. The left figure illustrates the recall of the predictions, representing the percentage of correctly identified perturbed data. The right figure depicts the accuracy, indicating the percentage of correctly classified samples across the entire dataset.
  • Figure 4: We test the performance of our method under different perturbation settings. "$\sigma$" is the mean of the added Gaussian noise and "steps" corresponds to the number of actions being perturbed.
  • Figure 5: A new threshold using the empirical mean minus the empirical standard deviation of the error is evaluated, labeled as "st_online" in the figure. Our method demonstrates overall robustness to the choice of threshold.
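
The recall and accuracy defined in the Figure 3 caption, and the alternative threshold described in the Figure 5 caption, can be computed as in the sketch below. The function names are hypothetical, and using the population standard deviation (NumPy's default, `ddof=0`) is an assumption; the captions do not say which estimator is used.

```python
import numpy as np

def filter_recall_accuracy(is_perturbed, flagged):
    """Recall: share of perturbed samples that the filter flagged.
    Accuracy: share of all samples whose flag matches the ground truth."""
    is_perturbed = np.asarray(is_perturbed, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)
    n_perturbed = is_perturbed.sum()
    if n_perturbed == 0:
        recall = float("nan")  # recall is undefined without perturbed samples
    else:
        recall = float((is_perturbed & flagged).sum() / n_perturbed)
    accuracy = float((is_perturbed == flagged).mean())
    return recall, accuracy

def mean_minus_std_threshold(errors):
    """Stricter threshold: empirical mean minus empirical standard deviation."""
    errors = np.asarray(errors, dtype=float)
    return float(errors.mean() - errors.std())

# Toy example: two of five samples are perturbed; the filter flags one of them
# and also flags one clean sample.
is_perturbed = [False, False, False, True, True]
flagged = [False, True, False, True, False]
recall, accuracy = filter_recall_accuracy(is_perturbed, flagged)
threshold = mean_minus_std_threshold([1.0, 2.0, 3.0])
```

Because this threshold sits below the mean, it discards more samples than the mean-based one, trading some clean data for a lower chance of keeping perturbed data.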