SurgicAI: A Hierarchical Platform for Fine-Grained Surgical Policy Learning and Benchmarking

Jin Wu; Haoying Zhou; Peter Kazanzides; Adnan Munawar; Anqi Liu

TL;DR

SurgicAI addresses the challenge of automating complex robotic suturing by providing an AMBF-based simulation platform with a deformable thread, compatible with the da Vinci system and paired with a standardized data pipeline and benchmark suite. Its core innovation is a hierarchical learning framework in which a High-Level Policy coordinates Low-Level Policies across the stages of a suturing task, enabling reusable subskills and scalable policy learning. Empirical results show that pure online RL struggles under sparse rewards, while integrating expert demonstrations via imitation learning or hybrid RL-IL approaches (e.g., TD3+HER+BC) yields higher success rates and more efficient trajectories; offline methods also perform well with dense rewards. The platform’s modularity, extensive data collection capabilities, and open maintenance pipeline position SurgicAI as a practical tool for advancing policy learning in surgical robotics and supporting simulation-to-real transfer. Future plans include more realistic rendering, support for additional algorithms, and wider collaboration.
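The hierarchical control flow summarized above can be illustrated with a minimal sketch. This is not SurgicAI's implementation: `LowLevelPolicy`, `run_hierarchy`, `make_toy_llp`, and `toy_high_level_policy` are hypothetical names, the state is a toy dictionary of per-subtask progress counters, and only the subtask names (grasp, place, insert, handoff, pullout) come from the paper's Figure 1 caption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]  # toy state: progress counter per subtask (assumption)

SUBTASKS = ["grasp", "place", "insert", "handoff", "pullout"]


@dataclass
class LowLevelPolicy:
    """Hypothetical container for one subtask controller."""
    name: str
    act: Callable[[State], State]     # applies one action, returns next state
    is_done: Callable[[State], bool]  # termination condition of the subtask


def make_toy_llp(name: str, steps_needed: int = 3) -> LowLevelPolicy:
    """Toy LLP: each action advances the subtask's counter by one."""
    def act(state: State) -> State:
        new_state = dict(state)
        new_state[name] = new_state.get(name, 0) + 1
        return new_state

    def is_done(state: State) -> bool:
        return state.get(name, 0) >= steps_needed

    return LowLevelPolicy(name, act, is_done)


def toy_high_level_policy(state: State) -> Optional[str]:
    """Toy HLP: pick the first unfinished subtask, or None when all are done."""
    for name in SUBTASKS:
        if state.get(name, 0) < 3:
            return name
    return None


def run_hierarchy(
    high_level_policy: Callable[[State], Optional[str]],
    llps: Dict[str, LowLevelPolicy],
    state: State,
    max_decisions: int = 20,
    max_steps_per_subtask: int = 50,
) -> Tuple[List[str], State]:
    """HLP selects an LLP; the LLP acts until it terminates; control returns."""
    trace: List[str] = []
    for _ in range(max_decisions):
        name = high_level_policy(state)
        if name is None:  # HLP signals that the whole task is complete
            break
        llp = llps[name]
        trace.append(name)
        for _ in range(max_steps_per_subtask):
            if llp.is_done(state):
                break
            state = llp.act(state)
    return trace, state


llps = {name: make_toy_llp(name) for name in SUBTASKS}
trace, final_state = run_hierarchy(toy_high_level_policy, llps, {})
print(trace)
```

Because each subtask is wrapped in its own object with its own termination test, a subskill can be trained or swapped independently of the others, which is the reuse property the hierarchy is meant to provide.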

Abstract

Despite advancements in robotic-assisted surgery, automating complex tasks like suturing remains challenging due to the need for adaptability and precision. Learning-based approaches, particularly reinforcement learning (RL) and imitation learning (IL), require realistic simulation environments for efficient data collection. However, current platforms often include only relatively simple, non-dexterous manipulations and lack the flexibility required for effective learning and generalization. We introduce SurgicAI, a novel platform for development and benchmarking that addresses these challenges by accommodating both modular subtasks and, more importantly, task decomposition in RL-based surgical robotics. Compatible with the da Vinci Surgical System, SurgicAI offers a standardized pipeline for collecting and utilizing expert demonstrations. It supports the deployment of multiple RL and IL approaches and the training of both singular and compositional subtasks in suturing scenarios that feature high dexterity and modularization. SurgicAI also sets clear metrics and benchmarks for assessing learned policies. We implemented and evaluated multiple RL and IL algorithms on SurgicAI. Our detailed benchmark analysis underscores SurgicAI's potential to advance policy learning in surgical robotics. Details: https://github.com/surgical-robotics-ai/SurgicAI

Paper Structure (27 sections, 5 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Simulation Platforms for Data Collection and Training
Learning Based Approaches in Surgical Automation
Our Framework
Simulation Environment
Environment Settings
Task Overview
Hierarchical Architecture for Multi-stage Tasks
Capability of Our Framework
Platform Usage Guideline
Teleoperation and Dataset Collection
Sustained Maintenance Repository
Implementation and Experiment Results
Conclusions, Limitations, and Future Plans
...and 12 more sections

Figures (5)

Figure 1: Hierarchical Framework in SurgicAI. A High-Level Policy (HLP) selects and coordinates Low-Level Policies (LLPs) for specific tasks like grasping, placing, inserting, handoff, and pullout. Each LLP manages actions within its designated subtask until a termination condition is met, after which control returns to the HLP for the next decision.
Figure 2: Detailed Workflow of our Simulation Environment. The AMBF Description Files (ADF) define various simulation objects, such as rigid bodies, joints, and cameras. The launch file specifies simulation parameters and model-specific details. The AMBF Simulator processes these parameters to initialize the environment, while the AMBF Client bridges the simulator and the control scripts via ROS topics. Method APIs provide essential functions for controlling the simulation, including kinematics and servo control, while teleoperation scripts enable connections with multiple devices. The Gymnasium API integrates RL and IL capabilities, optimizing policies through interaction with the control scripts.
Figure 3: Training pipeline for reinforcement learning.
Figure 4: Workflow of TD3+HER+BC for policy learning. Rollouts collected from the SRC environment are stored as tuples $(s_t, a_t, s_{t+1}, r, \mathit{ag}, \mathit{dg})$. Both the HER module and expert demonstrations enrich the buffer with additional transitions. The TD3 network, comprising actors and critics, updates its policy using transitions sampled from the expanded buffer. The behavior cloning loss $L_{BC}$ minimizes the error between predicted and expert actions, and a Q-filter keeps the policy close to the expert while avoiding suboptimal solutions.
Figure 5: Expert demonstration data collection pipeline.
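
The Figure 4 caption mentions a behavior cloning loss gated by a Q filter without giving its exact form. The sketch below assumes the commonly used formulation, in which a demonstration contributes to the loss only when the critic rates the expert action above the policy's own action. The function name `q_filtered_bc_loss`, the toy `policy` and `critic`, and the choice to average over the full batch are all assumptions for illustration, not details taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def q_filtered_bc_loss(
    states: List[Vector],
    expert_actions: List[Vector],
    policy: Callable[[Vector], Vector],
    critic: Callable[[Vector, Vector], float],
) -> Tuple[float, int]:
    """Squared-error BC loss, counted only where Q(s, a_expert) > Q(s, pi(s)).

    Returns (loss averaged over the whole batch, number of demos kept).
    """
    if not states:
        return 0.0, 0
    total = 0.0
    kept = 0
    for s, a_exp in zip(states, expert_actions):
        a_pi = policy(s)
        # Q filter: skip demos the critic considers worse than the policy.
        if critic(s, a_exp) > critic(s, a_pi):
            total += sum((p - e) ** 2 for p, e in zip(a_pi, a_exp))
            kept += 1
    return total / len(states), kept


# Toy setup (assumption): 1-D actions, the critic prefers actions equal to the state.
def policy(s: Vector) -> Vector:
    return [0.0]


def critic(s: Vector, a: Vector) -> float:
    return -((a[0] - s[0]) ** 2)


demo_states = [[1.0], [0.0]]
demo_actions = [[1.0], [0.5]]
loss, kept = q_filtered_bc_loss(demo_states, demo_actions, policy, critic)
print(loss, kept)
```

In this toy batch the second demonstration is filtered out because the critic already scores the policy's action higher, so an imperfect demonstration does not pull the policy away from a better solution.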
