ArthroCut: Autonomous Policy Learning for Robotic Bone Resection in Knee Arthroplasty

Xu Lu, Yiling Zhang, Wenquan Cheng, Longfei Ma, Fang Chen, Hongen Liao

TL;DR

Results indicate that aligning preoperative geometry with time-aligned intraoperative perception and translating that alignment into tokenized, constrained actions is an effective path toward robust, interpretable autonomy in orthopedic robotic surgery.

Abstract

Despite rapid commercialization of surgical robots, their autonomy and real-time decision-making remain limited in practice. To address this gap, we propose ArthroCut, an autonomous policy learning framework that upgrades knee arthroplasty robots from assistive execution to context-aware action generation. ArthroCut fine-tunes a Qwen-VL backbone on a self-built, time-synchronized multimodal dataset from 21 complete cases (23,205 RGB-D pairs), integrating preoperative CT/MR, intraoperative NDI tracking of bones and end effector, RGB-D surgical video, robot state, and textual intent. The method operates on two complementary token families: Preoperative Imaging Tokens (PIT), which encode patient-specific anatomy and planned resection planes, and Time-Aligned Surgical Tokens (TAST), which fuse real-time visual, geometric, and kinematic evidence. It emits actions in an interpretable action grammar under grammar- and safety-constrained decoding. In bench-top experiments on a knee prosthesis across seven trials, ArthroCut achieves an average success rate of 86% over the six standard resections, significantly outperforming strong baselines trained under the same protocol. Ablations show that TAST is the principal driver of reliability while PIT provides essential anatomical grounding, and their combination yields the most stable multi-plane execution. These results indicate that aligning preoperative geometry with time-aligned intraoperative perception and translating that alignment into tokenized, constrained actions is an effective path toward robust, interpretable autonomy in orthopedic robotic surgery.

Paper Structure (18 sections, 10 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Comparison between vanilla VLA, commercial orthopedic robots, and the proposed ArthroCut framework. Conventional VLA models (top) directly map task inputs to robotic actions but fail to incorporate detailed surgical scene understanding, limiting their robustness in complex environments. Commercial orthopedic robots (middle) use intraoperative data to follow pre-templated paths but lack real-time adaptation and personalization [wang2022progress]. In contrast, ArthroCut (bottom) integrates preoperative imaging with temporally aligned multimodal intraoperative data, enabling accurate, real-time decision-making that improves both surgical precision and personalization.
  • Figure 2: The overview of our proposed method for robotic bone resection in knee arthroplasty. Preoperative imaging and intraoperative multimodal streams are encoded into tokens and processed by a transformer-based model to generate surgical actions ($\texttt{<MOVE>}, \texttt{<ALIGN>}, \texttt{<CUT>}$) for autonomous femoral cutting.
  • Figure 3: Example of task execution for femoral cutting across five planes using ArthroCut. For each task: Left: text instruction with the initial state (top-right inset of the second column shows the pre-cut bone surface); Middle: generated intermediate goal states; Right: final state upon task completion. Full execution trajectories are provided in the supplementary video.
  • Figure 4: Schematic diagram of knee replacement verification after osteotomy. (a) Femoral resections provide a visually flat surface for component seating; (b) the joint shown in full extension after component placement.
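
The abstract and the Figure 2 caption describe actions such as `<MOVE>`, `<ALIGN>`, and `<CUT>` being emitted under grammar- and safety-constrained decoding. The following is a minimal sketch of what such a decoding step could look like, not the authors' implementation. The transition table `ALLOWED_NEXT`, the `<END>` token, the `cut_is_safe` flag, and the greedy selection are all assumptions introduced here for illustration; only the three action token names come from the source.

```python
import math

# Hypothetical action vocabulary. Only <MOVE>, <ALIGN>, <CUT> appear in the
# source; <END> is an assumed terminator added for this sketch.
VOCAB = ["<MOVE>", "<ALIGN>", "<CUT>", "<END>"]

# Assumed grammar: tokens permitted after each previous token
# (None marks the start of the sequence).
ALLOWED_NEXT = {
    None: {"<MOVE>"},
    "<MOVE>": {"<MOVE>", "<ALIGN>"},
    "<ALIGN>": {"<ALIGN>", "<CUT>"},
    "<CUT>": {"<MOVE>", "<END>"},
}


def constrained_step(logits, prev, cut_is_safe):
    """Return the highest-scoring token allowed by the grammar and safety flag."""
    allowed = set(ALLOWED_NEXT[prev])
    if not cut_is_safe:
        # Assumed safety rule: never emit <CUT> while the check fails.
        allowed.discard("<CUT>")
    # Mask disallowed tokens to -inf so they can never be selected.
    masked = [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, logits)]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    if masked[best] == -math.inf:
        raise ValueError("no admissible action under grammar and safety constraints")
    return VOCAB[best]


def decode(logit_seq, safe_seq):
    """Greedy constrained decoding over per-step logits and safety flags."""
    prev, actions = None, []
    for logits, safe in zip(logit_seq, safe_seq):
        tok = constrained_step(logits, prev, safe)
        actions.append(tok)
        if tok == "<END>":
            break
        prev = tok
    return actions


if __name__ == "__main__":
    # The unconstrained scores favor <CUT> at every step; the mask forces
    # the move -> align -> cut ordering instead.
    eager = [0.1, 0.2, 0.9, 0.0]
    finish = [0.0, 0.0, 0.0, 1.0]
    print(decode([eager, eager, eager, finish], [True, True, True, True]))
```

In this sketch the policy scores are left untouched and only the admissible set changes, so the constraint layer stays separate from the learned model and can be audited on its own.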