SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space

Huanrong Liu, Chunlin Tian, Tongyu Jia, Tailai Zhou, Qin Liu, Yu Gao, Yutong Ban, Yun Gu, Guy Rosman, Xin Ma, Qingbiao Li

Abstract

Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, further increasing the difficulty of supervised or imitation learning methods. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureAgent, a goal-conditioned offline reinforcement learning framework that lifts sparse annotations into dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureAgent encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureAgent reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
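The abstract names two mechanisms that are easy to make concrete. First, the spline-based reward densification: below is a minimal sketch assuming a chordal (arc-length) parameterization and a Gaussian proximity reward, neither of which is specified by the paper; `densify_waypoints` and `dense_reward` are hypothetical names, not the authors' implementation.

```python
# Sketch: densify sparse expert waypoints with a cubic spline and derive a
# per-step reward. Parameterization and reward shape are assumptions, not
# the paper's implementation.
import numpy as np
from scipy.interpolate import CubicSpline

def densify_waypoints(waypoints: np.ndarray, steps: int) -> np.ndarray:
    """Interpolate (K, 2) sparse pixel waypoints into (steps, 2) dense guidance."""
    seg = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])   # chordal parameter
    t /= t[-1]                                    # normalize to [0, 1]
    return CubicSpline(t, waypoints, axis=0)(np.linspace(0.0, 1.0, steps))

def dense_reward(pred: np.ndarray, guide: np.ndarray, sigma: float = 20.0) -> float:
    """Hypothetical reward: higher when the predicted tip stays near the spline."""
    return float(np.exp(-np.sum((pred - guide) ** 2) / (2.0 * sigma ** 2)))

# Four annotated keyframes -> 30 per-step guidance coordinates g_k.
sparse = np.array([[120, 340], [160, 300], [210, 285], [260, 310]], dtype=float)
guide = densify_waypoints(sparse, steps=30)
```

Second, the offline objective. The sketch below combines a twin-critic TD loss with the CQL conservative penalty and a behavior-cloning cross-entropy on the discrete direction head. It assumes the Q-networks score directions only, and the weights `alpha` and `lambda_bc` are illustrative placeholders, not the paper's values.

```python
# Sketch: Conservative Q-Learning with a behavior-cloning regularizer.
import torch
import torch.nn.functional as F

def cql_bc_loss(q1, q2, q1_next_tgt, q2_next_tgt, expert_dir, policy_logits,
                reward, done, gamma=0.99, alpha=1.0, lambda_bc=1.0):
    """q1, q2: (B, A) per-direction Q-values; expert_dir: (B,) logged actions."""
    with torch.no_grad():
        # Pessimistic bootstrap target, matching Figure 4's min(Q1, Q2).
        next_v = torch.min(q1_next_tgt, q2_next_tgt).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_v
    q1_a = q1.gather(1, expert_dir.unsqueeze(1)).squeeze(1)
    q2_a = q2.gather(1, expert_dir.unsqueeze(1)).squeeze(1)
    td = F.mse_loss(q1_a, target) + F.mse_loss(q2_a, target)
    # CQL penalty: push Q down on all actions and up on the logged expert action.
    cql = (torch.logsumexp(q1, dim=1) - q1_a).mean() \
        + (torch.logsumexp(q2, dim=1) - q2_a).mean()
    bc = F.cross_entropy(policy_logits, expert_dir)  # BC regularization
    return td + alpha * cql + lambda_bc * bc
```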

Paper Structure

This paper contains 12 sections, 17 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Overview of the proposed framework. (i) Given the observed video segment, the observation encoder extracts local visual guidance features from needle-centered crops and aggregates their temporal dependencies with a Transformer to obtain the contextual representation $z_c$. (ii) At each prediction step $k$, the goal-conditioned state encoder constructs the state $s_k$ by combining $z_c$ with the encoded current position $\hat{p}_k$, guidance coordinate $g_k$, relative displacement $g_k-\hat{p}_k$, and normalized step ratio $k/T_{pred}$. (iii) The policy head then predicts a discrete motion direction and a continuous step magnitude to autoregressively update the needle-tip position and generate the future trajectory (see the rollout sketch after this list). (iv) Training is performed entirely offline on expert transitions using Conservative Q-Learning with twin critics, together with auxiliary behavior-cloning and magnitude-regression objectives.
  • Figure 2: Qualitative comparison of predicted trajectories on the test set. The yellow curve denotes the observed trajectory, the green curve represents the ground-truth future trajectory, the red curve shows the prediction from our SutureAgent, and the blue curve indicates the best baseline prediction.
  • Figure 3: Distribution of Average Displacement Error (ADE) across all methods on the test set. (a) Violin plot showing the ADE distribution for each method, with individual data points overlaid. Black diamonds indicate the mean and white horizontal lines indicate the median. (b) Empirical cumulative distribution function (CDF) of ADE. The dashed vertical line marks the ADE = 100-pixel threshold; our method keeps 90% of trajectories below this value.
  • Figure 4: Per-trajectory Q-value curves on four test trajectories of increasing prediction horizon, demonstrating generalization across variable-length sequences. $Q_{\mathrm{policy}}(s_k, a_k^\pi)$ (solid blue) is the pessimistic value estimate $\min(Q_1, Q_2)$ of the policy's chosen action at step $k$; $Q_{\mathrm{expert}}(s_k, a_k^*)$ (dashed red) is the value of the corresponding ground-truth expert action. Orange vertical lines indicate keyframe positions. That $Q_{\mathrm{policy}} \geq Q_{\mathrm{expert}}$ holds consistently across all horizons confirms that the CQL conservative penalty successfully shapes the Q-function to rank the learned policy above suboptimal offline actions, regardless of trajectory length.
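As a complement to the Figure 1 caption, here is a minimal sketch of the autoregressive rollout (step iii). The 8-way direction discretization and the `policy` interface are illustrative assumptions (the paper states only that actions combine discrete directions with continuous magnitudes), and the positional encoding from step (ii) is omitted for brevity.

```python
# Sketch: autoregressive pixel-space rollout from Figure 1 (iii).
import numpy as np

# Hypothetical 8-way discretization of the motion direction.
DIRS = np.array([[np.cos(a), np.sin(a)]
                 for a in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)])

def rollout(z_c, p0, guide, policy, T_pred):
    """z_c: contextual clip feature; p0: (2,) start pixel; guide: (T_pred, 2) g_k."""
    p_hat, traj = np.asarray(p0, dtype=float), []
    for k in range(T_pred):
        g_k = guide[k]
        # State s_k = [z_c, p_hat_k, g_k, g_k - p_hat_k, k / T_pred] (Figure 1, ii);
        # raw coordinates stand in for the paper's encoded positions.
        s_k = np.concatenate([z_c, p_hat, g_k, g_k - p_hat, [k / T_pred]])
        dir_idx, mag = policy(s_k)            # discrete direction + continuous step
        p_hat = p_hat + mag * DIRS[dir_idx]   # pixel-space position update
        traj.append(p_hat.copy())
    return np.stack(traj)
```

The Average Displacement Error reported in Figures 2 and 3 is then the per-trajectory mean $\frac{1}{T_{pred}}\sum_{k}\|\hat{p}_k - p_k^{gt}\|_2$ between such a rollout and the ground-truth future waypoints.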