arg-VU: Affordance Reasoning with Physics-Aware 3D Geometry for Visual Understanding in Robotic Surgery

Nan Xiao, Yunxin Fan, Farong Wang, Fei Liu

Abstract

Affordance reasoning provides a principled link between perception and action, yet it remains underexplored in surgical robotics, where tissues are highly deformable, compliant, and dynamically coupled with tool motion. We present arg-VU, a physics-aware affordance reasoning framework that integrates temporally consistent geometry tracking with constraint-induced mechanical modeling for surgical visual understanding. Surgical scenes are reconstructed using 3D Gaussian Splatting (3DGS) and converted into a temporally tracked surface representation. Extended Position-Based Dynamics (XPBD) embeds local deformation constraints and produces representative geometry points (RGPs) whose constraint sensitivities define anisotropic stiffness metrics capturing the local constraint-manifold geometry. Robotic tool poses in SE(3) are used to compute rigidly induced displacements at the RGPs, from which we derive two complementary measures: a physics-aware compliance energy that evaluates mechanical feasibility with respect to local deformation constraints, and a positional agreement score that captures motion alignment and serves as a kinematic motion baseline. Experiments on surgical video datasets show that arg-VU yields more stable, physically consistent, and interpretable affordance predictions than kinematic baselines. These results demonstrate that physics-aware geometric representations enable reliable affordance reasoning in deformable surgical environments and support embodied robotic interaction.
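The pipeline summarized above can be illustrated for a single RGP with a short NumPy sketch. It is not the authors' implementation and rests on assumptions the abstract does not specify: the stiffness metric is taken in the XPBD Gauss-Newton form $K_i = \sum_j \nabla C_j \nabla C_j^\top / \alpha_j$ (constraint gradients weighted by inverse compliance), the compliance energy is the quadratic form $\tfrac{1}{2}\Delta p_i^\top K_i \Delta p_i$, the rigidly induced displacement follows the relative tool transform $T_{t+1} T_t^{-1}$, and the positional agreement score is a cosine similarity. All function names are hypothetical.

```python
import numpy as np


def stiffness_metric(constraint_grads, compliances, eps=1e-12):
    # ASSUMED XPBD-style (Gauss-Newton) metric: K = sum_j g_j g_j^T / alpha_j,
    # where g_j = dC_j/dp is a constraint gradient at the point and alpha_j its compliance.
    G = np.asarray(constraint_grads, dtype=float)            # (m, 3)
    w = 1.0 / (np.asarray(compliances, dtype=float) + eps)   # (m,) inverse compliance
    return (G * w[:, None]).T @ G                            # (3, 3), symmetric PSD


def se3(R, t):
    # Pack a rotation R (3x3) and translation t (3,) into a 4x4 homogeneous transform.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def induced_displacement(T_prev, T_curr, p):
    # Displacement of point p if it moved rigidly with the tool between the two poses.
    dT = T_curr @ np.linalg.inv(T_prev)   # relative tool motion, world frame
    p_h = np.append(np.asarray(p, dtype=float), 1.0)
    return (dT @ p_h)[:3] - p_h[:3]


def compliance_energy(K, dp):
    # ASSUMED quadratic form 0.5 * dp^T K dp: large when dp pushes along stiff directions.
    dp = np.asarray(dp, dtype=float)
    return 0.5 * float(dp @ K @ dp)


def positional_agreement(dp, observed, eps=1e-12):
    # ASSUMED kinematic baseline: cosine similarity of induced vs. observed displacement.
    dp = np.asarray(dp, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(dp @ observed / (np.linalg.norm(dp) * np.linalg.norm(observed) + eps))


# Toy RGP on a surface with normal +z: a stiff normal constraint and a compliant tangent one.
K = stiffness_metric([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]], [1e-4, 1e-1])
tangent_push = compliance_energy(K, [0.01, 0.0, 0.0])
normal_push = compliance_energy(K, [0.0, 0.0, 0.01])

T_prev = se3(np.eye(3), np.zeros(3))
T_curr = se3(np.eye(3), np.array([0.01, 0.0, 0.0]))   # tool translates 1 cm along x
dp = induced_displacement(T_prev, T_curr, [0.1, 0.2, 0.0])
score = positional_agreement(dp, [0.008, 0.001, 0.0])
print(tangent_push, normal_push, score)
```

In this toy example the same 1 cm push costs about three orders of magnitude more energy along the stiff normal direction than along the compliant tangent, which is the anisotropy that Figure 4 draws as an ellipsoid. The cosine score ignores stiffness entirely, which is why it only serves as a kinematic baseline.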

Paper Structure

This paper contains 25 sections, 20 equations, and 7 figures.

Figures (7)

  • Figure 1: Left: tracked surface geometry reconstructed from endoscopic video. Right: affordance score map (red indicates higher actionability). Our framework estimates physically plausible and actionable tool-tissue interaction regions rather than only reconstructing geometry.
  • Figure 2: Overall framework of arg-VU. The 3D point cloud from 3DGS provides a temporally tracked surface representation. XPBD enforces local constraints to produce representative geometry points (RGPs) with anisotropic stiffness structure. Tool poses in $\mathrm{SE}(3)$ yield induced displacements, which feed PACS (physical feasibility) and PAS (motion alignment, serving as a kinematic motion baseline) for affordance reasoning.
  • Figure 3: Neighborhood definitions. Left: radius neighborhoods may cross anatomical boundaries. Right: segmentation-aware filtering constrains neighborhoods within anatomical regions, improving physical plausibility near organ boundaries.
  • Figure 4: Geometric interpretation of the constraint-induced stiffness metric. Local deformation constraints define a manifold $\mathcal{M}$ whose tangent space represents admissible deformation directions. The stiffness metric $\mathbf{K}_i^{\mathrm{RGP}}$ induces an ellipsoid that visualizes anisotropic resistance to motion. Long axes correspond to compliant tangent directions, whereas short axes correspond to stiff constraint-normal directions. A tool-induced displacement $\Delta p_i$ at the representative point $p_i$ is evaluated through this metric.
  • Figure 5: Median cosine similarity between the observed motion direction and the principal direction of the predicted uncertainty (its largest eigenvector). Our XPBD-derived covariance aligns better with future motion than the velocity-persistence baseline.
  • ...and 2 more figures
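Two operations named in the figure captions, the segmentation-aware neighborhood of Figure 3 and the principal-direction alignment metric of Figure 5, can be sketched in NumPy. This is an illustration under assumptions, not the authors' code: brute-force distance search stands in for a spatial index, the XPBD-derived covariance is assumed to be a regularized inverse of the stiffness metric, $\Sigma_i = (\mathbf{K}_i^{\mathrm{RGP}} + \lambda I)^{-1}$, and the absolute cosine is used because an eigenvector's sign is arbitrary. The function names and the regularizer $\lambda$ are hypothetical.

```python
import numpy as np


def segmentation_aware_neighbors(points, labels, i, radius):
    # Radius neighborhood of point i, keeping only points with the same segmentation label
    # so that neighborhoods do not cross anatomical boundaries (cf. Figure 3, right).
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(points - points[i], axis=1)   # brute-force distances
    mask = (dist <= radius) & (labels == labels[i])
    mask[i] = False                                     # exclude the query point itself
    return np.flatnonzero(mask)


def principal_direction(K, lam=1e-6):
    # ASSUMED covariance Sigma = (K + lam*I)^-1: compliant directions get large variance.
    cov = np.linalg.inv(K + lam * np.eye(K.shape[0]))
    _, eigvecs = np.linalg.eigh(cov)                    # eigenvalues in ascending order
    return eigvecs[:, -1]                               # unit eigenvector, largest eigenvalue


def median_alignment(Ks, motions, eps=1e-12):
    # Median |cos| between observed motion and the principal uncertainty direction;
    # the absolute value removes the sign ambiguity of eigenvectors.
    sims = []
    for K, m in zip(Ks, motions):
        v = principal_direction(np.asarray(K, dtype=float))
        m = np.asarray(m, dtype=float)
        sims.append(abs(float(v @ m)) / (np.linalg.norm(m) + eps))
    return float(np.median(sims))


points = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.6, 0.0, 0.0], [3.0, 0.0, 0.0]]
labels = [1, 1, 2, 1]                                   # point 2 lies in another organ
print(segmentation_aware_neighbors(points, labels, 0, radius=1.0))

K = np.diag([10.0, 20.0, 10000.0])                      # most compliant along x
print(median_alignment([K, K], [[1.0, 0.1, 0.0], [0.9, -0.2, 0.0]]))
```

Under this assumed covariance, the principal uncertainty direction is the most compliant tangent direction, so a high median alignment indicates that tissue tends to move where the local constraints resist least.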