PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal

Zining Fang; Chunhui Liu; Bin Xu; Ming Chen; Xiaowei Hu; Cheng Xue

Abstract

Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surgical perception. Existing learning-based desmoking approaches rely on scarce paired supervision and deterministic restoration pipelines, which make exploration and reinforcement-driven refinement difficult under real surgical conditions. We propose PhySe-RPO, a diffusion restoration framework optimized through Physics- and Semantics-Guided Relative Policy Optimization. The core idea is to recast the deterministic restoration model as a stochastic policy, enabling trajectory-level exploration and critic-free updates through group-relative optimization. A physics-guided reward enforces illumination and color consistency, while a visual-concept semantic reward, learned from CLIP-based surgical concepts, promotes smoke-free and anatomically coherent restoration. Together with a reference-free perceptual constraint, PhySe-RPO produces results that are physically consistent, semantically faithful, and clinically interpretable across synthetic and real robotic surgical datasets, providing a principled route to robust diffusion-based restoration under limited paired supervision.
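The central mechanism in the abstract, sampling several stochastic restorations of the same smoky frame and updating the policy from group-relative rewards without a learned critic, can be illustrated with a minimal, stdlib-only sketch. It assumes a GRPO-style recipe (rewards standardized within each group, then a PPO-style clipped probability ratio); the summary does not state PhySe-RPO's exact objective. The function names, the equal reward weights, the clip value 0.2, and the reward scores in the usage lines are hypothetical. Each denoising step of a stochastic diffusion sampler is a Gaussian transition, so a trajectory's log-probability can be taken as the sum of per-step Gaussian log-densities; here the physics, semantic, and quality rewards are treated as precomputed scalars.

```python
import math
from typing import List, Sequence, Tuple

def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], sigma: float) -> float:
    """Log-density of x under N(mean, sigma^2 I): one stochastic denoising step."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)

def total_reward(r_phys: float, r_sem: float, r_quality: float,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of physics, semantic, and reference-free quality rewards.
    The equal weights are a hypothetical default, not the paper's values."""
    w_p, w_s, w_q = weights
    return w_p * r_phys + w_s * r_sem + w_q * r_quality

def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Critic-free advantages: each sample's reward minus the group mean,
    divided by the group standard deviation (one group = one smoky input)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(log_ratios: List[float], advantages: List[float],
                      clip: float = 0.2) -> float:
    """PPO-style clipped objective averaged over the group (to be maximized).
    log_ratios[i] = log pi_new(traj_i) - log pi_old(traj_i)."""
    total = 0.0
    for lr, adv in zip(log_ratios, advantages):
        ratio = math.exp(lr)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Usage: four sampled restorations of one frame, with illustrative reward scores.
scores = [(0.8, 0.6, 0.7), (0.5, 0.4, 0.6), (0.9, 0.8, 0.75), (0.6, 0.7, 0.5)]
rewards = [total_reward(*s) for s in scores]
advantages = group_relative_advantages(rewards)

# Log-ratio of one toy denoising step under an updated vs. the old policy mean.
x_prev = [0.1, -0.2]
step_lr = (gaussian_log_prob(x_prev, [0.05, -0.1], 0.5)
           - gaussian_log_prob(x_prev, [0.0, 0.0], 0.5))

# Before any update the new and old policies coincide, so all log-ratios are 0.
objective = clipped_surrogate([0.0] * len(advantages), advantages)
print([round(a, 3) for a in advantages], round(step_lr, 4), round(objective, 6))
```

Because advantages are standardized within each group, samples are rewarded only for being better than their siblings on the same input, which is what removes the need for a value network; a KL penalty toward the pretrained model, common in such recipes, is omitted here.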

Paper Structure (14 sections, 17 equations, 5 figures, 5 tables)

Introduction
Related Work
Methods
Group-relative Diffusion Policy Optimization
Physics-Guided Reward via Color Priors
Visual-Concept Semantic Reward
Reference-Free Quality Constraint
Overall
Experiments
Experimental Settings
Experimental Results
Ablation Study
Downstream Validation
Conclusion

Figures (5)

Figure 1: Limitations of existing restoration approaches, namely the lack of paired data, deterministic restoration outputs that make reward learning difficult, and the lack of rewards tailored to smoke removal. Our PhySe-RPO addresses these issues by turning restoration into a stochastic policy optimization problem and using physics- and semantics-guided rewards to learn effectively from unlabeled real surgical videos.
Figure 2: Overview of the PhySe-RPO framework. PhySe-RPO refines the pretrained diffusion model through Group-relative Diffusion Policy Optimization, where multiple stochastic trajectories are sampled and optimized using physics-guided color priors, perceptual quality metrics, and semantic rewards, achieving physically consistent and clinically interpretable surgical smoke removal.
Figure 3: Visual-Concept Integration into Diffusion. (a) Learnable visual concepts are trained via contrastive learning to differentiate "clear" and "smoky" concepts in the semantic space. (b) The learned tokens are integrated into the diffusion backbone through multimodal fusion and temporal adaptation to guide semantically consistent desmoking.
Figure 4: Qualitative comparison on real-world surgical smoke images. Compared with prior desmoking methods, PhySe-RPO produces clearer structures, more natural color restoration, and fewer residual smoke artifacts.
Figure 5: Reward convergence analysis. Average reward (a) and reward variance (b) under different semantic reward settings. The Full model converges faster and with lower variance than the Text-Reward and w/o $R_{\text{VC}}$ variants.
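Figure 3 describes visual concepts learned contrastively to separate "clear" from "smoky" frames and then used as a semantic reward. The following is a minimal sketch, assuming a CLIP-style formulation in which the reward is the softmax probability of the "clear" concept over temperature-scaled cosine similarities. The embeddings are treated as given vectors, and the names `concept_log_probs`, `concept_reward`, `concept_contrastive_loss`, and the temperature 0.01 are hypothetical. The sketch does not model the multimodal fusion and temporal adaptation of Figure 3(b), and it only evaluates the contrastive loss; in training, its gradient with respect to the learnable concept embeddings would come from an autodiff framework.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def concept_log_probs(image_emb, clear_emb, smoky_emb,
                      temperature: float = 0.01) -> Tuple[float, float]:
    """(log p_clear, log p_smoky): log-softmax over temperature-scaled cosine
    similarities between an image embedding and the two concept embeddings."""
    logits = [cosine(image_emb, clear_emb) / temperature,
              cosine(image_emb, smoky_emb) / temperature]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[0] - log_z, logits[1] - log_z

def concept_reward(image_emb, clear_emb, smoky_emb, temperature: float = 0.01) -> float:
    """Semantic reward in [0, 1]: probability that the restored frame matches 'clear'."""
    return math.exp(concept_log_probs(image_emb, clear_emb, smoky_emb, temperature)[0])

def concept_contrastive_loss(embs: List[Sequence[float]], labels: List[str],
                             clear_emb, smoky_emb, temperature: float = 0.01) -> float:
    """Mean cross-entropy pulling 'clear' frames toward the clear concept and
    'smoky' frames toward the smoky concept (objective for learning the concepts)."""
    total = 0.0
    for emb, label in zip(embs, labels):
        idx = 0 if label == "clear" else 1
        total -= concept_log_probs(emb, clear_emb, smoky_emb, temperature)[idx]
    return total / len(embs)

# Usage with toy 2-D embeddings (real CLIP embeddings are much higher-dimensional).
clear_c, smoky_c = [1.0, 0.0], [0.0, 1.0]
print(round(concept_reward([0.9, 0.1], clear_c, smoky_c), 4))
```

In this sketch, a reward built from fixed text prompts (comparable to the Text-Reward setting in Figure 5) and one built from learned concepts would differ only in where `clear_emb` and `smoky_emb` come from: fixed prompt encodings versus embeddings fitted with the contrastive loss.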
