Self-Supervised Video Desmoking for Laparoscopic Surgery
Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, Wangmeng Zuo
TL;DR
This work tackles the removal of surgical smoke from real laparoscopic videos without paired clean data. It introduces SelfSVD, a self-supervised video desmoking framework that uses the pre-smoke frame $S_{ps}$ both as supervision and as a reference input. A deformation-based loss with optical-flow alignment, together with a masking strategy and a regularization term that prevent trivial solutions, enables stable learning on real-world smoky videos. The authors collect the LSVD dataset of real laparoscopic videos and report that SelfSVD and its lightweight variant outperform state-of-the-art methods in smoke removal and detail recovery, with potential for real-time deployment. By leveraging video structure and real pre-smoke frames, the approach reduces the domain gap of synthetic-smoke training and gives surgeons a clearer view of the scene.
Abstract
Due to the difficulty of collecting real paired data, most existing desmoking methods train their models on synthesized smoke and generalize poorly to real surgical scenarios. Although a few works have explored single-image real-world desmoking in an unpaired learning manner, they still struggle with dense smoke. In this work, we address these issues together by introducing self-supervised surgery video desmoking (SelfSVD). On the one hand, we observe that the frame captured before the activation of high-energy devices is generally clear (named the pre-smoke frame, or PS frame) and can thus serve as supervision for the other smoky frames, making real-world self-supervised video desmoking practically feasible. On the other hand, to enhance desmoking performance, we further feed the valuable information from the PS frame into the model, where a masking strategy and a regularization term are presented to avoid trivial solutions. In addition, we construct a real surgery video dataset for desmoking, which covers a variety of smoky scenes. Extensive experiments on the dataset show that our SelfSVD removes smoke more effectively and efficiently while recovering more photo-realistic details than state-of-the-art methods. The dataset, code, and pre-trained models are available at https://github.com/ZcsrenlongZ/SelfSVD.
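As a rough illustration of using the PS frame as supervision, the sketch below aligns a PS frame to a smoky frame's viewpoint with a given optical flow and compares it with a desmoked prediction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `warp_nearest` and `ps_supervised_loss`, the nearest-neighbor warping, the plain L1 penalty, and the out-of-bounds validity mask are all hypothetical choices made for illustration. In particular, the validity mask here is not the masking strategy described in the abstract, and the flow is assumed to come from some external estimator.

```python
import numpy as np


def warp_nearest(image, flow):
    """Backward-warp `image` (H, W, C) with a flow field (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) means output pixel (x, y)
    is sampled from image[y + dy, x + dx], rounded to the nearest pixel.
    Returns the warped image and a boolean mask of in-bounds samples.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    # Clip so indexing is safe; clipped positions are excluded via `valid`.
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return image[src_y, src_x], valid


def ps_supervised_loss(pred, ps_frame, flow):
    """Mean L1 distance between a prediction and the flow-aligned PS frame,
    averaged only over pixels whose flow target lies inside the PS frame."""
    target, valid = warp_nearest(ps_frame, flow)
    mask = valid[..., None].astype(pred.dtype)
    n_elements = mask.sum() * pred.shape[-1]
    if n_elements == 0:
        return 0.0
    return float((np.abs(pred - target) * mask).sum() / n_elements)
```

With zero flow and a prediction identical to the PS frame, the loss is zero. Pixels whose flow points outside the PS frame are ignored rather than penalized, since no valid supervision exists there.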
