CataractCompDetect: Intraoperative Complication Detection in Cataract Surgery
Bhuvan Sachdeva, Sneha Kumari, Rudransh Agarwal, Shalaka Kumaraswamy, Niharika Singri Prasad, Simon Mueller, Raphael Lechtenboehmer, Maximilian W. M. Wintergerst, Thomas Schultz, Kaushik Murali, Mohit Jain
TL;DR
CataractCompDetect is a pipeline for detecting intraoperative complications in cataract surgery that combines phase-aware localization, SAM 2-based tracking, complication-specific risk scoring, and vision-language reasoning. The authors also curate CataComp, a real-world dataset of 53 manual small-incision cataract surgery (MSICS) videos annotated for iris prolapse, posterior capsule rupture (PCR), and vitreous loss. On CataComp, the method achieves an average F1 of 70.63% across the three complications, with per-type F1 scores of 81.80% (iris prolapse), 60.87% (PCR), and 69.23% (vitreous loss), which the authors take as evidence for the value of combining structured surgical priors with vision-language reasoning. The work points to potential uses in early warning systems and training feedback; the dataset and code are to be released publicly upon acceptance.
Abstract
Cataract surgery is one of the most commonly performed surgeries worldwide, yet intraoperative complications such as iris prolapse, posterior capsule rupture (PCR), and vitreous loss remain major causes of adverse outcomes. Automated detection of such events could enable early warning systems and objective training feedback. In this work, we propose CataractCompDetect, a complication detection framework that combines phase-aware localization, SAM 2-based tracking, complication-specific risk scoring, and vision-language reasoning for final classification. To validate CataractCompDetect, we curate CataComp, the first cataract surgery video dataset annotated for intraoperative complications, comprising 53 surgeries, including 23 with clinical complications. On CataComp, CataractCompDetect achieves an average F1 score of 70.63%, with per-complication F1 scores of 81.80% (iris prolapse), 60.87% (PCR), and 69.23% (vitreous loss). These results highlight the value of combining structured surgical priors with vision-language reasoning for recognizing rare but high-impact intraoperative events. Our dataset and code will be publicly released upon acceptance.
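As a quick arithmetic check on the reported numbers, the average F1 is consistent with an unweighted (macro) mean of the three per-complication scores. The sketch below assumes that this is how the average was computed; the abstract does not state the averaging scheme, and the variable names are illustrative only.

```python
from statistics import mean

# Per-complication F1 scores (%) as reported in the abstract.
f1_scores = {
    "iris_prolapse": 81.80,
    "pcr": 60.87,
    "vitreous_loss": 69.23,
}

# Assumption: "average F1" is the unweighted (macro) mean over the
# three complication types, not a support-weighted average.
macro_f1 = mean(list(f1_scores.values()))

print(f"Macro-average F1: {macro_f1:.2f}%")  # prints "Macro-average F1: 70.63%"
```

Under that assumption, each complication type contributes equally to the headline number regardless of how many of the 53 surgeries exhibit it.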
