3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos

Meiyu Qiu, Yun Li, Wenjun Huang, Haoyun Zhang, Weiping Zheng, Wenbin Lei, Xiaomao Fan

TL;DR

3D-LSPTM is an automatic framework that fine-tunes 3D-large-scale pretrained video models for laryngeal cancer detection in laryngoscopic videos, and it achieves promising performance on this task.

Abstract

Laryngeal cancer is a malignant disease with a high mortality rate in otorhinolaryngology, posing a significant threat to human health. Traditionally, laryngologists visually inspect laryngoscopic videos for signs of laryngeal cancer, which is time-consuming and subjective. In this study, we propose a novel automatic framework for laryngeal cancer detection based on 3D-large-scale pretrained models, termed 3D-LSPTM. First, we collect 1,109 laryngoscopic videos from the First Affiliated Hospital, Sun Yat-sen University, with the approval of the Ethics Committee. We then fine-tune three 3D-large-scale pretrained models, C3D, TimeSformer, and Video-Swin-Transformer, which are strong at extracting features from video, for laryngeal cancer detection. Extensive experiments show that the proposed 3D-LSPTM achieves promising performance on this task. In particular, 3D-LSPTM with the Video-Swin-Transformer backbone achieves 92.4% accuracy, 95.6% sensitivity, 94.1% precision, and 94.8% F_1.
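The four reported metrics are all functions of the confusion-matrix counts, and F_1 is the harmonic mean of precision and sensitivity. The sketch below assumes a binary cancer / non-cancer setup (the abstract does not state the class layout) and uses the standard textbook definitions. The `BinaryConfusion` class, the `f1_from` helper, and the demo counts are hypothetical illustrations, not the authors' code or data. It also checks that the reported precision and sensitivity are consistent with the reported F_1.

```python
from dataclasses import dataclass


def f1_from(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


@dataclass
class BinaryConfusion:
    """Counts for an assumed binary cancer / non-cancer classifier."""
    tp: int  # cancer videos predicted as cancer
    fp: int  # non-cancer videos predicted as cancer
    fn: int  # cancer videos predicted as non-cancer
    tn: int  # non-cancer videos predicted as non-cancer

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        return f1_from(self.precision(), self.sensitivity())


# Consistency check on the figures reported in the abstract:
# precision 94.1% and sensitivity 95.6% should give F_1 of about 94.8%.
reported_f1 = f1_from(0.941, 0.956)
print(f"{reported_f1:.3f}")  # 0.948

# Hypothetical counts, for illustration only (not the paper's data).
demo = BinaryConfusion(tp=90, fp=10, fn=5, tn=95)
print(demo.accuracy(), demo.sensitivity(), demo.precision(), demo.f1())
```

The same functions can be applied to each backbone's confusion matrix to compare C3D, TimeSformer, and Video-Swin-Transformer on equal terms.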

Paper Structure (12 sections, 2 figures, 1 table)

This paper contains 12 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The pipeline of 3D-LSPTM.
  • Figure 2: Confusion matrices of 3D-LSPTM with three backbones: (a) C3D, (b) TimeSformer, and (c) Video-Swin-Transformer.