Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos

Xiangxiang Cui, Zhongyu Li, Xiayue Fan, Peng Huang, Ying Wang, Meng Yang, Shi Chang, Jihua Zhu

TL;DR

This work tackles the classification of breast nodules in dynamic ultrasound videos whose frame counts vary across patients. It introduces a variable-frame CNN–LSTM that first extracts per-frame spatial features with a ResNet-based encoder (512-dimensional vectors), stores them, and then models temporal dynamics with an LSTM, using a PackedSequence representation to handle variable-length sequences. Feature sequences are sorted by frame count, zero-padded into batches, and packed before entering the LSTM, so no computation is spent on padding values. The method outperforms equal-frame and key-frame baselines: experiments on clinically sourced datasets show higher accuracy, precision, and F1 scores, and ROC and PR analyses indicate strong discriminative capability. The authors suggest the approach may carry over to other medical imaging modalities.

Abstract

The intersection of medical imaging and artificial intelligence has become an important research direction in intelligent medical treatment, particularly the use of deep learning to analyze medical images for clinical diagnosis. Despite these advances, existing keyframe classification methods do not extract time-series features, while ultrasound video classification based on three-dimensional convolution requires the same number of frames for every patient, which results in poor feature extraction efficiency and classification performance. This study proposes a novel video classification method based on CNN and LSTM that, for the first time, brings the variable-length sentence handling used in natural language processing (NLP) into video classification. The method reduces the CNN-extracted features of each frame to a 1x512 vector, then sorts and compresses the feature vectors for LSTM training. Specifically, feature sequences are sorted by the number of frames in each patient's video and padded with the value 0 to form variable batches; the invalid padding values are then compressed away before LSTM training to conserve computing resources. Experimental results demonstrate that the variable-frame CNNLSTM method outperforms other approaches across all metrics, with improvements of 3-6% in F1 score and 1.5% in specificity compared to keyframe methods. The variable-frame CNNLSTM also achieves better accuracy and precision than the equal-frame CNNLSTM. These findings validate the effectiveness of the approach in classifying variable-frame ultrasound videos and suggest potential applications in other medical imaging modalities.
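The sort, pad, and compress steps described above can be illustrated with a small dependency-free sketch. This is not the authors' code: the function name `sort_pad_pack`, the toy 2-dimensional features (standing in for the 512-dimensional vectors), and the example videos are assumptions made for illustration. The packed layout it produces follows the time-major convention used by PyTorch's `torch.nn.utils.rnn.pack_padded_sequence`, which is presumably what the PackedSequence mentioned in the TL;DR refers to.

```python
from typing import List, Sequence

Feature = List[float]  # one per-frame feature vector (512-D in the paper, 2-D here)


def sort_pad_pack(videos: Sequence[List[Feature]], pad_value: float = 0.0):
    """Sort videos by frame count, zero-pad them, then drop the padding again.

    Returns (order, padded, packed, batch_sizes):
      order       - indices of the input videos, longest first
      padded      - [batch][max_frames][dim] zero-padded batch
      packed      - valid frame vectors only, in time-major order
      batch_sizes - number of videos still active at each time step
    """
    if not videos or any(len(v) == 0 for v in videos):
        raise ValueError("every video needs at least one frame")
    dim = len(videos[0][0])

    # 1. Sort by frame count, longest first (sorted() is stable for ties).
    order = sorted(range(len(videos)), key=lambda i: len(videos[i]), reverse=True)
    lengths = [len(videos[i]) for i in order]
    max_len = lengths[0]

    # 2. Zero-pad every sequence to the longest one in the batch.
    padded = [
        [list(f) for f in videos[i]]
        + [[pad_value] * dim for _ in range(max_len - len(videos[i]))]
        for i in order
    ]

    # 3. Pack: walk over time steps and keep only the videos that still
    #    have a real frame at step t, so padding never reaches the LSTM.
    packed, batch_sizes = [], []
    for t in range(max_len):
        active = sum(1 for n in lengths if n > t)
        batch_sizes.append(active)
        packed.extend(padded[b][t] for b in range(active))
    return order, padded, packed, batch_sizes


# Hypothetical batch: three videos with 2, 4, and 1 frames.
videos = [
    [[1.0, 1.0]] * 2,
    [[2.0, 2.0]] * 4,
    [[3.0, 3.0]] * 1,
]
order, padded, packed, batch_sizes = sort_pad_pack(videos)
print(order)        # [1, 0, 2]
print(batch_sizes)  # [3, 2, 1, 1]
print(len(packed), "valid steps instead of", len(padded) * len(padded[0]))
```

In this toy batch the recurrent layer would process 7 frame vectors rather than the 12 in the padded tensor, which is the saving in computing resources that the abstract attributes to compressing the padding.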

Paper Structure

This paper contains 20 sections, 7 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Overview
  • Figure 2: Dynamic frame change
  • Figure 3: ResNet framework
  • Figure 4: Video frame number and video number
  • Figure 5: Variable LSTM framework
  • ...and 6 more figures