
JSCDS: A Core Data Selection Method with Jensen-Shannon Divergence for Caries RGB Images-Efficient Learning

Peiliang Zhang, Yujia Tong, Chenghu Du, Chao Che, Yongjun Zhu

TL;DR

This paper tackles RGB caries detection when training data are of limited quality by introducing JSCDS, a core data selection method that uses Jensen-Shannon Divergence to quantify the mutual information between sample embeddings and class cluster centers. JSCDS computes cluster centers in embedding space and selects samples whose mutual information is close to the average, building a high-quality core subset that preserves or improves predictive performance while reducing training time. On both MobileNetV2 and ResNet18, JSCDS outperforms competing core-set methods. Notably, a core set containing only 50% of the data reaches or exceeds full-dataset performance, and a 70% core set yields further gains. The approach offers a practical path to efficient, high-accuracy learning from real-world clinical caries RGB images.
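For reference, the standard Jensen-Shannon Divergence between two discrete distributions P and Q is shown below. This summary does not say how JSCDS turns a sample embedding and a cluster center into distributions, so P and Q here are generic placeholders rather than the paper's exact notation.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
D_{\mathrm{KL}}(P \,\|\, M) = \sum_i P_i \log \frac{P_i}{M_i},
\qquad
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M).
```

Unlike KL divergence, JSD is symmetric and bounded (between 0 and ln 2 with natural logarithms). It also equals the mutual information I(X; Z) between a draw X from the mixture M and a fair binary indicator Z recording whether X came from P or Q. This identity is one plausible reason the method describes its JSD scores as mutual information.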

Abstract

Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, which makes efficient deployment challenging. Core data selection aims to improve training efficiency without significantly compromising model performance by eliminating low-quality and confusing data. However, distance-based data selection methods struggle to distinguish dependencies among high-dimensional caries data. To address this issue, we propose a Core Data Selection Method with Jensen-Shannon Divergence (JSCDS) for efficient caries image learning and caries classification. We formulate the core data selection criterion in terms of the distribution of samples across classes. JSCDS computes cluster centers from the sample embeddings produced by the caries classification network and uses Jensen-Shannon Divergence to compute the mutual information between data samples and cluster centers, capturing nonlinear dependencies among high-dimensional data. The average mutual information (AvgMI) is then computed to fit this distribution and serves as the criterion for constructing the core set used for model training. Extensive experiments on RGB caries datasets show that JSCDS outperforms other data selection methods in both prediction performance and time consumption. Notably, JSCDS exceeds the performance of the full-dataset model with a core set of only 50% of the data, and its advantage becomes more pronounced with a 70% core set.
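The selection procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how embeddings become probability distributions, how cluster centers are formed, or exactly how AvgMI becomes a selection rule. The sketch therefore assumes three things:

1. A softmax is applied to each embedding vector.
2. Each class has one center, taken as its mean embedding.
3. Within each class, the requested fraction of samples whose JSD score lies closest to the class average is kept.

The names `softmax`, `kl`, `jsd`, and `select_core_set` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def softmax(vec: Sequence[float]) -> List[float]:
    """Map an embedding vector to a probability distribution (assumption)."""
    m = max(vec)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_core_set(embeddings: List[List[float]], labels: List[int],
                    fraction: float) -> List[int]:
    """Hypothetical helper returning indices of a core subset (not the authors' code)."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    core: List[int] = []
    for idxs in by_class.values():
        # Assumption: the class cluster center is the mean embedding of the class.
        dim = len(embeddings[idxs[0]])
        center = [sum(embeddings[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        center_dist = softmax(center)

        # Per-sample score: JSD between the sample and its class center.
        scores = {i: jsd(softmax(embeddings[i]), center_dist) for i in idxs}

        # Class-level average score, standing in for the AvgMI criterion.
        avg = sum(scores.values()) / len(scores)

        # Keep the samples whose score is closest to the class average.
        budget = max(1, round(fraction * len(idxs)))
        ranked = sorted(idxs, key=lambda i: abs(scores[i] - avg))
        core.extend(ranked[:budget])
    return sorted(core)
```

For example, `select_core_set(embeddings, labels, fraction=0.5)` returns the indices of a 50% core set. Because the set is drawn per class, the original class balance is approximately preserved. In practice the embeddings would presumably come from a caries classifier such as MobileNetV2 or ResNet18; this too is an assumption. Keeping scores near the average, rather than at either extreme, is one plausible way to discard both atypical (possibly confusing or low-quality) samples and near-redundant ones.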

Paper Structure

This paper contains 17 sections, 5 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: The motivation statement for JSCDS.
  • Figure 2: The workflow of JSCDS. JSCDS calculates cluster centers from neural network embedding representations and then combines the AvgMI of data samples to generate the core set for model training.
  • Figure 3: Prediction results for different core-data fractions. The red line indicates the results with the full dataset.