Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition

Yongcun Zhang, Jiajun Xu, Yina He, Shaozi Li, Zhiming Luo, Huangwei Lei

TL;DR

This work proposes WSVM, a fully automated weakly supervised method that combines a Vision Transformer with multiple instance learning for tongue extraction and tooth-marked tongue recognition, and reports high accuracy in tooth-marked tongue classification.

Abstract

Tongue diagnosis in Traditional Chinese Medicine (TCM) is a crucial diagnostic method that can reflect an individual's health status. Traditional methods for identifying tooth-marked tongues are subjective and inconsistent because they rely on practitioner experience. We propose WSVM, a novel fully automated Weakly Supervised method using Vision transformer and Multiple instance learning, for tongue extraction and tooth-marked tongue recognition. Our approach first detects and extracts the tongue region from clinical images, removing irrelevant background information. Then, we implement an end-to-end weakly supervised object detection method: a Vision Transformer (ViT) processes the tongue image in patches, and a multiple instance loss identifies tooth-marked regions using only image-level annotations. WSVM achieves high accuracy in tooth-marked tongue classification, and visualization experiments demonstrate its effectiveness in pinpointing these regions. This automated approach enhances the objectivity and accuracy of tooth-marked tongue diagnosis, and it provides clinical value by assisting TCM practitioners in making precise diagnoses and treatment recommendations. Code is available at https://github.com/yc-zh/WSVM.
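
The multiple instance idea in the abstract can be illustrated with a small NumPy sketch. It treats each image as a bag of patch scores, pools them into one image-level probability, and compares that probability with the image-level label. The max pooling, the binary cross-entropy, and the function names are assumptions chosen for illustration; the abstract does not specify the exact aggregation or loss used by WSVM.

```python
import numpy as np


def mil_image_probability(patch_logits):
    """Pool per-patch logits into one probability per image.

    patch_logits has shape (num_images, num_patches).
    Max pooling is an assumption of this sketch: an image counts as
    tooth-marked if its most suspicious patch looks tooth-marked.
    """
    patch_probs = 1.0 / (1.0 + np.exp(-np.asarray(patch_logits, dtype=np.float64)))
    return patch_probs.max(axis=-1)


def mil_bce_loss(patch_logits, image_labels, eps=1e-7):
    """Binary cross-entropy between pooled probabilities and image-level labels."""
    p = np.clip(mil_image_probability(patch_logits), eps, 1.0 - eps)
    y = np.asarray(image_labels, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


# Two toy images with four patches each. Only the first image has a
# high-scoring patch, so only the first image should be predicted positive.
toy_logits = np.array([[-5.0, -5.0, 6.0, -5.0],
                       [-5.0, -5.0, -5.0, -5.0]])
toy_labels = np.array([1, 0])
print(mil_image_probability(toy_logits))
print(mil_bce_loss(toy_logits, toy_labels))
```

Because only the pooled score is supervised, the per-patch probabilities are a by-product of training, and they are what a visualization of tooth-marked regions would draw on.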

Paper Structure (16 sections, 8 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: (a) is a normal tongue; (b), (c), and (d) are three different tooth-marked tongues; (e), (f), (g), and (h) show the region proposal approaches of four different methods for identifying tooth marks and tongues.
  • Figure 2: Overall framework of our approach. The first stage performs automatic tongue foreground extraction, and the second stage performs weakly supervised tooth-marked tongue detection.
  • Figure 3: Fully automatic tongue extraction. It uses the zero-shot segmentation capability of SAM (Segment Anything Model). Bounding boxes from our trained YOLOv8n tongue detector serve as prompts for SAM to generate a tongue mask. This mask is then multiplied with the original image to remove the background, isolating the tongue. The segmented tongue region is then cropped for further analysis.
  • Figure 4: Weakly supervised tooth-marked recognition. We build our model on ViT, incorporating a multiple instance calculation module and using a weakly supervised loss with image-level labels for supervision.
  • Figure 5: Performance of tongue detection using YOLOv8n.
  • ...and 2 more figures
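
The mask-multiply-and-crop step described in the Figure 3 caption can be sketched in a few lines of NumPy. The sketch assumes a binary tongue mask is already available (in the paper it comes from SAM prompted with a YOLOv8n box) and does not call either model. The function name `extract_foreground` and the toy arrays are hypothetical and used only for illustration.

```python
import numpy as np


def extract_foreground(image, mask):
    """Zero out background pixels, then crop to the mask's bounding box.

    image: (H, W, C) array; mask: (H, W) array, nonzero on the tongue.
    """
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("mask is empty; nothing to extract")
    # Multiply the image by the mask so background pixels become 0.
    masked = image * mask[..., None].astype(image.dtype)
    # Tight bounding box around all foreground pixels.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Toy example: a uniform 6x8 RGB image and a 3x3 mask with one corner missing.
toy_image = np.full((6, 8, 3), 200, dtype=np.uint8)
toy_mask = np.zeros((6, 8), dtype=np.uint8)
toy_mask[2:5, 3:6] = 1
toy_mask[2, 3] = 0
toy_crop = extract_foreground(toy_image, toy_mask)
print(toy_crop.shape)
```

Cropping after masking keeps the output tightly framed on the tongue, so the second stage spends its patches on tongue pixels rather than on blank background.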