Dr. Tongue: Sign-Oriented Multi-label Detection for Remote Tongue Diagnosis

Yiliang Chen; Steven SC Ho; Cheng Xu; Yao Jie Xie; Wing-Fai Yeung; Shengfeng He; Jing Qin

TL;DR

This work tackles remote tongue diagnosis in telemedicine by introducing the TongueDx dataset and a Sign-Oriented multi-label framework (SignNet) that fuses whole-tongue, body, and edge information through region-aware attention. The pipeline first runs Adaptive Tongue Feature Extraction (ATFE) to detect, segment, and upright-align the tongue in each image, then applies SignNet to predict eight tongue surface attributes, with inter-sign relationships encoded in a graph. In the reported experiments, both ResNet50 with ATFE and SignNet outperform the baseline models, and their F1-scores approach practitioner-level performance. Attributes that are sensitive to color and lighting remain difficult, a weakness attributed to data imbalance. The TongueDx dataset is publicly released, and the authors outline three concrete avenues for improving reliability and clinical utility: bounding-box annotations, video data, and color correction.
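To make the multi-label setting and the F1 metric concrete, here is a minimal sketch of how per-attribute probabilities could be turned into decisions and scored. It is not the authors' evaluation code: the function names, the 0.5 threshold, and the choice of macro averaging are assumptions made for illustration only.

```python
from typing import List, Sequence

NUM_ATTRIBUTES = 8  # the summary states that eight attributes are predicted


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-attribute probabilities into independent 0/1 decisions.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    return [1 if p >= threshold else 0 for p in probs]


def per_label_f1(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """F1 per attribute column; 0.0 if an attribute has no positives at all."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-attribute F1 scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores) / len(scores)
```

Macro averaging weights every attribute equally, so a rare attribute that the model handles poorly lowers the score as much as a common one. That makes it one reasonable way to surface the imbalance issue mentioned above, though the paper may report a different aggregate.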

Abstract

Tongue diagnosis is a vital tool in Western and Traditional Chinese Medicine, providing key insights into a patient's health by analyzing tongue attributes. The COVID-19 pandemic has heightened the need for accurate remote medical assessments, emphasizing the importance of precise tongue attribute recognition via telehealth. To address this, we propose a Sign-Oriented multi-label Attributes Detection framework. Our approach begins with an adaptive tongue feature extraction module that standardizes tongue images and mitigates environmental factors. This is followed by a Sign-oriented Network (SignNet) that identifies specific tongue attributes, emulating the diagnostic process of experienced practitioners and enabling comprehensive health evaluations. To validate our methodology, we developed an extensive tongue image dataset specifically designed for telemedicine. Unlike existing datasets, ours is tailored for remote diagnosis, with a comprehensive set of attribute labels. This dataset will be openly available, providing a valuable resource for research. Initial tests have shown improved accuracy in detecting various tongue attributes, highlighting our framework's potential as an essential tool for remote medical assessments.

Paper Structure (32 sections, 8 equations, 7 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Zero-shot Learning in Vision-Language Models.
Tongue Images Dataset and Studies.
The TongueDx Dataset
Methodology
Formulation
Adaptive Tongue Feature Extraction Module
Tongue Detection and Segmentation.
Upright Orientation of Tongue.
Attribute-Sign Relationships Graph
Sign-Oriented Attributes Detection Network
Loss Function
Experiments
Compared Methods.
...and 17 more sections

Figures (7)

Figure 1: Challenges in tongue diagnosis imaging for telehealth: (a) low resolution and (b) complex lighting conditions.
Figure 2: Cropped images with eight representative tongue attributes, with characteristics encircled in yellow.
Figure 3: Original samples of tongue images from our dataset.
Figure 4: Two examples of the Tongue Image Upright Orientation Algorithm: Input image (Before Orientation) and the upright oriented tongue image (After Orientation).
Figure 5: Overview of our Sign-Oriented Attributes Detection Framework. The framework consists of three main stages: (1) Adaptive Tongue Feature Extraction (ATFE) module, which employs a detection network to locate the tongue, followed by Mobile-SAM (Segment Anything Model) for precise tongue segmentation. The segmented tongue image then undergoes upright orientation for standardization. (2) Tongue Edge and Body Region Separation (TEBRS), where the normalized tongue image is further separated into the tongue body and edge regions. (3) SignNet Pipeline, where the whole tongue, tongue body, and tongue edge images are fed into separate network branches to predict eight distinct tongue attributes.
...and 2 more figures
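
The Figure 5 caption describes a stage (TEBRS) that splits the normalized tongue image into body and edge regions. The captions do not say how that split is computed, so the sketch below uses plain binary erosion of the segmentation mask as a stand-in: the eroded mask is treated as the body and the ring that erosion removes as the edge. The names `erode`, `split_body_edge`, and `edge_width` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = tongue pixel, 0 = background


def erode(mask: Mask, iterations: int = 1) -> Mask:
    """4-neighbour binary erosion; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(iterations):
        nxt = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if current[y][x] != 1:
                    continue
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                # Keep a pixel only if all four neighbours are also tongue pixels.
                if all(0 <= ny < h and 0 <= nx < w and current[ny][nx] == 1
                       for ny, nx in neighbours):
                    nxt[y][x] = 1
        current = nxt
    return current


def split_body_edge(mask: Mask, edge_width: int = 1) -> Tuple[Mask, Mask]:
    """Return (body, edge): body is the eroded mask, edge is what erosion removed."""
    body = erode(mask, edge_width)
    edge = [[m - b for m, b in zip(mask_row, body_row)]
            for mask_row, body_row in zip(mask, body)]
    return body, edge
```

With this construction the body and edge masks are disjoint and together cover the original mask, so each could be used to crop the input for a separate network branch. `edge_width` would need tuning to image resolution, and the published method may define the regions differently.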
