Texture Classification Network Integrating Adaptive Wavelet Transform
Su-Xi Yu, Jing-Yuan He, Yi Wang, Yu-Jiao Cai, Jun Yang, Bo Lin, Wei-Bin Yang, Jian Ruan
TL;DR
The paper addresses texture-based Graves' disease diagnosis from thyroid ultrasound, where CNNs struggle to robustly capture texture under variable imaging conditions. It introduces a parallel adaptive wavelet transform module built on Haar 2D lifting and a multi-resolution wavelet branch that complements a ResNet18 backbone, enabling simultaneous spatial and frequency-domain feature learning. Key contributions include a trainable wavelet transform loss $\text{Loss}_{WT}=\alpha\sum_{i=1}^L H(D_i)+\beta\sum_{i=1}^L \lVert m^{I}_{i}-m^{A}_{i}\rVert^{2}_{2}$, an efficient DWT-Split design to reduce parameters, and a parallel architecture that improves accuracy and recall on ultrasound texture classification (e.g., $97.90\%$ accuracy and $95.86\%$ recall) while achieving competitive results on natural textures (e.g., $60.765\%$ accuracy on KTH-TIPS-B). The approach enhances texture discrimination by combining spatial and frequency-domain cues with multi-resolution analysis, offering practical improvements for medical ultrasound diagnosis and potential applicability to other texture tasks. The Haar-based splitting reduces model complexity, and the insertion-position/multiresolution strategy enables flexible integration with existing CNN backbones. The number of levels $L$ (also written $N$) is the maximum wavelet decomposition depth, which governs how much frequency-domain detail is captured.
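To make the structure of $\text{Loss}_{WT}$ concrete, the following is a minimal NumPy sketch, not the authors' implementation. This section does not define $H$, $m^{I}_{i}$, or $m^{A}_{i}$, so the sketch assumes that $H$ is a mean Huber penalty on the detail coefficients $D_i$ and that $m^{I}_{i}$ and $m^{A}_{i}$ are per-channel means of the input and approximation at level $i$. The function names, the `delta` threshold, and the `(C, H, W)` array layout are hypothetical.

```python
import numpy as np

def huber(x, delta=1.0):
    """Elementwise Huber penalty (assumed form of H; delta is hypothetical)."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def wavelet_loss(inputs, approxs, details, alpha=1.0, beta=1.0):
    """Sketch of Loss_WT over L decomposition levels.

    inputs[i], approxs[i], details[i] are arrays of shape (C, H, W) holding
    the input, approximation, and detail coefficients of level i.
    """
    # First term: penalise large detail coefficients at every level.
    detail_term = sum(huber(d).mean() for d in details)
    # Second term: squared L2 distance between per-channel means of the
    # input and the approximation at every level (assumed meaning of m^I, m^A).
    mean_term = sum(
        np.sum((x.mean(axis=(1, 2)) - a.mean(axis=(1, 2))) ** 2)
        for x, a in zip(inputs, approxs)
    )
    return alpha * detail_term + beta * mean_term
```

Under these assumptions, the first term pushes detail coefficients toward zero, while the second keeps the approximation's mean close to the mean of the signal it was computed from.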
Abstract
Graves' disease is a common condition that is diagnosed clinically by determining the smoothness of the thyroid texture and its morphology in ultrasound images. Currently, the most widely used approach for the automated diagnosis of Graves' disease utilizes Convolutional Neural Networks (CNNs) for both feature extraction and classification. However, these methods demonstrate limited efficacy in capturing texture features. Given the high capacity of wavelets for describing texture features, this research integrates learnable wavelet modules based on the Lifting Scheme into CNNs and incorporates a parallel wavelet branch into the ResNet18 model to enhance texture feature extraction. Our model can analyze texture features in the spatial and frequency domains simultaneously, which improves classification accuracy. We conducted experiments on collected ultrasound datasets and publicly available natural image texture datasets. Our proposed network achieved 97.27% accuracy and 95.60% recall on the ultrasound datasets and 60.765% accuracy on the natural image texture datasets, surpassing the accuracy of ResNet and confirming the effectiveness of our approach.
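For readers unfamiliar with the Lifting Scheme, the sketch below shows one level of a fixed (non-learned) Haar lifting transform in NumPy: split into even and odd samples, predict the odd samples from the even ones, then update the even samples with the prediction residual. It is only an illustration. In the learnable modules described above, the predict and update steps would presumably be trainable operators rather than the fixed Haar choices used here, and the function names and subband labels are hypothetical.

```python
import numpy as np

def haar_lift_1d(x, axis):
    """One Haar lifting step along `axis` (its length must be even)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    even = np.take(x, np.arange(0, n, 2), axis=axis)  # split: even samples
    odd = np.take(x, np.arange(1, n, 2), axis=axis)   # split: odd samples
    detail = odd - even         # predict: estimate odd from even, keep residual
    approx = even + detail / 2  # update: approx becomes the pairwise mean
    return approx, detail

def haar_lift_2d(img):
    """One 2D level: lift along the columns axis, then along the rows axis."""
    lo, hi = haar_lift_1d(img, axis=1)
    ll, lh = haar_lift_1d(lo, axis=0)  # approximation and one detail subband
    hl, hh = haar_lift_1d(hi, axis=0)  # remaining two detail subbands
    return ll, lh, hl, hh

# Usage: decompose a 4x4 ramp image into four 2x2 subbands.
img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_lift_2d(img)
```

With this unnormalised update step, `ll` holds the mean of each 2x2 block of `img`, so the approximation preserves the image mean. Applying `haar_lift_2d` again to `ll` gives the next level of a multi-resolution decomposition.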
