Bilinear-Convolutional Neural Network Using a Matrix Similarity-based Joint Loss Function for Skin Disease Classification
Belal Ahmad, Mohd Usama, Tanvir Ahmad, Adnan Saeed, Shabnam Khatoon, Long Hu
TL;DR
This work tackles fine-grained skin-disease classification by marrying a Bilinear Convolutional Neural Network (BCNN) with a Constrained Triplet Network (CTN). The BCNN captures rich spatial interactions through bilinear pooling, while the CTN imposes an online, matrix-similarity-driven joint loss to tighten intra-class and widen inter-class separation, formalized as $L_{joint} = \alpha_t L_{softmax} + (1 - \alpha_t) L_{Triplet}$. The approach uses an Xception backbone and a similarity-matrix-driven sampling strategy to optimize both embedding and classification, achieving a mean accuracy of $93.72\%$ on ISIC2019 and demonstrating improved discriminative feature learning over prior methods. The work provides a practical, end-to-end framework with interpretable results via Grad-CAM, offering potential benefits for automated dermatological screening and early skin cancer detection. Future directions include multi-branch loss integration and incorporating attention mechanisms to further refine fine-grained skin lesion discrimination.
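As a rough illustration of how the joint objective $L_{joint} = \alpha_t L_{softmax} + (1 - \alpha_t) L_{Triplet}$ blends its two terms, the following is a minimal NumPy sketch. It makes several assumptions that are not taken from the paper: the triplet term is the standard hinge-based triplet loss on squared Euclidean distances rather than the constrained, similarity-matrix-driven variant, and the function names and the `margin` value are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits is (N, K), labels is (N,) of class ids."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge triplet loss (an assumption, not the paper's CTL layer)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-to-same-class distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-to-other-class distance
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def joint_loss(l_softmax, l_triplet, alpha_t):
    """Weighted combination: alpha_t * L_softmax + (1 - alpha_t) * L_Triplet."""
    return alpha_t * l_softmax + (1.0 - alpha_t) * l_triplet
```

With $\alpha_t = 1$ the objective reduces to plain softmax classification, and with $\alpha_t = 0$ it reduces to pure embedding learning; intermediate values trade one objective off against the other.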
Abstract
In this study, we propose a model for skin disease classification that combines a Bilinear Convolutional Neural Network (BCNN) with a Constrained Triplet Network (CTN). The BCNN captures rich spatial interactions between features in image data: bilinear pooling computes the outer product of the feature vectors produced by two different CNNs. The resulting features encode second-order statistics, enabling the network to capture more complex relationships between different channels and spatial locations. The CTN applies the Triplet Loss Function (TLF) through a new loss layer, the Constrained Triplet Loss (CTL) layer, added at the end of the architecture. This layer pursues two learning objectives on the deep features: inter-class separation and intra-class concentration, both of which are useful for skin disease classification. The proposed model is therefore trained to extract deep features that concentrate within each class while increasing the distance between features of different classes, which improves the model's performance. The model achieved a mean accuracy of 93.72% on ISIC2019.
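The bilinear pooling step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `bilinear_pool` and the array shapes are hypothetical, and the signed square root and L2 normalization are post-processing steps common in the bilinear CNN literature that this abstract does not confirm.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Bilinear pooling of two feature maps defined on the same spatial grid.

    feat_a: (H, W, C1) and feat_b: (H, W, C2), e.g. outputs of two CNN streams.
    Returns a flat descriptor of length C1 * C2.
    """
    h, w, c1 = feat_a.shape
    c2 = feat_b.shape[-1]
    a = feat_a.reshape(h * w, c1)
    b = feat_b.reshape(h * w, c2)
    # a.T @ b sums the outer product a_i b_i^T over all spatial locations i,
    # which yields second-order (channel-by-channel) statistics.
    pooled = a.T @ b / (h * w)  # shape (C1, C2)
    v = pooled.reshape(-1)
    # Assumed post-processing: signed square root, then L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + eps)
```

For two feature maps of shape `(7, 7, 8)` and `(7, 7, 16)`, the descriptor has length 128. Its size grows with the product of the channel counts, which is the main cost of capturing pairwise channel interactions.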
