Bilinear-Convolutional Neural Network Using a Matrix Similarity-based Joint Loss Function for Skin Disease Classification

Belal Ahmad; Mohd Usama; Tanvir Ahmad; Adnan Saeed; Shabnam Khatoon; Long Hu

TL;DR

This work tackles fine-grained skin-disease classification by marrying a Bilinear Convolutional Neural Network (BCNN) with a Constrained Triplet Network (CTN). The BCNN captures rich spatial interactions through bilinear pooling, while the CTN imposes an online, matrix-similarity-driven joint loss to tighten intra-class and widen inter-class separation, formalized as $L_{joint} = \alpha_t L_{softmax} + (1 - \alpha_t) L_{Triplet}$. The approach uses an Xception backbone and a similarity-matrix-driven sampling strategy to optimize both embedding and classification, achieving a mean accuracy of $93.72\%$ on ISIC2019 and demonstrating improved discriminative feature learning over prior methods. The work provides a practical, end-to-end framework with interpretable results via Grad-CAM, offering potential benefits for automated dermatological screening and early skin cancer detection. Future directions include multi-branch loss integration and incorporating attention mechanisms to further refine fine-grained skin lesion discrimination.
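The joint loss above can be illustrated with a minimal sketch. It assumes a standard softmax cross-entropy and a plain hinge-style triplet loss on squared Euclidean distances as a stand-in for the paper's constrained triplet loss, whose exact form is not given in this summary. The function names, the margin value, and the choice of `alpha_t` are hypothetical and used only for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` for the true class index `label`."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Plain hinge triplet loss (assumed stand-in; margin value is an assumption)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def joint_loss(l_softmax, l_triplet, alpha_t):
    """L_joint = alpha_t * L_softmax + (1 - alpha_t) * L_Triplet."""
    return alpha_t * l_softmax + (1.0 - alpha_t) * l_triplet


if __name__ == "__main__":
    l_soft = softmax_cross_entropy([2.0, 0.5, -1.0], label=0)
    l_trip = triplet_loss([0.0, 1.0], [0.1, 0.9], [0.2, 0.8])
    print(joint_loss(l_soft, l_trip, alpha_t=0.5))
```

With `alpha_t` close to 1 the classification term dominates, and with `alpha_t` close to 0 the embedding term dominates. How `alpha_t` is set or scheduled during training is not stated here, so the sketch takes it as a plain argument.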

Abstract

In this study, we propose a model for skin disease classification that combines a Bilinear Convolutional Neural Network (BCNN) with a Constrained Triplet Network (CTN). The BCNN captures rich spatial interactions between features in image data: it computes the outer product of feature vectors from two different CNNs through bilinear pooling. The resulting features encode second-order statistics, enabling the network to capture more complex relationships between different channels and spatial locations. The CTN applies the Triplet Loss Function (TLF) through a new loss layer, the Constrained Triplet Loss (CTL) layer, added at the end of the architecture. This layer pursues two learning objectives on the deep features as far as possible, inter-class separation and intra-class concentration, both of which are useful for skin disease classification. The proposed model is trained to draw the deep features of each class together while increasing the distance between features of different classes, which improves its performance. The model achieved a mean accuracy of 93.72%.
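To make the bilinear pooling step concrete, here is a minimal pure-Python sketch. It assumes each stream yields one feature vector per spatial location and that the per-location outer products are averaged, as the Figure 2 caption describes. The function name and the toy inputs are hypothetical, and any normalization applied after pooling is omitted.

```python
def bilinear_pool(features_a, features_b):
    """Average, over spatial locations, of the outer product of two streams' features.

    features_a: list of L vectors of length M (stream A, one vector per location)
    features_b: list of L vectors of length N (stream B, same L locations)
    Returns the pooled M x N matrix flattened row by row into a list of M * N values.
    """
    if not features_a or len(features_a) != len(features_b):
        raise ValueError("both streams need the same non-zero number of locations")
    m, n = len(features_a[0]), len(features_b[0])
    pooled = [[0.0] * n for _ in range(m)]
    for fa, fb in zip(features_a, features_b):
        for i in range(m):
            for j in range(n):
                pooled[i][j] += fa[i] * fb[j]  # outer product at this location
    count = len(features_a)
    return [pooled[i][j] / count for i in range(m) for j in range(n)]


if __name__ == "__main__":
    stream_a = [[1.0, 2.0], [3.0, 4.0]]  # two locations, two channels each
    stream_b = [[1.0, 0.0], [0.0, 1.0]]
    print(bilinear_pool(stream_a, stream_b))
```

Each entry of the pooled matrix is the average product of one channel from stream A with one channel from stream B, which is why the result encodes second-order statistics between channels.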

Paper Structure (19 sections, 20 equations, 11 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Background Knowledge
Xception
Bilinear Convolutional Neural Network
Constrained Triplet Network
Methodology
Joint Loss Function
Dataset and Evaluation Metrics
Dataset
Evaluation Metrics and Algorithm
Experiments and Results
Model Architecture
Input and Network Settings
Training Loss and Accuracy
...and 4 more sections

Figures (11)

Figure 1: The "extreme" version of the strictly equivalent reformulation of the simplified Inception module, which applies a separate spatial convolution to each output channel of the $1 \times 1$ convolution.
Figure 2: Streams A and B of the BCNN extract features from the input, and their outputs are combined at each location with an outer product. Average pooling then yields the bilinear feature representation. Finally, this representation passes through the softmax layer for class prediction.
Figure 3: Triplet loss increases the distance between skin disease images of different categories and decreases the distance between skin disease images of the same category.
Figure 4: Constrained triplet network structure. Batches of images pass through a deep CNN, and an L2-normalization layer generates the skin disease image representations. Finally, the triplet loss function uses these representations to decide whether paired images show the same disease or different diseases.
Figure 5: Proposed model architecture.
...and 6 more figures
