Class Distance Weighted Cross Entropy Loss for Classification of Disease Severity
Gorkem Polat, Ümit Mert Çağlar, Alptekin Temizel
TL;DR
This work introduces Class Distance Weighted Cross Entropy (CDW-CE), an ordinal loss that penalizes misclassifications more severely as the distance from the true class increases, with an optional margin to enforce tighter intra-class clustering. Evaluated on the LIMUC Ulcerative Colitis dataset across three CNN architectures, CDW-CE consistently surpasses standard CE and other ordinal losses in Mayo Endoscopic Score (MES) classification and in remission prediction, as shown by higher quadratic weighted kappa (QWK), F1, accuracy, and AUC and by lower mean absolute error (MAE). It also yields better Class Activation Map (CAM) explainability and better latent-space clustering in t-SNE and UMAP visualizations. The paper further analyzes the impact of the distance exponent $\alpha$ and the margin $m$, showing that higher $\alpha$ values (e.g., 5–7) and the margin term improve performance. Domain experts confirm that the attention of models trained with CDW-CE aligns better with clinical symptoms. Overall, CDW-CE provides a robust, explainable ordinal loss that enhances both predictive power and clinical interpretability in disease severity classification.
Abstract
Assessing disease severity with ordinal classes, where successive classes reflect increasing severity, benefits from loss functions designed for this ordinal structure. Traditional categorical loss functions, such as Cross-Entropy (CE), often perform suboptimally in these scenarios because they treat all misclassifications alike, regardless of how far the predicted class is from the true one. To address this, we propose a novel loss function, Class Distance Weighted Cross-Entropy (CDW-CE), which penalizes misclassifications more severely when the predicted and actual classes are farther apart. We evaluated CDW-CE using various deep architectures, comparing its performance against several categorical and ordinal loss functions. To assess the quality of latent representations, we used t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) visualizations, quantified the clustering quality using the Silhouette Score, and compared Class Activation Maps (CAM) generated by models trained with CDW-CE and with CE loss. Feedback from domain experts was incorporated to evaluate how well model attention aligns with expert opinion. Our results show that CDW-CE consistently improves performance in ordinal image classification tasks. It achieves higher Silhouette Scores, indicating better class discrimination capability, and its CAM visualizations show a stronger focus on clinically significant regions, as validated by domain experts. Receiver operating characteristic (ROC) curves and area under the curve (AUC) scores show that CDW-CE outperforms other loss functions, including prominent ordinal loss functions from the literature.
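
To make the distance-weighting idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the per-sample loss takes the form $-\sum_i |i - c|^{\alpha} \log(1 - \hat{y}_i)$, where $c$ is the true class and $\hat{y}_i$ is the predicted probability of class $i$; this exact form is an assumption, since the text above only states that the penalty grows with class distance. The margin $m$ is omitted because its form is not given here. The function name `cdw_ce` and the example probability vectors are hypothetical.

```python
import math


def cdw_ce(probs, true_class, alpha=5.0, eps=1e-12):
    """Distance-weighted loss for one sample (assumed form, see lead-in).

    probs: predicted class probabilities (e.g. softmax output).
    true_class: index of the ground-truth ordinal class.
    alpha: exponent applied to the class distance |i - true_class|.
    """
    loss = 0.0
    for i, p in enumerate(probs):
        # Weight is zero for the true class and grows with ordinal distance.
        weight = abs(i - true_class) ** alpha
        # Probability mass on a wrong class is penalized via log(1 - p);
        # eps guards against log(0).
        loss -= weight * math.log(max(1.0 - p, eps))
    return loss


def cross_entropy(probs, true_class, eps=1e-12):
    """Standard CE for one sample, for comparison."""
    return -math.log(max(probs[true_class], eps))


# Two hypothetical predictions for a sample whose true class is 0.
# Both put 0.7 on a wrong class, but at different ordinal distances.
near_miss = [0.1, 0.7, 0.1, 0.1]  # most mass on the adjacent class 1
far_miss = [0.1, 0.1, 0.1, 0.7]   # most mass on the distant class 3

# CE depends only on the true-class probability, so it cannot tell them apart.
ce_near = cross_entropy(near_miss, 0)
ce_far = cross_entropy(far_miss, 0)

# The distance-weighted loss penalizes the far miss more heavily.
cdw_near = cdw_ce(near_miss, 0, alpha=2.0)
cdw_far = cdw_ce(far_miss, 0, alpha=2.0)
```

Under this assumed form, `ce_near` and `ce_far` are identical, while `cdw_far` exceeds `cdw_near`, and raising `alpha` widens that gap. This is the behavior the abstract describes: errors far from the true severity level cost more than errors in an adjacent class.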
