Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels

Zifu Wang; Xuefei Ning; Matthew B. Blaschko

TL;DR

The paper tackles the mismatch between IoU losses and soft-label training in semantic segmentation by introducing Jaccard Metric Losses (JMLs), two differentiable losses that are identical to the soft Jaccard loss on hard labels while also supporting soft labels. The authors prove that JMLs are metrics on the hypercube $[0,1]^p$, and therefore satisfy the triangle inequality, whereas the existing soft Jaccard losses are not metrics; this property underpins their use for knowledge distillation (KD) and calibration. They combine JMLs with boundary label smoothing (BLS) and propose active-class filtering for KD and semi-supervised learning (SSL), achieving significant accuracy and calibration improvements on Cityscapes, PASCAL VOC, ADE20K, and DeepGlobe across 13 architectures. Empirically, JMLs outperform state-of-the-art KD and SSL methods in segmentation, although in some settings their effect on calibration is less straightforward; this can be mitigated with JML-BLS and JML-KD. The work provides practical guidelines for choosing hyperparameters and releases its code.
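The claim that JMLs coincide with the soft Jaccard loss on hard labels but behave differently on soft labels can be checked in a few lines. This is a minimal sketch, assuming the L1 soft Jaccard loss $1 - \langle x,y\rangle / (\|x\|_1 + \|y\|_1 - \langle x,y\rangle)$ and JML1 written as $1 - (\|x+y\|_1 - \|x-y\|_1)/(\|x+y\|_1 + \|x-y\|_1)$. The names `sjl_l1` and `jml1` are hypothetical, JML2 is omitted, and this is not the authors' JDTLosses implementation.

```python
# Sketch (assumed definitions): L1 soft Jaccard loss vs. JML1 on vectors in [0, 1]^p.

def sjl_l1(x, y):
    """Soft Jaccard loss: 1 - <x, y> / (|x|_1 + |y|_1 - <x, y>)."""
    inter = sum(a * b for a, b in zip(x, y))
    union = sum(x) + sum(y) - inter
    return 1.0 - inter / union


def jml1(x, y):
    """JML1: 1 - (|x+y|_1 - |x-y|_1) / (|x+y|_1 + |x-y|_1)."""
    total = sum(a + b for a, b in zip(x, y))        # |x + y|_1 for nonnegative inputs
    diff = sum(abs(a - b) for a, b in zip(x, y))    # |x - y|_1
    return 1.0 - (total - diff) / (total + diff)


# With a hard (0/1) label, the two losses agree.
pred = [0.9, 0.2, 0.6, 0.0]
hard = [1.0, 0.0, 1.0, 0.0]
assert abs(sjl_l1(pred, hard) - jml1(pred, hard)) < 1e-12

# With a soft target y = 0.5, search a grid for the loss-minimizing prediction.
grid = [i / 100 for i in range(0, 101)]
best_sjl = min(grid, key=lambda p: sjl_l1([p], [0.5]))
best_jml = min(grid, key=lambda p: jml1([p], [0.5]))
print(best_sjl, best_jml)  # 1.0 0.5
```

With the soft target 0.5, the soft Jaccard loss is minimized by pushing the prediction all the way to 1, while JML1 is minimized exactly at the target, which is the behavior soft-label training needs.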

Abstract

Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Leveraging IoU losses as part of the loss function has demonstrated superior performance in semantic segmentation tasks compared to optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses to support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning, mainly due to their inability to process soft labels. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels: label smoothing, knowledge distillation, and semi-supervised learning, and demonstrate their potential to enhance model accuracy and calibration. Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers. Remarkably, our straightforward approach significantly outperforms state-of-the-art knowledge distillation and semi-supervised learning methods. The code is available at https://github.com/zifuwanggg/JDTLosses.
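To illustrate how a JML could slot into soft-label training for segmentation, the sketch below computes a class-averaged JML1 between predicted softmax probabilities and soft targets, which may come from label smoothing or from a teacher network in knowledge distillation. Assumptions: pixels are flattened into a list, uniform label smoothing stands in for the paper's boundary label smoothing, classes empty in both prediction and target are skipped, and `smooth_one_hot` and `jml1_class_mean` are hypothetical helpers rather than the released code's API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def smooth_one_hot(label, num_classes, eps):
    """Uniform label smoothing (assumption; not the paper's boundary variant)."""
    return [(1.0 - eps) * (1.0 if c == label else 0.0) + eps / num_classes
            for c in range(num_classes)]


def jml1_class_mean(probs, targets):
    """Average JML1 over classes; probs/targets hold one class distribution per pixel."""
    num_classes = len(probs[0])
    losses = []
    for c in range(num_classes):
        x = [p[c] for p in probs]    # predicted map for class c
        y = [t[c] for t in targets]  # soft target map for class c
        total = sum(x) + sum(y)      # |x + y|_1, since entries are nonnegative
        diff = sum(abs(a - b) for a, b in zip(x, y))  # |x - y|_1
        if total == 0.0:             # class empty in both maps: skip to avoid 0/0
            continue
        losses.append(1.0 - (total - diff) / (total + diff))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: 4 pixels, 3 classes (all values are made up for illustration).
labels = [0, 0, 1, 2]
student_logits = [[2.0, 0.1, -1.0], [1.5, 0.3, 0.2], [0.0, 1.8, 0.1], [-0.5, 0.2, 2.2]]
teacher_logits = [[3.0, 0.0, -1.0], [2.5, 0.5, 0.0], [0.2, 2.6, 0.0], [-1.0, 0.0, 3.1]]

student = [softmax(z) for z in student_logits]
ls_targets = [smooth_one_hot(l, 3, eps=0.1) for l in labels]  # label smoothing targets
kd_targets = [softmax(z) for z in teacher_logits]               # distillation targets

loss_ls = jml1_class_mean(student, ls_targets)
loss_kd = jml1_class_mean(student, kd_targets)
print(round(loss_ls, 4), round(loss_kd, 4))
```

Because the loss is zero exactly when the prediction equals the target, the same function handles both kinds of soft targets; as the abstract notes, such a loss would be used as part of the overall objective alongside a pixel-wise loss.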

Paper Structure (33 sections, 4 theorems, 26 equations, 8 figures, 10 tables)

Introduction
Methods
Preliminaries
The Limitation of Existing IoU Losses
Jaccard Metric Losses
Use Cases
Label Smoothing
Knowledge Distillation
Experiments
Results on Accuracy
Results on Calibration
Ablation Studies
JML Weights
JML-BLS
JML-KD
...and 18 more sections

Key Result

Theorem 2.1

Both $\overline{\Delta}_{\text{JML1}}$ and $\overline{\Delta}_{\text{JML2}}$ are metrics on $[0,1]^p$. Neither $\overline{\Delta}_{\text{SJL},L^1}$ nor $\overline{\Delta}_{\text{SJL},L^2}$ is a metric on $[0,1]^p$.
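Theorem 2.1 can be sanity-checked numerically for the L1 variants. Assuming JML1 is $1 - (\|x+y\|_1 - \|x-y\|_1)/(\|x+y\|_1 + \|x-y\|_1)$, on nonnegative vectors it reduces to $1 - \sum_i \min(x_i,y_i) / \sum_i \max(x_i,y_i)$, because $(a+b-|a-b|)/2 = \min(a,b)$ and $(a+b+|a-b|)/2 = \max(a,b)$; this is the weighted Jaccard (Soergel) distance, which is known to be a metric. The sketch samples random points in $[0,1]^p$ and tests $d(x,x)=0$, symmetry, and the triangle inequality. It is a spot check, not a proof; it covers only JML1 and the L1 soft Jaccard loss $1 - \langle x,y\rangle/(\|x\|_1+\|y\|_1-\langle x,y\rangle)$, and `check_metric` is a hypothetical helper.

```python
import random


def jml1(x, y):
    """JML1 on [0,1]^p, equal to 1 - sum(min) / sum(max) for nonnegative inputs."""
    total = sum(a + b for a, b in zip(x, y))        # |x + y|_1
    diff = sum(abs(a - b) for a, b in zip(x, y))    # |x - y|_1
    if total == 0.0:
        return 0.0  # x = y = 0: treat the distance as zero
    return 1.0 - (total - diff) / (total + diff)


def sjl_l1(x, y):
    """L1 soft Jaccard loss: 1 - <x, y> / (|x|_1 + |y|_1 - <x, y>)."""
    inter = sum(a * b for a, b in zip(x, y))
    union = sum(x) + sum(y) - inter
    if union == 0.0:
        return 0.0
    return 1.0 - inter / union


def check_metric(d, dim=5, trials=2000, seed=0, tol=1e-12):
    """Randomized check of d(x, x) = 0, symmetry, and the triangle inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = [[rng.random() for _ in range(dim)] for _ in range(3)]
        if abs(d(x, x)) > tol:
            return False
        if abs(d(x, y) - d(y, x)) > tol:
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


print(check_metric(jml1))    # True
print(check_metric(sjl_l1))  # False: sjl_l1(x, x) > 0 for soft x
```

The L1 soft Jaccard loss already fails the first axiom: for $x = y = 0.5$ it equals $1 - 0.25/0.75 = 2/3$ rather than 0.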

Figures (8)

Figure 1: Loss value vs. prediction with $y=0.1$ (left) and $y=0.9$ (right).
Figure 2: Comparing $\overline{\Delta}_{\text{JML1}}, \overline{\Delta}_{\text{JML2}}$ and the convex closure with $y=0.5$.
Figure 3: A counterexample showing that neither $\overline{\Delta}_{\text{JML1}}$ nor $\overline{\Delta}_{\text{JML2}}$ is piecewise concave in 2D.
Figure 4: The best $\epsilon$ and mIoU (%) for different $k$ on PASCAL VOC using DL3-R18.
Figure 5: Effects of $\epsilon$ on PASCAL VOC using DL3-R101/50/18. $\epsilon=0$ is the baseline (no smoothing). The highest and the lowest mean values are highlighted in red and green horizontal lines, respectively.
...and 3 more figures

Theorems & Definitions (10)

Theorem 2.1
Definition 2.2: Metric (Encyclopedia of Distances, Deza 2009)
Theorem 2.3
Proof
Proof
Definition H.1: Convex Closure
Theorem H.2
Proof
Theorem H.3
Proof
