Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation

Renhao Lu

TL;DR

This work addresses semantic segmentation under class and instance imbalance by introducing CWMI loss, which leverages a complex steerable pyramid to extract multiscale, directional features and mutual information across subbands. By combining cross-entropy with a frequency-domain MI term, CWMI captures high-dimensional dependencies and phase information essential for preserving boundaries and topology, while maintaining computational efficiency. Extensive experiments across four diverse datasets and multiple architectures show CWMI consistently improves pixel-wise and topological metrics, often with statistical significance, and a real-valued variant confirms the importance of phase information. The proposed method offers a practical, scalable loss that enhances segmentation performance in challenging scenarios and may extend to related tasks such as medical imaging, satellite analysis, and image-to-image translation.

Abstract

Recent advancements in deep neural networks have significantly enhanced the performance of semantic segmentation. However, class imbalance and instance imbalance remain persistent challenges, where smaller instances and thin boundaries are often overshadowed by larger structures. To address the multiscale nature of segmented objects, various models have incorporated mechanisms such as spatial attention and feature pyramid networks. Despite these advancements, most loss functions are still primarily pixel-wise, while regional and boundary-focused loss functions often incur high computational costs or are restricted to small-scale regions. To address this limitation, we propose the complex wavelet mutual information (CWMI) loss, a novel loss function that leverages mutual information from subband images decomposed by a complex steerable pyramid. The complex steerable pyramid captures features across multiple orientations and preserves structural similarity across scales. Meanwhile, mutual information is well-suited to capturing high-dimensional directional features and offers greater noise robustness. Extensive experiments on diverse segmentation datasets demonstrate that CWMI loss achieves significant improvements in both pixel-wise accuracy and topological metrics compared to state-of-the-art methods, while introducing minimal computational overhead. Our code is available at https://github.com/lurenhaothu/CWMI

Paper Structure

This paper contains 24 sections, 13 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Illustration of the proposed Complex Wavelet Mutual Information (CWMI) Loss. The prediction and label images are decomposed using a complex steerable pyramid, which generates subbands at different scales and orientations. Mutual information is calculated for each corresponding pair of subbands, and the CWMI is computed as the sum of these mutual information values. $\mathbf{Y}_{B_n},\mathbf{P}_{B_n}$: complex steerable decomposition of the label and prediction images at level $n$; $I(\mathbf{Y}_{B_n}, \mathbf{P}_{B_n})$: mutual information between $\mathbf{Y}_{B_n}$ and $\mathbf{P}_{B_n}$.
  • Figure 2: Steerable pyramid and complex steerable pyramid. (A) Orientation-selective band-pass filters of the steerable pyramid. (B) Example decomposition using a steerable pyramid with N=3, K=4. (C) Band-pass filters of the complex steerable pyramid, where negative frequency components are discarded. (D) Phase representation of the complex steerable pyramid output, with the real part identical to that of the steerable pyramid.
  • Figure 3: Qualitative results of different loss functions on the SNEMI3D dataset. Red: false positive regions; Blue: false negative regions. Green arrows mark challenging false positives and orange arrows mark challenging false negatives that are successfully addressed by CWMI.
  • Figure 4: Qualitative results of different loss functions on the GlaS dataset.
  • Figure 5: Qualitative results of different loss functions on the DRIVE dataset.
  • ...and 1 more figure
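
The relationship stated in the Figure 2 caption (negative frequency components are discarded, and the real part of the complex output matches the real-valued steerable output) can be checked numerically for a single oriented band. The sketch below is a simplified illustration, not the paper's pyramid. The Gaussian radial profile, the $\cos^2$ angular lobe, the parameters `f0` and `sigma`, and the function names are assumptions made for the example. It expects a square image with an odd side length, so that the FFT grid has no Nyquist bin.

```python
import numpy as np

def oriented_bandpass(size, theta_k, f0=0.2, sigma=0.05):
    """Real, symmetric frequency-domain filter (hypothetical shape):
    a Gaussian ring in radius times a cos^2 lobe around angle theta_k."""
    f = np.fft.fftfreq(size)                    # odd size: no Nyquist bin
    fy, fx = np.meshgrid(f, f, indexing="ij")
    r = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)
    lobe = np.cos(theta - theta_k)
    H = np.exp(-(r - f0) ** 2 / (2 * sigma ** 2)) * lobe ** 2
    H[0, 0] = 0.0                               # band-pass: remove DC
    return H, lobe

def decompose(image, theta_k):
    """Return the real-filter band and its complex (one-sided) counterpart."""
    X = np.fft.fft2(image)
    H, lobe = oriented_bandpass(image.shape[0], theta_k)
    real_band = np.fft.ifft2(H * X)             # symmetric filter: real output
    # Keep only the half-plane facing theta_k and double it, which
    # discards the mirrored (negative) frequency components.
    H_complex = 2.0 * H * (lobe > 0)
    complex_band = np.fft.ifft2(H_complex * X)
    return real_band, complex_band

rng = np.random.default_rng(0)
image = rng.random((33, 33))                    # square, odd side length
real_band, complex_band = decompose(image, theta_k=np.pi / 4)

# The real part reproduces the real-filter band; the imaginary part
# carries the additional phase information.
same_real_part = np.allclose(complex_band.real, real_band.real, atol=1e-10)
```

The magnitude and phase of the band are then available as `np.abs(complex_band)` and `np.angle(complex_band)`, the latter being the kind of phase representation shown in panel (D).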