Complex Wavelet Mutual Information Loss: A Multi-Scale Loss Function for Semantic Segmentation
Renhao Lu
TL;DR
This work addresses semantic segmentation under class and instance imbalance by introducing the complex wavelet mutual information (CWMI) loss, which uses a complex steerable pyramid to decompose images into multiscale, directional subbands and computes mutual information (MI) over those subbands. By combining cross-entropy with this frequency-domain MI term, CWMI captures high-dimensional dependencies and the phase information needed to preserve boundaries and topology, while remaining computationally efficient. Extensive experiments on four diverse datasets and multiple architectures show that CWMI consistently improves pixel-wise and topological metrics, often with statistical significance, and a comparison with a real-valued variant confirms the importance of phase information. The proposed method offers a practical, scalable loss that improves segmentation in challenging scenarios and may extend to related applications such as medical image analysis, satellite image analysis, and image-to-image translation.
Abstract
Recent advances in deep neural networks have significantly improved the performance of semantic segmentation. However, class imbalance and instance imbalance remain persistent challenges: small instances and thin boundaries are often overshadowed by larger structures. To handle the multiscale nature of segmented objects, various models incorporate mechanisms such as spatial attention and feature pyramid networks. Despite these advances, most loss functions remain primarily pixel-wise, while regional and boundary-focused loss functions often incur high computational costs or are restricted to small-scale regions. To address this limitation, we propose the complex wavelet mutual information (CWMI) loss, a novel loss function that leverages mutual information from subband images decomposed by a complex steerable pyramid. The complex steerable pyramid captures features across multiple orientations and preserves structural similarity across scales. Meanwhile, mutual information is well suited to capturing high-dimensional directional features and offers greater robustness to noise. Extensive experiments on diverse segmentation datasets demonstrate that CWMI loss achieves significant improvements in both pixel-wise accuracy and topological metrics compared with state-of-the-art methods, while introducing minimal computational overhead. Our code is available at https://github.com/lurenhaothu/CWMI.
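The following Python sketch illustrates the general idea described above. It decomposes a predicted probability map and its ground-truth mask with a complex steerable pyramid, estimates mutual information between corresponding subbands, and subtracts a weighted MI term from binary cross-entropy. It is not the authors' implementation; the repository linked above is the reference for that. Several details are assumptions made only for illustration. The pyramid is a simplified, undecimated frequency-domain version with log-cosine radial bands and one-sided cos^(K-1) angular masks, without residual bands. MI is estimated with a joint-Gaussian approximation over the real and imaginary parts of each subband. The helper names (steerable_filters, subbands, gaussian_mi, cwmi_style_loss) and the weight lam are hypothetical. The sketch uses NumPy only and computes the loss value, so training would need an autodiff framework.

```python
import numpy as np


def steerable_filters(shape, n_scales=3, n_orients=4):
    """Frequency responses of a simplified, undecimated complex steerable pyramid.

    Assumption for illustration: log-cosine radial bands one octave apart and
    one-sided cos^(K-1) angular masks. Each mask covers only half of the
    frequency plane, so every filtered subband of a real image is complex.
    No high-pass/low-pass residuals and no downsampling are included.
    """
    h, w = shape
    fy = np.fft.fftfreq(h) * 2 * np.pi
    fx = np.fft.fftfreq(w) * 2 * np.pi
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    r = np.hypot(FX, FY)
    r[0, 0] = 1e-12  # avoid log2(0); the DC term lies outside every band
    log_r = np.log2(r)
    theta = np.arctan2(FY, FX)

    filters = []
    for s in range(n_scales):
        d = log_r - (np.log2(np.pi / 2) - s)  # bands centred at pi/2, pi/4, ...
        radial = np.where(np.abs(d) < 1, np.cos(np.pi / 2 * d), 0.0)
        for k in range(n_orients):
            # Angular distance to this orientation, wrapped to (-pi, pi]
            diff = np.angle(np.exp(1j * (theta - np.pi * k / n_orients)))
            angular = np.where(np.abs(diff) < np.pi / 2,
                               np.cos(diff) ** (n_orients - 1), 0.0)
            filters.append(radial * angular)
    return filters


def subbands(img, filters):
    """Filter an image in the Fourier domain; returns complex subbands."""
    spectrum = np.fft.fft2(img)
    return [np.fft.ifft2(spectrum * f) for f in filters]


def gaussian_mi(a, b, use_phase=True, eps=1e-6):
    """Mutual information between two subbands under a joint-Gaussian assumption.

    With use_phase=True each pixel contributes the 2-D vector (Re, Im), so
    phase enters the estimate; with use_phase=False only the real part is used.
    I(X; Y) = 0.5 * (log det Sx + log det Sy - log det Sxy).
    """
    def feats(z):
        rows = [z.real.ravel(), z.imag.ravel()] if use_phase else [z.real.ravel()]
        return np.stack(rows)

    x, y = feats(a), feats(b)
    d = x.shape[0]
    # Regularised joint covariance of the stacked feature vectors
    cov = np.cov(np.vstack([x, y])) + eps * np.eye(2 * d)
    logdet_x = np.linalg.slogdet(cov[:d, :d])[1]
    logdet_y = np.linalg.slogdet(cov[d:, d:])[1]
    logdet_xy = np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def cwmi_style_loss(prob, target, lam=0.5, use_phase=True,
                    n_scales=3, n_orients=4, eps=1e-7):
    """Hypothetical combined objective: BCE - lam * mean subband MI."""
    p = np.clip(prob, eps, 1 - eps)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    filters = steerable_filters(prob.shape, n_scales, n_orients)
    mi = [gaussian_mi(sp, st, use_phase)
          for sp, st in zip(subbands(prob, filters), subbands(target, filters))]
    return bce - lam * np.mean(mi)  # higher subband MI lowers the loss


# Usage: a thin vertical bar, a close prediction, and a noisy prediction
rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[20:44, 30:34] = 1.0
good = 0.9 * target + 0.05
noisy = np.clip(good + 0.3 * rng.standard_normal(target.shape), 0.0, 1.0)
print(cwmi_style_loss(good, target) < cwmi_style_loss(noisy, target))  # True
```

Calling cwmi_style_loss with use_phase=False keeps only the real part of each subband, a rough stand-in for the real-valued variant mentioned in the TL;DR. The Gaussian MI estimate is a deliberately simple choice for this sketch, and other MI estimators could be substituted.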
