Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric
Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Xian-Sheng Hua, Heng-Tao Shen
TL;DR
This work tackles embedding-space collapse in deep metric learning (DML) by introducing Anti-Collapse Loss, a rate-based objective inspired by Maximal Coding Rate Reduction. The method maximizes the average coding rate of either all samples or class proxies, and it replaces intra-class coding terms with a proxy-based formulation, which preserves the global structure of the embedding space while reducing computational overhead. The loss can be combined with both pair-based and proxy-based DML methods and achieves state-of-the-art image retrieval performance on benchmarks such as CUB200, Cars196, and SOP, with ablations supporting its robustness and convergence advantages. Extensions to vision-language models (e.g., CLIP) show additional gains, which suggests the approach carries over to scalable, generalizable metric learning for visual recognition tasks.
Abstract
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks such as classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods that maximize inter-class discrepancy and minimize intra-class diversity. However, these methods tend to suffer from collapse of the embedding space because they rely too heavily on label information. This leads to sub-optimal feature representations and inferior model performance. To maintain the structure of the embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss. Specifically, our proposed loss draws its main inspiration from the principle of Maximal Coding Rate Reduction. It promotes the sparseness of feature clusters in the embedding space, and thereby prevents collapse, by maximizing the average coding rate of sample features or class proxies. Moreover, we integrate our proposed loss with pair-based and proxy-based methods, resulting in notable performance improvements. Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods. Extensive ablation studies verify the effectiveness of our method in preventing embedding space collapse and promoting generalization performance.
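To make the idea of "maximizing the coding rate" concrete, the following is a minimal NumPy sketch. It assumes the standard coding rate from the Maximal Coding Rate Reduction literature, R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z^T Z) for n embeddings of dimension d stored as rows of Z. The function names, the value of `eps`, the toy data, and the choice of the negated rate as a loss term are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate (in nats) of n embeddings stored as rows of Z, shape (n, d).

    Uses the MCR^2-style definition; eps is a distortion parameter whose
    default here is an arbitrary choice for illustration.
    """
    n, d = Z.shape
    scaled_cov = (d / (n * eps ** 2)) * (Z.T @ Z)  # (d, d)
    # slogdet is numerically safer than log(det(...)); the matrix is
    # positive definite, so the sign is +1 and can be ignored.
    _, logdet = np.linalg.slogdet(np.eye(d) + scaled_cov)
    return 0.5 * logdet


def l2_normalize(Z):
    """Project each row onto the unit sphere, as is common in DML."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


rng = np.random.default_rng(0)

# Embeddings spread over the sphere versus embeddings collapsed
# onto (almost) a single direction.
spread = l2_normalize(rng.normal(size=(64, 8)))
collapsed = l2_normalize(np.ones((64, 8)) + 1e-3 * rng.normal(size=(64, 8)))

rate_spread = coding_rate(spread)
rate_collapsed = coding_rate(collapsed)

# Assumed loss term: minimizing the negated rate maximizes the rate,
# which penalizes collapsed configurations.
anti_collapse_term = -rate_spread

print(f"spread: {rate_spread:.3f}  collapsed: {rate_collapsed:.3f}")
```

The spread embeddings yield a noticeably higher rate than the collapsed ones, which is the behavior a rate-maximizing regularizer exploits. In a training setup, the same quantity would be computed on a batch of sample features or on the class proxies with an autodiff framework and added to the pair-based or proxy-based loss.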
