Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric

Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Xian-Sheng Hua, Heng-Tao Shen

TL;DR

This work tackles embedding-space collapse in deep metric learning (DML) by introducing Anti-Collapse Loss, a coding-rate objective inspired by Maximal Coding Rate Reduction (MCR$^2$). The loss maximizes the average coding rate of either all sample features or the class proxies. Replacing the intra-class coding terms with a proxy-based formulation preserves the global structure of the embedding space while reducing computational overhead. The loss can be combined with both pair-based and proxy-based DML methods and achieves state-of-the-art image retrieval performance on benchmarks such as CUB200, Cars196, and SOP, with ablations supporting its robustness and convergence advantages. Extensions to vision-language models (e.g., CLIP) show additional gains.

Abstract

Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks such as classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods that maximize inter-class discrepancy and minimize intra-class diversity. However, these methods tend to suffer from collapse of the embedding space because they rely too heavily on label information, which leads to sub-optimal feature representations and inferior model performance. To maintain the structure of the embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss. Our loss draws its main inspiration from the principle of Maximal Coding Rate Reduction. It promotes the sparseness of feature clusters in the embedding space, and thereby prevents collapse, by maximizing the average coding rate of sample features or class proxies. Moreover, we integrate the proposed loss with pair-based and proxy-based methods, resulting in notable performance improvements. Comprehensive experiments on benchmark datasets demonstrate that our method outperforms existing state-of-the-art methods. Extensive ablation studies verify its effectiveness in preventing embedding space collapse and improving generalization performance.
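
To make the quantity being maximized concrete, the following is a minimal NumPy sketch of the coding rate as it is defined in the Maximal Coding Rate Reduction literature, R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z), for n feature vectors of dimension d stored as the rows of Z. The function names, the value of eps, and the toy data are illustrative assumptions, and this section does not state how the paper averages or weights the term, so this is not the authors' implementation.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Coding rate of n feature vectors (rows of Z, shape (n, d)).

    Uses the MCR^2 definition: 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).
    eps is the allowed distortion; 0.5 is an arbitrary illustrative choice.
    """
    n, d = Z.shape
    scaled_cov = (d / (n * eps ** 2)) * (Z.T @ Z)  # (d, d) matrix
    # slogdet is numerically safer than log(det(...)); the matrix is
    # positive definite, so the sign is always +1 and can be ignored.
    _, logdet = np.linalg.slogdet(np.eye(d) + scaled_cov)
    return 0.5 * float(logdet)


def l2_normalize(Z: np.ndarray) -> np.ndarray:
    """Project each row onto the unit sphere, as is common for DML embeddings."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, d = 100, 16  # hypothetical number of proxies and embedding dimension

# Proxies spread over the sphere versus proxies collapsed around one direction.
spread = l2_normalize(rng.standard_normal((n, d)))
base = rng.standard_normal((1, d))
collapsed = l2_normalize(base + 0.01 * rng.standard_normal((n, d)))

rate_spread = coding_rate(spread)
rate_collapsed = coding_rate(collapsed)
print(f"spread: {rate_spread:.2f}  collapsed: {rate_collapsed:.2f}")
```

The collapsed set yields a much lower coding rate than the spread set, so a regularizer of the form -coding_rate(proxies), added to a pair-based or proxy-based loss, would push the embedding away from collapse. Because such a term would be computed on the proxies rather than on every sample, its cost would scale with the number of classes instead of the dataset or batch size.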

Paper Structure

This paper contains 17 sections, 9 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The upper half of the figure illustrates the collapse issue in the embedding space caused by the over-reliance on labels in existing pair-based and proxy-based methods (the sector area represents the coding rate metric of the embedding space). The lower-left subfigure demonstrates the principle of Maximal Coding Rate Reduction (MCR$^2$), which utilizes all data samples to maintain the structure of the embedding space. In contrast, our proposed Anti-Collapse Loss prevents the collapse of embedding space by maximizing the proxy coding rate.
  • Figure 2: During training, existing proxy-based and pair-based methods require labels to divide samples into positive and negative pairs. These methods rarely use the global information carried by all samples or proxies once label information is set aside. This lack of global information leads to the collapse of the embedding space during training. To address the collapse issue, our Anti-Collapse Loss maximizes the coding rate of the proxies so that it is maintained throughout training. Because proxies guide the positions of the samples, our method prevents the collapse of the embedding space while conserving computational resources.
  • Figure 3: The coding rate variation of three proxy-based losses during training.
  • Figure 4: The histogram of sample similarity distribution for proxy-based loss in achieving optimal image retrieval performance (Max Recall@1).
  • Figure 5: Proxy similarity matrix of different proxy-based losses in achieving optimal image retrieval performance.
  • ...and 2 more figures