Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss

Kunal Dahiya, Diego Ortego, David Jiménez

TL;DR

This paper proposes PRIME, an Extreme Multi-label Classification (XMC) method that uses a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. PRIME achieves state-of-the-art results on several public benchmarks of different sizes and domains while remaining efficient.

Abstract

Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space. Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels. However, learning deep models can be computationally expensive in large output spaces, resulting in a trade-off between high-performing brute-force approaches and efficient solutions. In this paper, we propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. We frame XMC as a data-to-prototype prediction task in which label prototypes aggregate information from related queries. More precisely, we use a shallow transformer encoder, which we call the Label Prototype Network, that enriches label representations by aggregating text-based embeddings, label centroids, and learnable free vectors. We jointly train a deep encoder and the Label Prototype Network using an adaptive triplet loss objective that better adapts to the high granularity and ambiguity of extreme label spaces. PRIME achieves state-of-the-art results on several public benchmarks of different sizes and domains while keeping the model efficient.

Paper Structure

This paper contains 27 sections, 2 theorems, 16 equations, 3 figures, 11 tables.

Key Result

Proposition 1

Consider the non-differentiable piece-wise linear function defining the adaptive margin to be $m\left(s_{ap}, s_{an}\right) = | s_{ap} - s_{an}|$. Adding $m\left(s_{ap}, s_{an}\right)$ into Eq. eq:app_triplet_ang expands the support of the function by relaxing the margin constraint of the original
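The margin function in Proposition 1 can be evaluated directly. The following is a minimal sketch of $m(s_{ap}, s_{an}) = |s_{ap} - s_{an}|$ together with one valid subgradient, which makes the "non-differentiable piece-wise linear" behaviour concrete: the slope is constant on each side of $s_{ap} = s_{an}$ and undefined at the kink. The function names `adaptive_margin` and `margin_subgradient` are hypothetical, and the choice of returning 0 at the kink is an assumption (any value in $[-1, 1]$ is a valid subgradient there). How the margin enters the triplet loss is not shown in this excerpt, so the sketch stops at the margin itself.

```python
def adaptive_margin(s_ap: float, s_an: float) -> float:
    """Adaptive margin from Proposition 1: m(s_ap, s_an) = |s_ap - s_an|.

    s_ap: similarity between the anchor (query) and a positive label.
    s_an: similarity between the anchor (query) and a negative label.
    """
    return abs(s_ap - s_an)


def margin_subgradient(s_ap: float, s_an: float) -> tuple[float, float]:
    """One subgradient of m with respect to (s_ap, s_an).

    Away from the kink the function is linear, so the gradient is
    (+1, -1) when s_ap > s_an and (-1, +1) when s_ap < s_an.
    At s_ap == s_an the function is not differentiable; returning
    (0, 0) there is an assumption made for this illustration.
    """
    diff = s_ap - s_an
    sign = (diff > 0) - (diff < 0)  # -1, 0, or +1
    return (float(sign), float(-sign))


if __name__ == "__main__":
    # Illustrative similarity pairs (made-up values, not from the paper).
    for s_ap, s_an in [(0.9, 0.2), (0.4, 0.4), (0.1, 0.7)]:
        m = adaptive_margin(s_ap, s_an)
        g = margin_subgradient(s_ap, s_an)
        print(f"s_ap={s_ap}, s_an={s_an}, margin={m:.3f}, subgradient={g}")
```

The margin is symmetric in its arguments and grows linearly with the gap between the positive and negative similarities, so it is zero only when the two similarities coincide.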

Figures (3)

  • Figure 1: Performance vs. efficiency comparison of several encoder-only methods and our PRIME proposal on the LF-AmazonTitles-1.3M dataset. The y-axis shows performance, the x-axis shows the number of negatives per query, and blob size represents each model's batch size. Note that the different versions of DEXML and PRIME vary the negative pool and the batch size, which dominate each method's efficiency.
  • Figure 2: Overview of PRIME (PRototypIcal extreME multi-label classification), which exploits query-to-prototype, query-to-label and label-to-query relations.
  • Figure 3: Impact of key components of the Label Prototype Network, i.e., centroids ($\mathbf{c}_l$) and free vectors ($\mathbf{v}_l$). For convenience, we report results for PRIME-lite, a single-positive, half-batch-size variant of PRIME.

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • proof
  • proof