Learning label-label correlations in Extreme Multi-label Classification via Label Features

Siddhant Kharbanda; Devaansh Gupta; Erik Schultheis; Atmadeep Banerjee; Cho-Jui Hsieh; Rohit Babbar

TL;DR

Gandalf addresses data scarcity and the tail-label problem in short-text Extreme Multi-label Classification by using label-label correlations, captured in a label co-occurrence graph, together with label features to generate surrogate training data. Its data-centric augmentation adds the label features to the standard dataset as extra training instances, with soft targets derived from label co-occurrence, so existing models can be trained on the augmented data without increasing inference cost. Empirically, Gandalf yields average improvements of roughly 5% across multiple state-of-the-art XMC methods and public benchmarks, with larger gains for tail labels and in denser label settings; some cases show up to 30% improvement. The approach is connected to GLaS regularization through a bias-variance trade-off, works in a plug-and-play manner with existing methods, and offers practical benefits for applications such as search ads, product recommendation, and related-query prediction.

Abstract

Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign to an input a subset of the most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, and prediction of related searches. In this paper, we propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution. By exploiting the characteristics of the short-text XMC problem, it leverages the label features to construct valid training instances, and uses the label graph to generate the corresponding soft-label targets, thereby capturing the label-label correlations. Surprisingly, models trained on these new training instances, although they number less than half of the original dataset, can outperform models trained on the original dataset, particularly on the PSP@k metric for tail labels. With this insight, we train existing XMC algorithms on both the original and the new training instances, leading to an average relative improvement of 5% for 6 state-of-the-art algorithms across 4 benchmark datasets consisting of up to 1.3M labels. Gandalf can be applied in a plug-and-play manner to various methods and thus advances the state of the art in the domain without incurring any additional computational overhead.
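The augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization (each label's co-occurrence counts divided by that label's frequency), the `threshold` parameter, and the function names `cooccurrence_soft_targets` and `augment` are assumptions made for illustration, and a dense NumPy matrix is used only for readability (a dataset with millions of labels would need sparse matrices).

```python
import numpy as np

def cooccurrence_soft_targets(Y, threshold=0.0):
    """Build one soft-target row per label from a binary (N, L) label matrix.

    Assumed normalization: row i estimates how often label j appears
    among the training instances tagged with label i.
    """
    Y = np.asarray(Y, dtype=np.float64)
    counts = Y.T @ Y                    # counts[i, j] = instances carrying both i and j
    freq = np.diag(counts).copy()       # freq[i] = instances carrying label i
    freq[freq == 0] = 1.0               # guard against labels with no positives
    soft = counts / freq[:, None]       # row-normalize by label frequency
    soft[soft < threshold] = 0.0        # optionally prune weak correlations
    np.fill_diagonal(soft, 1.0)         # a label's own text is a positive for itself
    return soft

def augment(X, Y, label_features, threshold=0.0):
    """Concatenate label features (as inputs) and their soft targets to the dataset."""
    Z_targets = cooccurrence_soft_targets(Y, threshold)
    X_aug = list(X) + list(label_features)
    Y_aug = np.vstack([np.asarray(Y, dtype=np.float64), Z_targets])
    return X_aug, Y_aug
```

With N original instances and L labels, `augment` returns N + L inputs and an (N + L, L) target matrix, so any model that accepts real-valued targets can be trained on it without changes to its architecture or inference path.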

Paper Structure

This paper contains 35 sections, 10 equations, 5 figures, 7 tables.

Introduction
Label features and label co-occurrence
Contributions
Preliminaries
One-vs-All Classification (OvA)
Label Features
Label correlations
Gandalf: Learning From Label-Label Correlations
Bias-Variance Trade-off
Connection to GLaS regularization
Experiments
Benchmarks, Baseline and Metrics
Main Results
Gandalf vs Architectural Additions (LTE, GALE)
Gandalf vs Siamese Learning
...and 20 more sections

Figures (5)

Figure 1: Gandalf augments the training dataset $\mathcal{D}$ by generating soft targets for each label based on label co-occurrence statistics. These additional datapoints $\mathcal{Z}$ are simply concatenated to the traditional dataset for training.
Figure 2: Gandalf demonstrating improvements on the P@5 metric across various methods, separated into tail, torso and head labels. On the x-axis, the middle row indicates the number of labels in the bin, and the lowest row denotes the average number of positives per label in that bin. Improvements in earlier bins (5 - 3) denote gains in tail label performance.
Figure 3: The (a) P@1 and (b) PSP@5 metric plotted against iterations for InceptionXML with and without Gandalf. The effect of subsampling labels for Gandalf on the (c) P@1 and (d) PSP@5 metric. Both results are on the LF-AmazonTitles-131K dataset.
Figure 4: Contributions to P@5 in LF-WikiSeeAlsoTitles-320K. The number of labels in each bin is provided after the # in the second row of the tags on the x-axis. The bottommost row denotes the mean label frequency in that bin. Specifically, note the improvements on tail labels in the earlier bins (5 - 3).
Figure 5: Correlations between labels and their first-order neighbours, as found by the label co-occurrence on the LF-WikiTitles-500K dataset. The legend shows the label in question, the bar chart shows the degree of correlation with its neighbouring labels. Correlated labels often share tokens with each other and/or may be used in the same context.
