In-Context Learning for Extreme Multi-Label Classification

Karel D'Oosterlinck; Omar Khattab; François Remy; Thomas Demeester; Chris Develder; Christopher Potts

TL;DR

This work tackles extreme multi-label classification (XMC) with thousands of labels by proposing Infer--Retrieve--Rank (IReRa), a modular in-context learning program implemented in the DSPy framework. IReRa orchestrates a three-step pipeline: infer query terms from the input, retrieve candidate labels via a frozen retriever, and rank the retrieved labels with a second LM, all without fine-tuning. By bootstrapping few-shot prompts from a small labeled set and optimizing the modules per dataset, IReRa achieves state-of-the-art results on ESCO-based job datasets (HOUSE, TECH, TECHWOLF) and competitive performance on BioDEX, using only tens of labeled examples. The approach reduces prompt engineering, transfers to new tasks, and offers a cost-effective, generalizable solution for large-label problems. It currently relies on GPT-4, which motivates future work toward cheaper, fully open-source variants.

Abstract

Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt. We propose a general program, $\texttt{Infer--Retrieve--Rank}$, that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems. We implement this program using the $\texttt{DSPy}$ programming model, which specifies in-context systems in a declarative manner, and use $\texttt{DSPy}$ optimizers to tune it towards specific datasets by bootstrapping only tens of few-shot examples. Our primary extreme classification program, optimized separately for each task, attains state-of-the-art results across three benchmarks (HOUSE, TECH, TECHWOLF). We apply the same program to a benchmark with vastly different characteristics and attain competitive performance as well (BioDEX). Unlike prior work, our proposed solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples. Our code is public at https://github.com/KarelDO/xmc.dspy.
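The three-step interaction described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation and not the DSPy API: the two LM calls are replaced by hypothetical toy functions (`toy_infer_lm`, `toy_rank_lm`), the frozen retriever is replaced by a simple token-overlap scorer, and the label list and the `k_retrieve` / `k_final` parameters are made up for the example.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label space; a real XMC task would have thousands of labels.
LABELS = [
    "python programming",
    "project management",
    "data analysis",
    "customer service",
    "machine learning",
]


def _tokens(text: str) -> Counter:
    return Counter(text.lower().split())


def retrieve(queries: List[str], labels: List[str], k: int) -> List[str]:
    """Stand-in for a frozen retriever: rank labels by their best
    token overlap with any of the queries and keep the top k."""
    def score(label: str) -> int:
        label_tokens = _tokens(label)
        return max(
            (sum((label_tokens & _tokens(q)).values()) for q in queries),
            default=0,
        )
    # sorted() is stable, so ties keep the original label order.
    return sorted(labels, key=score, reverse=True)[:k]


def toy_infer_lm(text: str) -> List[str]:
    """Assumption: a real LM would generate free-text label guesses.
    Here each comma-separated phrase of the input becomes one query."""
    return [phrase.strip() for phrase in text.lower().split(",")]


def toy_rank_lm(text: str, candidates: List[str]) -> List[str]:
    """Assumption: a real LM would rerank the candidates in context.
    Here candidates sharing more words with the input come first."""
    words = set(text.lower().replace(",", " ").split())
    return sorted(candidates, key=lambda c: -len(words & set(c.split())))


def infer_retrieve_rank(
    text: str,
    infer_lm: Callable[[str], List[str]],
    rank_lm: Callable[[str, List[str]], List[str]],
    labels: List[str],
    k_retrieve: int = 3,
    k_final: int = 2,
) -> List[str]:
    queries = infer_lm(text)                              # Infer
    candidates = retrieve(queries, labels, k_retrieve)    # Retrieve
    return rank_lm(text, candidates)[:k_final]            # Rank


if __name__ == "__main__":
    snippet = "experience with python, some machine learning"
    print(infer_retrieve_rank(snippet, toy_infer_lm, toy_rank_lm, LABELS))
```

The point of the structure is that the LM never needs to see the full label space in its prompt: the Infer step only has to produce approximate queries, the retriever maps them onto actual labels, and the Rank step only has to order a short candidate list.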

Paper Structure (13 sections, 2 equations, 1 figure, 2 tables)

Introduction
Related Work
Infer--Retrieve--Rank
Seed-prompts
Metrics
Data
BioDEX:
ESCO:
Experiments and Results
Baselines
Infer--Retrieve--Rank
Program Cost Breakdown
Conclusion

Figures (1)

Figure 1: We propose Infer--Retrieve--Rank, an efficient in-context learning program for multi-label classification with an extreme number of classes ($\geq$ 10,000). Given an input, a first in-context learning module predicts queries that are routed to a frozen retriever. The retrieved documents are re-ranked by a second in-context module (Step 1). Given a minimal prompt (Step 2), a zero-shot Teacher LM bootstraps demonstrations to optimize the few-shot Student LM (Step 3). Optimization using $\approx$50 labeled inputs can yield state-of-the-art results, using only $\approx$20 Teacher and $\approx$1,500 Student calls. The (optimization) logic is expressed using the DSPy programming model.
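The bootstrapping idea in Steps 2 and 3 of the caption can be illustrated with a small sketch. It is a simplification under stated assumptions and does not use or reproduce the DSPy optimizers: the teacher is a hypothetical lookup table (`toy_teacher`), the metric (`any_overlap`) and the prompt layout are invented for the example, and only a single module is optimized. The sketch shows a zero-shot teacher labeling a few training inputs, keeping only the outputs that pass a metric against the gold labels, and placing the survivors in the student's few-shot prompt.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, List[str]]  # (input text, labels)

# Hypothetical labeled training inputs (the paper uses about 50).
TRAIN_SET: List[Example] = [
    ("builds web apps", ["web programming"]),
    ("leads a team", ["team leadership"]),
    ("analyses sales data", ["data analysis"]),
]

# Hypothetical zero-shot teacher outputs; a real teacher would be an LM call.
TOY_TEACHER_OUTPUTS = {
    "builds web apps": ["web programming"],
    "leads a team": ["cooking"],  # wrong, should be filtered out
    "analyses sales data": ["data analysis", "statistics"],
}


def toy_teacher(text: str) -> List[str]:
    return TOY_TEACHER_OUTPUTS.get(text, [])


def any_overlap(predicted: List[str], gold: List[str]) -> bool:
    """Assumed metric: accept a prediction if it hits any gold label."""
    return bool(set(predicted) & set(gold))


def bootstrap_demos(
    teacher: Callable[[str], List[str]],
    train_set: List[Example],
    metric: Callable[[List[str], List[str]], bool],
    max_demos: int = 2,
) -> List[Example]:
    """Run the teacher on labeled inputs and keep passing outputs as demos."""
    demos: List[Example] = []
    for text, gold in train_set:
        predicted = teacher(text)
        if metric(predicted, gold):
            demos.append((text, predicted))
        if len(demos) >= max_demos:
            break
    return demos


def build_student_prompt(seed_prompt: str, demos: List[Example], text: str) -> str:
    """Assemble a few-shot prompt: seed instruction, demos, then the new input."""
    lines = [seed_prompt, ""]
    for demo_text, demo_labels in demos:
        lines.append(f"Input: {demo_text}")
        lines.append(f"Labels: {', '.join(demo_labels)}")
        lines.append("")
    lines.append(f"Input: {text}")
    lines.append("Labels:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = bootstrap_demos(toy_teacher, TRAIN_SET, any_overlap)
    print(build_student_prompt("Given a job snippet, list the skills.", demos, "trains neural networks"))
```

In this toy setup the teacher is called once per training input and the student is never called during bootstrapping. The call counts quoted in the caption come from the authors' full optimization procedure, which this sketch does not attempt to reproduce.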
