In-Context Learning for Extreme Multi-Label Classification
Karel D'Oosterlinck, Omar Khattab, François Remy, Thomas Demeester, Chris Develder, Christopher Potts
TL;DR
This work tackles extreme multi-label classification (XMC) with thousands of labels by proposing Infer-Retrieve-Rank (IReRa), a modular in-context learning program implemented in the DSPy framework. IReRa runs a three-step pipeline: an LM infers query terms from the input, a frozen retriever maps those queries to candidate labels, and a second LM reranks the retrieved candidates, with no fine-tuning at any step. By bootstrapping few-shot prompts from a small labeled set and optimizing the modules separately for each dataset, IReRa achieves state-of-the-art results on three ESCO-based job-skill datasets (HOUSE, TECH, TECHWOLF) and competitive performance on BioDEX, using only tens of labeled examples and largely open-source components. The approach reduces prompt engineering, transfers readily to new tasks, and offers a cost-effective, generalizable solution for problems with large label spaces, although its reliance on GPT-4 motivates future work on cheaper, fully open-source variants.
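The three-step control flow can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' DSPy implementation: the prompts, the word-overlap "retriever" (standing in for a frozen dense embedding model), the max-over-queries label scoring, the top-k cutoff, and the stub LMs are all illustrative assumptions. Only the shape of the pipeline (infer queries, retrieve candidates, rerank them) follows the description above.

```python
from typing import Callable, Dict, List, Sequence, Set

# An "LM" is modeled as any callable from a prompt string to a completion string.
# Real systems would call actual models; the stubs below are assumptions for illustration.
LM = Callable[[str], str]


def word_set(s: str) -> Set[str]:
    """Lower-cased words; a toy stand-in for a frozen dense retriever's embedding."""
    return set(s.lower().split())


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap, used here in place of embedding cosine similarity."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def infer(lm: LM, text: str) -> List[str]:
    """Infer: prompt an LM for free-form query terms describing the input."""
    completion = lm(f"Input: {text}\nList the relevant terms, comma-separated:")
    return [t.strip() for t in completion.split(",") if t.strip()]


def retrieve(queries: Sequence[str], labels: Sequence[str], k: int) -> List[str]:
    """Retrieve: score each label by its best match to any query; keep the top k."""
    scores: Dict[str, float] = {
        label: max((similarity(q, label) for q in queries), default=0.0)
        for label in labels
    }
    # sorted() is stable even with reverse=True, so ties keep the label-space order.
    return sorted(labels, key=lambda label: scores[label], reverse=True)[:k]


def rank(lm: LM, text: str, candidates: List[str]) -> List[str]:
    """Rank: ask a second LM to reorder the retrieved candidates."""
    completion = lm(f"Input: {text}\nCandidates: {', '.join(candidates)}\nRank them:")
    proposed = [c.strip() for c in completion.split(",")]
    # Drop duplicates and labels that were never retrieved (e.g. hallucinated ones),
    # then append any candidate the LM omitted, in retrieval order.
    ranked = [c for c in dict.fromkeys(proposed) if c in candidates]
    return ranked + [c for c in candidates if c not in ranked]


def infer_retrieve_rank(
    text: str, labels: Sequence[str], infer_lm: LM, rank_lm: LM, k: int = 3
) -> List[str]:
    queries = infer(infer_lm, text)
    candidates = retrieve(queries, labels, k)
    return rank(rank_lm, text, candidates)


labels = ["python programming", "project management", "data analysis", "welding", "customer service"]
infer_stub: LM = lambda prompt: "python, data analysis"
rank_stub: LM = lambda prompt: "python programming, sql, data analysis"
result = infer_retrieve_rank("Write Python scripts for data analysis.", labels, infer_stub, rank_stub)
print(result)  # ['python programming', 'data analysis', 'project management']
```

Restricting the ranker to retrieved candidates is one plausible way to keep every prediction inside the label space; here the stub's hallucinated "sql" is discarded, while the retriever alone would have put "data analysis" first.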
Abstract
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt. We propose a general program, Infer-Retrieve-Rank, that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems. We implement this program using the DSPy programming model, which specifies in-context systems in a declarative manner, and use DSPy optimizers to tune it towards specific datasets by bootstrapping only tens of few-shot examples. Our primary extreme classification program, optimized separately for each task, attains state-of-the-art results across three benchmarks (HOUSE, TECH, TECHWOLF). We apply the same program to a benchmark with vastly different characteristics and attain competitive performance as well (BioDEX). Unlike prior work, our proposed solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples. Our code is public at https://github.com/KarelDO/xmc.dspy.
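The idea of tuning a program by bootstrapping few-shot examples can be sketched in plain Python, roughly in the spirit of DSPy's few-shot bootstrapping optimizers: run the program on a handful of labeled inputs, keep the module-level input/output traces of runs whose final prediction scores well, and reuse those traces as demonstrations in each module's prompt. This is not the DSPy optimizer API. The trace format, the recall@k metric, the threshold, the demo cap, and the toy program are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A traced program returns its ranked labels plus each module's (input, output) pair,
# e.g. {"infer": ("Write Python scripts", "python")}. This trace format is an assumption.
Trace = Dict[str, Tuple[str, str]]
TracedProgram = Callable[[str], Tuple[List[str], Trace]]


def recall_at_k(predicted: Sequence[str], gold: Sequence[str], k: int) -> float:
    """Fraction of gold labels that appear among the top-k predictions."""
    if not gold:
        return 0.0
    return len(set(predicted[:k]) & set(gold)) / len(gold)


def bootstrap_demos(
    program: TracedProgram,
    trainset: Sequence[Tuple[str, List[str]]],
    k: int = 10,
    threshold: float = 1.0,
    max_demos: int = 2,
) -> Dict[str, List[Tuple[str, str]]]:
    """Keep module-level traces from runs whose final prediction passes the metric."""
    demos: Dict[str, List[Tuple[str, str]]] = {}
    kept = 0
    for text, gold in trainset:
        predicted, trace = program(text)
        if recall_at_k(predicted, gold, k) < threshold:
            continue  # this run missed gold labels, so its trace is not reused
        for module, pair in trace.items():
            demos.setdefault(module, []).append(pair)
        kept += 1
        if kept >= max_demos:
            break
    return demos


def build_prompt(instruction: str, shots: List[Tuple[str, str]], text: str) -> str:
    """Prepend bootstrapped demonstrations to one module's prompt."""
    blocks = [f"Input: {i}\nOutput: {o}" for i, o in shots]
    return "\n\n".join([instruction, *blocks, f"Input: {text}\nOutput:"])


def toy_program(text: str) -> Tuple[List[str], Trace]:
    """Hypothetical stand-in for a full Infer-Retrieve-Rank run."""
    query = {
        "Write Python scripts": "python",
        "Answer support calls": "sales",
        "Weld steel frames": "welding",
    }[text]
    return [query], {"infer": (text, query)}


trainset = [
    ("Write Python scripts", ["python"]),
    ("Answer support calls", ["customer service"]),  # the toy program gets this one wrong
    ("Weld steel frames", ["welding"]),
]
demos = bootstrap_demos(toy_program, trainset)
print(build_prompt("List the relevant terms.", demos["infer"], "Fix car engines"))
```

Only the two runs that recover their gold label contribute demonstrations; the failed "Answer support calls" trace is filtered out, which is why a small labeled set can suffice to assemble the few-shot prompts.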
