Divide, Cache, Conquer: Dichotomic Prompting for Efficient Multi-Label LLM-Based Classification
Mikołaj Langner, Jan Eliasz, Ewa Rudnicka, Jan Kocoń
TL;DR
This paper addresses efficient multi-label text classification in settings where the label space evolves over time. It introduces a dichotomic prompting framework that treats each label as an independent yes/no decision and uses prefix caching to accelerate inference on decoder-only LLMs, together with a distillation pipeline that trains small language models from a high-capacity teacher. The authors show that dichotomic prompting achieves accuracy comparable to structured JSON prompts and better zero-shot robustness on unseen labels, while delivering substantial speedups on short texts. They demonstrate these gains on a 10k-text Polish affective dataset covering $K=24$ dimensions, with four small models (HerBERT-Large, PLLuM-8B, CLARIN-1B, Gemma3-1B) trained on DeepSeek-V3 pseudo-labels. The framework is generalizable to other domains and languages, offering a scalable, cost-efficient approach to dynamic multi-label classification.
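To make the dichotomic idea concrete, here is a minimal sketch of how one yes/no prompt per label might be built so that all prompts for the same text share an identical prefix. The prompt wording, the function names (`build_prefix`, `build_dichotomic_prompts`, `parse_answer`), and the three example labels are assumptions for illustration only; they are not taken from the paper.

```python
from typing import Dict, List


def build_prefix(text: str) -> str:
    """Shared part of every prompt: instructions plus the text to classify.

    The wording is hypothetical; the point is that it is identical for all labels.
    """
    return (
        "You are an annotator. Read the text and answer with 'yes' or 'no'.\n"
        f"Text: {text}\n"
    )


def build_dichotomic_prompts(text: str, labels: List[str]) -> Dict[str, str]:
    """Return one yes/no prompt per label; only the final question differs."""
    prefix = build_prefix(text)
    return {
        label: f"{prefix}Question: Does the text express {label}? Answer:"
        for label in labels
    }


def parse_answer(generated: str) -> bool:
    """Map a model completion to a binary decision (assumed parsing rule)."""
    return generated.strip().lower().startswith("yes")


# Illustrative subset of labels, not the paper's 24 dimensions.
labels = ["joy", "anger", "sadness"]
text = "What a wonderful day!"
prompts = build_dichotomic_prompts(text, labels)
shared = build_prefix(text)

# Every prompt starts with the same prefix, which is what a prefix cache can reuse.
all_share_prefix = all(p.startswith(shared) for p in prompts.values())
```

Because the instruction and the input text come first and the label-specific question comes last, an inference engine with prefix caching only needs to process the shared part once per text. Adding a new label then means adding one more short question rather than changing a structured output schema.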
Abstract
We introduce a method for efficient multi-label text classification with large language models (LLMs), built on reformulating classification tasks as sequences of dichotomic (yes/no) decisions. Instead of generating all labels in a single structured response, each target dimension is queried independently, which, combined with a prefix caching mechanism, yields substantial efficiency gains for short-text inference without loss of accuracy. To demonstrate the approach, we focus on affective text analysis, covering 24 dimensions including emotions and sentiment. Using LLM-to-SLM distillation, a powerful annotator model (DeepSeek-V3) provides multiple annotations per text, which are aggregated to fine-tune smaller models (HerBERT-Large, CLARIN-1B, PLLuM-8B, Gemma3-1B). The fine-tuned models show significant improvements over zero-shot baselines, particularly on the dimensions seen during training. Our findings suggest that decomposing multi-label classification into dichotomic queries, combined with distillation and cache-aware inference, offers a scalable and effective framework for LLM-based classification. While we validate the method on affective states, the approach is general and applicable across domains.
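The abstract states that multiple teacher annotations per text are aggregated before fine-tuning, without specifying the rule here. The sketch below assumes a simple per-label majority vote; both the voting rule and the `threshold` parameter are illustrative assumptions, and the function name `aggregate_votes` is hypothetical.

```python
from typing import Dict, List


def aggregate_votes(
    annotations: List[Dict[str, bool]], threshold: float = 0.5
) -> Dict[str, bool]:
    """Combine several yes/no annotation runs into one pseudo-label per dimension.

    Assumption: a label is positive when the share of 'yes' votes exceeds
    `threshold`. The paper may use a different aggregation scheme.
    """
    if not annotations:
        raise ValueError("at least one annotation run is required")
    n = len(annotations)
    labels = annotations[0].keys()
    return {
        label: sum(run[label] for run in annotations) / n > threshold
        for label in labels
    }


# Three hypothetical teacher runs over the same text.
runs = [
    {"joy": True, "anger": False},
    {"joy": True, "anger": False},
    {"joy": False, "anger": True},
]
pseudo_labels = aggregate_votes(runs)
```

Since each dimension is a separate binary decision, aggregation can be done label by label, and the resulting pseudo-labels can serve directly as binary training targets for the smaller models.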
