Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation

Valentin Leonhard Buchner, Lele Cao, Jan-Christoph Kalo, Vilhelm von Ehrenheim

TL;DR

This work tackles multi-label industry sector allocation with limited labeled data and dynamic taxonomies by introducing Prompt Tuned Embedding Classification (PTEC), a hybrid method that combines a soft prompt with an embedding-based classification head. To address the limitations of text-to-text (T2T) classification in multi-label settings, the authors also integrate constrained decoding via Trie Search. In extensive experiments on a proprietary IndustrySector dataset and a HateSpeech benchmark, PTEC achieves higher macro F1 scores than the baselines at lower inference cost, and it provides independent confidence scores for each label, enabling thresholding and ranking. The findings suggest that PTEC generalizes beyond well-known companies, reduces catastrophic forgetting, and can be scaled for production deployment in domain-specific classification tasks. The authors also release their codebase and a benchmarking dataset to support reproducibility and broader evaluation.
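Independent per-label confidence scores are what make thresholding and ranking possible, and macro F1 is the headline metric. The following is a minimal sketch of both ideas, assuming each label receives its own raw logit that is mapped through a sigmoid. The logits are hypothetical, and so are the labels `Fintech` and `Enterprise Software` (only `Healthcare IT` appears in the paper, in Figure 2).

```python
import math
from typing import Dict, List, Sequence, Set


def sigmoid(z: float) -> float:
    """Map a raw logit to an independent per-label confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep every label whose own confidence reaches the threshold.

    Scores are not normalized across labels (no softmax), so any number of
    labels, including none, can be selected for one example.
    """
    return {label for label, z in logits.items() if sigmoid(z) >= threshold}


def rank_labels(logits: Dict[str, float]) -> List[str]:
    """Order labels from most to least confident."""
    return sorted(logits, key=lambda label: logits[label], reverse=True)


def macro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label that is never predicted nor present scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical logits for one company description.
logits = {"Healthcare IT": 2.0, "Fintech": -1.0, "Enterprise Software": 0.3}
print(sorted(predict_labels(logits)))       # ['Enterprise Software', 'Healthcare IT']
print(sorted(predict_labels(logits, 0.6)))  # ['Healthcare IT']
print(rank_labels(logits))                  # ['Healthcare IT', 'Enterprise Software', 'Fintech']

gold = [{"Healthcare IT"}, {"Healthcare IT", "Fintech"}]
pred = [{"Healthcare IT"}, {"Fintech"}]
print(macro_f1(gold, pred, ["Healthcare IT", "Fintech"]))  # ≈ 0.833
```

Raising the threshold trades recall for precision without retraining, which a T2T model that emits only binary decisions cannot offer.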

Abstract

Prompt Tuning is emerging as a scalable and cost-effective method to fine-tune Pretrained Language Models (PLMs), which are often referred to as Large Language Models (LLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baselines for multi-label text classification. We apply this to the challenging task of classifying companies into an investment firm's proprietary industry taxonomy, supporting its thematic investment strategy. Text-to-text classification is frequently reported to outperform task-specific classification heads, but it has several limitations when applied to a multi-label classification problem where each label consists of multiple tokens: (a) generated labels may not match any label in the label taxonomy; (b) the fine-tuning process lacks permutation invariance and is sensitive to the order of the provided labels; (c) the model provides binary decisions rather than appropriate confidence scores. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All three limitations (a), (b), and (c) are addressed by replacing the PLM's language head with a classification head, which is referred to as Prompt Tuned Embedding Classification (PTEC). This significantly improves performance while also reducing computational costs during inference. In our industrial application, the training data is skewed towards well-known companies; we confirm that the model's performance is consistent across both well-known and less-known companies. Our overall results indicate the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities. We release our codebase and a benchmarking dataset at https://github.com/EQTPartners/PTEC.
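Limitation (a) and its Trie Search fix can be illustrated with a small sketch: the valid label names are stored in a prefix tree (trie), and at each decoding step the model may only choose among tokens that extend the current prefix toward some existing label. All of the following are hypothetical assumptions, not the authors' implementation: whitespace tokens stand in for the PLM's subword tokens, `toy_score` stands in for the model's next-token scores, `<end>` marks the end of a label, and every taxonomy entry other than `Healthcare IT` is invented. The sketch greedily decodes a single label.

```python
from typing import Callable, Dict, List, Sequence

END = "<end>"  # hypothetical end-of-label token


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}


def build_trie(labels: Sequence[str]) -> TrieNode:
    """Store each label as a path of tokens terminated by END."""
    root = TrieNode()
    for label in labels:
        node = root
        # Whitespace splitting stands in for the PLM's subword tokenizer.
        for token in label.split() + [END]:
            if token not in node.children:
                node.children[token] = TrieNode()
            node = node.children[token]
    return root


def constrained_greedy_decode(root: TrieNode, score: Callable[[List[str], str], float]) -> str:
    """Greedy decoding that may only follow edges present in the trie."""
    prefix: List[str] = []
    node = root
    while True:
        # Only continuations that keep the prefix inside the taxonomy are scored.
        best = max(node.children, key=lambda tok: score(prefix, tok))
        if best == END:
            return " ".join(prefix)
        prefix.append(best)
        node = node.children[best]


def toy_score(prefix: List[str], token: str) -> float:
    """Hypothetical model preferences: unconstrained, they would yield 'Healthcare Software'."""
    preferences = {"Healthcare": 2.0, "Software": 1.5, "IT": 1.0, END: 0.5}
    return preferences.get(token, 0.0)


taxonomy = ["Healthcare IT", "Financial Services", "Enterprise Software"]
print(constrained_greedy_decode(build_trie(taxonomy), toy_score))  # Healthcare IT
```

Without the trie, the toy scorer would follow `Healthcare` with `Software` and produce the invalid label `Healthcare Software`, the case noted under Figure 2; the trie only ever offers `IT` after `Healthcare`, so every decoded string is guaranteed to be a taxonomy label.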

Paper Structure (26 sections, 3 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Schematic overview of Prompt Tuning, showing the trainable soft prompt (matrix $SP_\theta$), the tokenized and embedded input text ($X_\text{input}$), and the LLM with frozen parameters ($LLM_\phi$).
  • Figure 2: A schematic comparison of Prompt Tuning with T2T classification (PT + T2T), Prompt Tuning with Trie Search (PT + TS), and PTEC. Note that Healthcare Software would not be a valid label name, while Healthcare IT would be.
  • Figure 3: ROC curves using LLaMA 7B. Methods that cannot be thresholded are displayed as individual points. AUROC = Area Under the ROC curve. Other abbreviations as defined in Figure 2 and in the table reporting performance and FLOPs.
  • Figure 4: Distributions of (a) original description lengths, (b) preprocessed description lengths, (c) number of labels per example, and (d) number of examples per label.
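Figures 1 and 2 describe PTEC as Prompt Tuning (a trainable soft prompt $SP_\theta$ prepended to the embedded input $X_\text{input}$ and passed through a frozen $LLM_\phi$) with the language head replaced by a classification head. Below is a minimal, framework-free sketch of that forward pass and a per-label loss, under loudly stated assumptions: the frozen LLM is replaced by a hypothetical identity function `frozen_llm`, hidden states are mean-pooled (the pooling in the released codebase may differ), the head is a single linear layer with one sigmoid output per label, and all numbers are toy values. Training (updating $SP_\theta$ and the head while $LLM_\phi$ stays frozen) is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def prepend_soft_prompt(soft_prompt: Matrix, input_embeddings: Matrix) -> Matrix:
    """Concatenate SP_theta (p x d) in front of X_input (n x d) along the sequence axis."""
    d = len(soft_prompt[0])
    if any(len(row) != d for row in soft_prompt + input_embeddings):
        raise ValueError("soft prompt and input embeddings must share hidden size d")
    return [list(row) for row in soft_prompt + input_embeddings]


def frozen_llm(sequence: Matrix) -> Matrix:
    """Hypothetical stand-in for LLM_phi; a real model would return contextual hidden states."""
    return sequence


def mean_pool(hidden: Matrix) -> List[float]:
    """Average the hidden states over the sequence (pooling choice is an assumption)."""
    return [sum(column) / len(hidden) for column in zip(*hidden)]


def classification_head(pooled: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """Linear layer producing one logit per label (weights: num_labels x d)."""
    return [sum(w * h for w, h in zip(row, pooled)) + b for row, b in zip(weights, biases)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy in its numerically stable logit form; labels are scored independently."""
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


soft_prompt = [[1.0, 0.0]]                      # p = 1 soft-prompt vector, d = 2 (toy)
x_input = [[0.0, 2.0], [2.0, 0.0], [1.0, 2.0]]  # n = 3 embedded input tokens (toy)
weights = [[1.0, 0.0], [0.5, -1.5]]             # 2 labels (toy)
biases = [0.0, 0.0]

pooled = mean_pool(frozen_llm(prepend_soft_prompt(soft_prompt, x_input)))
logits = classification_head(pooled, weights, biases)
print(logits)                                        # [1.0, -1.0]
print([round(sigmoid(z), 3) for z in logits])        # [0.731, 0.269]
print(round(bce_with_logits(logits, [1, 0]), 4))     # 0.3133
```

Because every label has its own logit and the loss averages per-label terms, reordering the labels together with their targets leaves the loss unchanged, which is the permutation invariance that limitation (b) says T2T fine-tuning lacks. A single forward pass also scores all labels at once, rather than generating label tokens one by one.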