LLM Chain Ensembles for Scalable and Accurate Data Annotation

David Farr, Nico Manzonelli, Iain Cruickshank, Kate Starbird, Jevin West

TL;DR

This paper introduces an LLM chain ensemble methodology that arranges multiple LLMs in a sequence and routes data subsets to subsequent models based on classification uncertainty, offering a practical and efficient approach to large-scale data annotation.

Abstract

The ability of large language models (LLMs) to perform zero-shot classification makes them viable solutions for data annotation in rapidly evolving domains where quality labeled data is often scarce and costly to obtain. However, the large-scale deployment of LLMs can be prohibitively expensive. This paper introduces an LLM chain ensemble methodology that aligns multiple LLMs in a sequence, routing data subsets to subsequent models based on classification uncertainty. This approach leverages the strengths of individual LLMs within a broader system, allowing each model to handle data points where it exhibits the highest confidence, while forwarding more complex cases to potentially more robust models. Our results show that the chain ensemble method often exceeds the performance of the best individual model in the chain and achieves substantial cost savings, making LLM chain ensembles a practical and efficient solution for large-scale data annotation challenges.
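
The routing idea in the abstract can be illustrated with a minimal sketch. It assumes each chain link is a function returning a label and a confidence score per item, and that a link retains items whose confidence exceeds a per-link percentile cutoff while forwarding the rest. The names `run_chain`, `toy_small`, and `toy_large` are hypothetical, the toy links stand in for real LLM calls, and the confidence scores here are placeholders rather than the paper's confidence metric. For simplicity, the final label is taken from the link that retained the item, without any ensembling step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np

# Hypothetical interface: a link maps texts to (labels, confidences).
Link = Callable[[Sequence[str]], Tuple[List[str], List[float]]]


def run_chain(texts: Sequence[str], links: Sequence[Link],
              percentiles: Sequence[float]):
    """Route items through links; each link keeps its most confident items.

    percentiles[i] is the confidence percentile used as the cutoff at link i
    (assumed, one value per non-final link). The final link keeps everything
    that reaches it.
    """
    labels: List[Optional[str]] = [None] * len(texts)
    labeled_by: List[Optional[int]] = [None] * len(texts)
    remaining = list(range(len(texts)))

    for i, link in enumerate(links):
        if not remaining:
            break
        preds, confs = link([texts[j] for j in remaining])
        confs = np.asarray(confs, dtype=float)
        if i == len(links) - 1:
            keep = np.ones(len(remaining), dtype=bool)
        else:
            cutoff = np.percentile(confs, percentiles[i])
            keep = confs > cutoff  # retain items strictly above the cutoff

        forwarded = []
        for pos, j in enumerate(remaining):
            if keep[pos]:
                labels[j], labeled_by[j] = preds[pos], i
            else:
                forwarded.append(j)  # uncertain items go to the next link
        remaining = forwarded

    return labels, labeled_by


# Toy stand-ins for a cheap model and a stronger model (not real LLM calls).
def toy_small(batch):
    return ["small"] * len(batch), [float(len(t)) for t in batch]


def toy_large(batch):
    return ["large"] * len(batch), [1.0] * len(batch)


texts = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff"]
labels, labeled_by = run_chain(texts, [toy_small, toy_large], [67])
print(labels)
print(labeled_by)
```

In this toy run the first link retains only its most confident third of the data, so the second, presumably more expensive, link sees the remaining items rather than the full dataset. That reduction in calls to later models is the source of the cost savings the abstract describes.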

Paper Structure

This paper contains 21 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: System diagram of our chaining methodology. The diagram depicts the routing paths by which subsets of data pass to subsequent LLMs, informed by a calculated confidence metric. After all data has been labeled by at least one LLM, we assign final labels using our rank-based ensemble.
  • Figure 2: Confidence score distributions at chain link 1 and chain link 2 for the stance detection task, stratified by correctly and incorrectly labelled examples. Data with a confidence score above the threshold (dashed line: 67th percentile at link 1, 50th percentile at link 2) is retained at the current chain link for classification in the rank-based ensemble. Data to the left of the threshold is forwarded to the next chain link for additional classification. The overlaid histograms show the confidence scores of correctly assigned labels in blue and incorrectly assigned labels in red.
  • Figure 3: Average F1 performance by chaining technique at chain links 1 through 4 for the stance, ideology, and misinformation tasks. Our confidence-based forwarding metric shows a clear F1 improvement over random data forwarding across LLMs, and incorporating rank-based ensembling yields a further increase.
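
The captions refer to a rank-based ensemble without defining it here, so the following is only one plausible reading, not the paper's procedure. It assumes that an item forwarded along the chain carries a (label, confidence) pair from every link that scored it, that each link's confidences are converted to within-link percentile ranks so they are comparable across models, and that the final label comes from the link where the item ranks highest. The names `percentile_ranks` and `rank_ensemble` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

import numpy as np


def percentile_ranks(confs: Sequence[float]) -> np.ndarray:
    """Rank confidences within one link; 1.0 marks the most confident item.

    Ties are broken by sort order, which is an arbitrary choice here.
    """
    confs = np.asarray(confs, dtype=float)
    order = confs.argsort().argsort()  # 0 = least confident
    return (order + 1) / len(confs)


def rank_ensemble(
    per_link: List[Dict[Hashable, Tuple[str, float]]]
) -> Dict[Hashable, str]:
    """Pick, per item, the label from the link where it ranks highest.

    per_link[i] maps item id -> (label, confidence) for items scored by link i.
    On equal ranks the earlier link wins (an assumption of this sketch).
    """
    best: Dict[Hashable, Tuple[str, float]] = {}
    for link_preds in per_link:
        ids = list(link_preds)
        ranks = percentile_ranks([link_preds[i][1] for i in ids])
        for item_id, rank in zip(ids, ranks):
            label = link_preds[item_id][0]
            if item_id not in best or rank > best[item_id][1]:
                best[item_id] = (label, float(rank))
    return {item_id: label for item_id, (label, _) in best.items()}


# Toy example: link 1 scored all three items, link 2 only the forwarded two.
link1 = {0: ("A", 0.9), 1: ("B", 0.2), 2: ("A", 0.5)}
link2 = {1: ("C", 0.8), 2: ("B", 0.1)}
final = rank_ensemble([link1, link2])
print(final)
```

Using ranks rather than raw scores avoids comparing confidence values across models whose scores may be calibrated differently, which is the main motivation for this particular design choice.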