RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science

David Farr, Nico Manzonelli, Iain Cruickshank, Jevin West

TL;DR

This study adopts a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance.

Abstract

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.

Paper Structure (23 sections, 1 equation, 4 figures, 2 tables)


Figures (4)

  • Figure 1: The RED-CT design, which enables LLM-like capabilities for NLP tasks deployed in edge environments.
  • Figure 2: The distribution of confidence scores for examples labeled correctly and incorrectly using gpt-3.5-turbo zero-shot stance classification. The distributions are overlaid as opposed to stacked.
  • Figure 3: Edge model F1 score for stance detection as the model and the system intervention types change. Edge model performance improves steadily as we introduce more complex models and system intervention measures. The largest edge model with all system interventions outperforms gpt-3.5-turbo CoT.
  • Figure 4: Varying the number of expert labels included among the LLM labels in the training process for DistilBERT and RoBERTa-L. RS denotes randomly sampled expert labels, and CI SL denotes confidence-informed sampling with label-weighted training. Blue corresponds to the Mistral-7B-Instruct-2.0 LLM labeler and green corresponds to the GPT-3.5 LLM labeler. The horizontal dashed lines represent the zero-shot accuracy of each LLM.
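
The Figure 4 caption names confidence-informed sampling and label-weighted training but does not define them on this page. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: the lowest-confidence LLM labels are sent to an expert, and every training example gets a weight (1.0 for expert labels, the LLM confidence score otherwise). The function names, the weighting rule, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_for_expert_review(confidences: List[float], budget: int) -> List[int]:
    """Return indices of the `budget` lowest-confidence LLM labels (assumed rule)."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(order[:budget])


def build_training_set(
    llm_labels: List[str],
    confidences: List[float],
    expert_labels: Dict[int, str],
    budget: int,
) -> Tuple[List[str], List[float]]:
    """Merge expert and LLM labels and attach a per-example training weight.

    Assumption: expert labels get weight 1.0 and LLM labels are weighted by
    their confidence score. The source does not specify the weighting.
    """
    chosen = set(select_for_expert_review(confidences, budget))
    labels: List[str] = []
    weights: List[float] = []
    for i, (llm_label, conf) in enumerate(zip(llm_labels, confidences)):
        if i in chosen:
            labels.append(expert_labels[i])  # expert overrides the LLM label
            weights.append(1.0)
        else:
            labels.append(llm_label)
            weights.append(conf)
    return labels, weights


# Toy data (hypothetical): four stance labels with LLM confidence scores.
llm_labels = ["pro", "con", "pro", "con"]
confidences = [0.9, 0.4, 0.8, 0.55]
expert_labels = {1: "pro", 3: "con"}  # expert answers for the reviewed items

labels, weights = build_training_set(llm_labels, confidences, expert_labels, budget=2)
```

The resulting weights could then be passed as per-example loss weights when fine-tuning an edge classifier. Random sampling (RS in the caption) would replace `select_for_expert_review` with a uniform draw of `budget` indices.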