RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science

David Farr; Nico Manzonelli; Iain Cruickshank; Jevin West

TL;DR

This study adopts a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance.

Abstract

Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Our methodology outperforms LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of specialized, supervised learning models present in many industry use cases.

Paper Structure (23 sections, 1 equation, 4 figures, 2 tables)

Introduction
Related Works
Methodology
RED-CT System Methodology
Training Edge Classifiers on LLM-labeled Data
Incorporating Confidence Informed Expert Labels
Learning on Soft Labels
System Implementation and Experiment Design
CSS Tasks and Data Selection
Stance Detection
Misinformation
Ideology
Humor
Results
Discussion
...and 8 more sections

Figures (4)

Figure 1: The RED-CT design, which enables LLM-like capabilities for NLP tasks deployed in edge environments.
Figure 2: The distribution of confidence scores for examples labeled correctly and incorrectly using gpt-3.5-turbo zero-shot stance classification. The distributions are overlaid as opposed to stacked.
Figure 3: Comparing edge model F1 score as we change model and system intervention types for stance detection. Edge model performance improves steadily as we introduce more complex models and system intervention measures. The largest edge model with all system interventions outperforms gpt-3.5-turbo CoT.
Figure 4: Varying the number of expert labels included among the LLM labels in the training process for DistilBERT and RoBERTa-L. RS denotes randomly sampled expert labels in the training process, and CI SL denotes confidence-informed sampling with label-weighted training. Blue corresponds to the Mistral-7B-Instruct-2.0 LLM labeler and green corresponds to the GPT-3.5 LLM labeler. The horizontal dashed lines represent the zero-shot accuracy of each LLM.
