Enhancing LLM-Based Text Classification in Political Science: Automatic Prompt Optimization and Dynamic Exemplar Selection for Few-Shot Learning

Menglin Liu, Ge Shi

TL;DR

This paper introduces PoliPrompt, a three-stage LLM-based framework for political text classification that avoids task-specific model retraining by combining automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. It preprocesses data with embeddings and UMAP, uses a small labeled exemplar pool to generate task descriptions, and dynamically retrieves relevant exemplars for each query via Maximal Marginal Relevance. A chain-of-thought-based judge arbitrates disagreements among LLMs to boost reliability and interpretability, as demonstrated on sentiment, stance, and campaign ad tone tasks. The results show accuracy gains, reveal the value of prompt transparency, and position PoliPrompt as a scalable, open-source tool for political science text analysis, with potential extensions to ordinal scaling and multimodal data.
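
To make the exemplar-retrieval step concrete, here is a minimal sketch of Maximal Marginal Relevance (MMR) selection over embedding vectors. It is an illustration only, not PoliPrompt's actual code: the function names `cosine` and `mmr_select`, the trade-off parameter `lam`, and the use of cosine similarity are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(query_vec, pool_vecs, k=3, lam=0.7):
    """Return indices of up to k exemplars chosen greedily by MMR.

    Hypothetical helper: lam=1.0 ranks purely by relevance to the query,
    while smaller values penalize similarity to already selected exemplars.
    """
    relevance = [cosine(query_vec, v) for v in pool_vecs]
    candidates = list(range(len(pool_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any exemplar picked so far.
            redundancy = max(
                (cosine(pool_vecs[i], pool_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-d "embeddings": the second vector nearly duplicates the first.
query = [1.0, 0.0]
pool = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_select(query, pool, k=2, lam=0.3)
relevant_only = mmr_select(query, pool, k=2, lam=1.0)
```

With a low `lam`, the near-duplicate exemplar is skipped in favor of a more distinct one; with `lam=1.0` the selection reduces to plain nearest-neighbor retrieval.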

Abstract

Large language models (LLMs) offer substantial promise for text classification in political science, yet their effectiveness often depends on high-quality prompts and exemplars. To address this, we introduce a three-stage framework that enhances LLM performance through automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. Our approach automates prompt refinement using task-specific exemplars, eliminating speculative trial-and-error adjustments and producing structured prompts aligned with human-defined criteria. In the second stage, we dynamically select the most relevant exemplars, ensuring contextually appropriate guidance for each query. Finally, our consensus mechanism mimics the role of multiple human coders for a single task, combining outputs from LLMs to achieve high reliability and consistency at a reduced cost. Evaluated across tasks including sentiment analysis, stance detection, and campaign ad tone classification, our method enhances classification accuracy without requiring task-specific model retraining or extensive manual adjustments to prompts. This framework not only boosts accuracy, interpretability and transparency but also provides a cost-effective, scalable solution tailored to political science applications. An open-source Python package (PoliPrompt) is available on GitHub.
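
The consensus stage described above can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the PoliPrompt package's API: `consensus_label`, the `classifiers` callables, and the `judge` callable are hypothetical stand-ins for LLM calls, and the rule "accept unanimous labels, otherwise defer to a judge" is one plausible reading of the mechanism.

```python
def consensus_label(text, classifiers, judge):
    """Label `text` with several classifiers and escalate disagreements.

    classifiers: callables mapping text to a label (stand-ins for LLM calls).
    judge: callable taking (text, votes) and returning a final label
           (stand-in for a chain-of-thought judge model).
    Returns (final_label, votes, escalated).
    """
    votes = [clf(text) for clf in classifiers]
    if len(set(votes)) == 1:
        # All coders agree, so no arbitration is needed.
        return votes[0], votes, False
    # Disagreement: hand the text and the competing labels to the judge.
    return judge(text, votes), votes, True


# Stub "models" for illustration only; real use would wrap LLM API calls.
def model_a(text):
    return "negative" if "attack" in text else "positive"


def model_b(text):
    return "negative" if "attack" in text or "fails" in text else "positive"


def stub_judge(text, votes):
    # Assumption for the demo: the judge sides with the more cautious label.
    return "negative" if "negative" in votes else votes[0]


agreed = consensus_label("This attack ad targets the senator", [model_a, model_b], stub_judge)
disputed = consensus_label("The incumbent fails on jobs", [model_a, model_b], stub_judge)
```

Only the disputed items reach the judge, which is how such a design keeps the extra cost limited to cases where the first-pass models disagree.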

Paper Structure

This paper contains 43 sections, 9 equations, 15 figures, 25 tables.

Figures (15)

  • Figure 1: Exemplar Selection Process for Few-Shot Learning: Demonstrating k-means Clustering to Identify Representative Samples for Labeling
  • Figure 2: Overview of the three core stages of our PoliPrompt framework
  • Figure 3: Comparison of Simple Heuristic and Enhanced Prompts for Sentiment and Stance Classification.
  • Figure 4: Prompts for Sentiment Analysis about Brett Kavanaugh
  • Figure 5: Measuring Sentiment toward Kavanaugh: Comparison of F1 Scores across Different Methods
  • ...and 10 more figures