ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis
James P. Beno
TL;DR
This work investigates a collaborative approach to three-way sentiment classification in which predictions from fine-tuned ELECTRA encoders are added to GPT-4o family prompts. Using a merged SST-3 and DynaSent dataset, the study systematically evaluates baselines, fine-tuning, and DSPy-driven prompt augmentations. Non-fine-tuned GPT models benefit significantly from ELECTRA-derived context, whereas for fine-tuned GPT models, including the predictions decreases performance. GPT-4o FT-M delivers the top performance, with GPT-4o-mini FT nearly matching it (86.70 vs. 86.99 macro F1) at a fraction of the cost; the most cost-efficient option is ELECTRA Base FT combined with GPT-4o-mini. The results provide practical guidelines for resource-constrained sentiment analysis projects and show that encoder-LLM collaboration can substantially boost performance at a favorable cost. The work suggests future extensions to other domains, datasets, and open-source LLM families, as well as deeper exploration of prompt design and explainability in collaborative setups.
Abstract
Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLMs) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We provided input from ELECTRA to GPT as: predicted label, probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 ELECTRA Base FT, 79.41 GPT-4o-mini) and yielded the lowest cost/performance ratio (\$0.12/F1 point). However, when GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much lower cost (\$0.38 vs. \$1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% lower cost. Both are affordable options for projects with limited resources.
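As a rough illustration of what sharing an encoder's predicted label and probabilities with an LLM could look like, the following minimal sketch builds an augmented prompt as a plain string. The function name `build_augmented_prompt`, the label order in `LABELS`, and the prompt wording are all hypothetical choices made for this example; they are not the prompt templates used in the paper, and no encoder or LLM API is called here.

```python
from typing import Sequence

# Assumed label order for the encoder's output probabilities (hypothetical).
LABELS = ("negative", "neutral", "positive")


def build_augmented_prompt(review: str, probs: Sequence[float]) -> str:
    """Compose a classification prompt that includes an encoder's prediction.

    `probs` is assumed to hold one probability per entry in LABELS,
    e.g. the softmax output of a fine-tuned encoder classifier.
    """
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")

    # The predicted label is the class with the highest probability.
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    predicted = LABELS[best_index]

    # Render the probabilities in a compact, human-readable form.
    prob_text = ", ".join(
        f"{label}: {p:.2f}" for label, p in zip(LABELS, probs)
    )

    # Illustrative wording only; the real prompt design may differ.
    return (
        "Classify the sentiment of the review as negative, neutral, or positive.\n"
        f"Review: {review}\n"
        f"Classifier prediction: {predicted}\n"
        f"Classifier probabilities: {prob_text}\n"
        "Sentiment:"
    )


if __name__ == "__main__":
    print(build_augmented_prompt("The food was fine.", [0.10, 0.65, 0.25]))
```

The resulting string would then be sent to the LLM in place of a prompt containing only the review; retrieved examples, the third form of input mentioned above, could be appended in the same way.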
