Lowering the Barrier of Machine Learning: Achieving Zero Manual Labeling in Review Classification Using LLMs
Yejian Zhang, Shingo Takada
TL;DR
The paper addresses the barriers to adopting sentiment classification for online reviews by combining ESCS-GPT, a label-generating LLM, with URSLMs adapted through domain-specific masked language modeling (MLM) pretraining and with a set of robust classifiers. The proposed zero-label pipeline produces accurate sentiment predictions across multiple domains while reducing the labeling effort, domain expertise, and computational resources required. Empirical results on three datasets (Movie, TripAdvisor, Amazon) show accuracy of up to 88.6%, and the approach runs efficiently on accessible hardware such as Colab. This makes advanced sentiment analysis more practical and accessible for small businesses and individual users, enabling broader deployment of ML-driven insights.
Abstract
As the internet has evolved, consumers increasingly rely on online reviews when choosing services or products, so businesses must analyze large volumes of customer feedback to improve their offerings. While machine learning-based sentiment classification shows promise for this task, its technical complexity often prevents small businesses and individuals from using it, which may widen the competitive gap between small and large businesses in improving customer satisfaction. This paper introduces an approach that integrates large language models (LLMs), specifically Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT)-based models, to make sentiment classification accessible to a wider audience. Our experiments across various datasets confirm that the approach retains high classification accuracy without manual labeling, expert knowledge of tuning and data annotation, or substantial computational power. By significantly lowering the barriers to applying sentiment classification techniques, our methodology helps small businesses and individuals stay competitive and paves the way for making machine learning technology accessible to a broader audience.
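The zero-label idea summarized above can be illustrated with a toy sketch: an automatic labeler annotates unlabeled reviews, and a separate classifier is then trained only on those generated labels. Everything here is an assumption made for illustration. The keyword-based `pseudo_label` function is a hypothetical stand-in for the label-generating LLM, the Naive Bayes model is a stand-in for the BERT-based classifiers, and the sample reviews are invented; none of it is the paper's actual implementation.

```python
import math
from collections import Counter

# Hypothetical cue words; a stand-in for the label-generating LLM.
POSITIVE_CUES = {"great", "excellent", "loved", "clean", "friendly"}
NEGATIVE_CUES = {"terrible", "awful", "dirty", "rude", "broken"}

def pseudo_label(review):
    """Return 'positive', 'negative', or None when the labeler is unsure."""
    words = set(review.lower().split())
    score = len(words & POSITIVE_CUES) - len(words & NEGATIVE_CUES)
    if score == 0:
        return None
    return "positive" if score > 0 else "negative"

def train_naive_bayes(labeled):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in labeled:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, review):
    """Pick the class with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    tokens = [t for t in review.lower().split() if t in vocab]
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for token in tokens:
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented, unlabeled reviews; no human annotation is involved.
unlabeled = [
    "great movie loved the acting",
    "terrible plot and awful pacing",
    "the room was clean and the staff friendly",
    "rude staff and a dirty room",
    "it arrived on tuesday",
]

# Step 1: generate labels automatically, dropping reviews the labeler skips.
labeled = [(r, pseudo_label(r)) for r in unlabeled if pseudo_label(r) is not None]

# Step 2: train the downstream classifier on generated labels only.
model = train_naive_bayes(labeled)
```

Calling `predict(model, "friendly staff great room")` then classifies a new review with the trained model. The point of the sketch is the division of labor: the labeler is used once to produce training data, and the cheaper downstream classifier handles all later predictions.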
