LLM-based Weak Supervision Framework for Query Intent Classification in Video Search

Farnoosh Javadi; Phanideep Gampa; Alyssa Woo; Xingxing Geng; Hang Zhang; Jose Sepulveda; Belhassen Bayar; Fei Wang

TL;DR

This paper tackles robust query intent understanding in video search by replacing costly manual labeling with a weak supervision framework built on large language models. A set of LLM personas, combined with chain-of-thought reasoning and in-context learning, generates high-quality labeled data, which is then used to train a lightweight, low-latency BERT-based multi-label classifier for real-time inference. A persona routing mechanism selects the most informative personas for each query, further improving annotation quality. Empirical results show a 113% average relative recall improvement over the baseline, a 47.60% improvement in occurrence-weighted F1 agreement with human annotations, and an additional 3.67% gain from the routing strategy, indicating that the approach is practical and scalable for video search systems.

Abstract

Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advances, understanding the wide spectrum of user search queries remains a significant challenge. An accurate query understanding system that can handle the variety of entities representing different user intents is essential for a good user experience. Such a system can be built by training a natural language understanding (NLU) model; however, obtaining high-quality labeled training data in this specialized domain is a substantial obstacle. Manual annotation is costly and cannot practically cover the vast variation in users' vocabulary. To address this, we introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a large collection of user search queries. Using prompt engineering and a diverse set of LLM personas, we generate training data that matches human annotator expectations. We incorporate domain knowledge via Chain of Thought and In-Context Learning, and use the resulting labeled data to train low-latency models optimized for real-time inference. Extensive evaluations show that our approach outperforms the baseline with an average relative gain of 113% in recall. Furthermore, our prompt engineering framework yields higher-quality LLM-generated data for weak supervision: we observe a 47.60% improvement over the baseline in the agreement between LLM predictions and human annotations, measured by F1 score weighted according to the occurrence distribution of the search queries. Our persona selection routing mechanism adds a further 3.67% increase in weighted F1 score on top of the prompt engineering framework.
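
The agreement metric described above, F1 weighted by how often each query occurs, can be read in more than one way. The following is a minimal sketch of one plausible reading, not the authors' implementation: a per-query F1 is computed between the LLM's label set and the human label set, then averaged with weights proportional to query occurrence counts. The function names, the example queries, the label names (`actor`, `genre`, `topic`), and the counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def query_f1(predicted: Set[str], reference: Set[str]) -> float:
    """F1 between two label sets for a single query (multi-label setting)."""
    if not predicted and not reference:
        return 1.0  # assumption: two empty label sets count as full agreement
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def occurrence_weighted_f1(
    llm_labels: Dict[str, Set[str]],
    human_labels: Dict[str, Set[str]],
    occurrences: Dict[str, int],
) -> float:
    """Average per-query F1, weighting each query by its occurrence count."""
    total = sum(occurrences[query] for query in human_labels)
    if total == 0:
        raise ValueError("occurrence counts must sum to a positive number")
    weighted_sum = sum(
        occurrences[query] * query_f1(llm_labels.get(query, set()), reference)
        for query, reference in human_labels.items()
    )
    return weighted_sum / total


# Hypothetical data: a frequent query labeled correctly and a rare one
# where the LLM misses one of two human-assigned intents.
human = {"tom hanks movies": {"actor"}, "funny dog videos": {"genre", "topic"}}
llm = {"tom hanks movies": {"actor"}, "funny dog videos": {"genre"}}
counts = {"tom hanks movies": 900, "funny dog videos": 100}

score = occurrence_weighted_f1(llm, human, counts)
print(f"occurrence-weighted F1: {score:.4f}")
```

Under this reading, errors on frequent queries lower the score far more than errors on rare ones, which matches the stated goal of weighting agreement by the distribution of search query occurrences. Other readings, such as a support-weighted average of per-label F1 scores, would need a different computation.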
