LLM-Powered Nuanced Video Attribute Annotation for Enhanced Recommendations

Boyuan Long, Yueqi Wang, Hiloni Mehta, Mick Zomnir, Omkar Pathak, Changping Meng, Ruolin Jia, Yajun Peng, Dapeng Hong, Xia Wu, Mingyan Gao, Onkar Dalal, Ningren Han

TL;DR

The paper addresses the need for nuanced content understanding in large-scale video recommendation, where traditional classifiers fall short on subtle attributes. It introduces an LLM-as-annotators pipeline with three stages: (1) attribute definition with a Golden Set and iterative evaluation; (2) offline bulk annotation, using optimized LLM inference and knowledge distillation to scale to $O(10^7)$ annotations/day; and (3) online integration via Personalized Restricted Retrieval. In offline evaluation, Gemini 2.5 Pro achieves $F1=81.33\%$ ($Precision=85.03\%$, $Recall=77.94\%$) versus $F1=63.21\%$ for human raters; in online A/B tests, the annotations lift user participation by $+0.49\%$ and satisfied consumption by $+0.21\%$. The study thus outlines a practical, scalable path for deploying LLM-generated nuanced annotations to improve content discovery and recommender effectiveness, with tight offline-online integration enabling rapid iteration at industrial scale.
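
As a quick consistency check on the reported figures, the standard definition of F1 as the harmonic mean of precision $P$ and recall $R$ reproduces the stated score from the stated precision and recall. The formula below is the usual textbook definition, not one quoted from the paper:

$$
F1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.8503 \times 0.7794}{0.8503 + 0.7794} \approx 0.8133
$$

The result matches the reported $F1=81.33\%$ to two decimal places.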

Abstract

This paper presents a case study on deploying Large Language Models (LLMs) as an advanced "annotation" mechanism to achieve nuanced content understanding (e.g., discerning content "vibe") at scale within a large-scale industrial short-form video recommendation system. Traditional machine learning classifiers for content understanding face protracted development cycles and a lack of deep, nuanced comprehension. The "LLM-as-annotators" approach addresses these limitations by significantly shortening development times and enabling the annotation of subtle attributes. This work details an end-to-end workflow encompassing: (1) iterative definition and robust evaluation of target attributes, refined by offline metrics and online A/B testing; (2) scalable offline bulk annotation of video corpora using LLMs with multimodal features, optimized inference, and knowledge distillation for broad application; and (3) integration of these rich annotations into the online recommendation serving system, for example, through personalized restricted retrieval. Experimental results demonstrate the efficacy of this approach, with LLMs outperforming human raters in offline annotation quality for nuanced attributes and yielding significant improvements in user participation and satisfied consumption in online A/B tests. The study provides insights into designing and scaling production-level LLM pipelines for rich content evaluation, highlighting the adaptability and benefits of LLM-generated nuanced understanding for enhancing content discovery, user satisfaction, and the overall effectiveness of modern recommendation systems.
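
The abstract names personalized restricted retrieval without describing its mechanics. The following is a minimal sketch under one plausible reading: the candidate pool is first restricted to videos carrying a given LLM-assigned attribute, and the restricted pool is then ranked by user-video affinity. The `Video` class, the dot-product scoring, the embeddings, and the `"cozy"` label are all hypothetical illustrations, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    embedding: list[float]      # hypothetical content embedding
    attributes: frozenset[str]  # hypothetical LLM-assigned attribute labels


def dot(a, b):
    """Plain dot product, used here as an assumed affinity score."""
    return sum(x * y for x, y in zip(a, b))


def restricted_retrieve(user_embedding, corpus, required_attribute, k):
    """Return the top-k videos for a user, restricted to one annotated attribute."""
    # Restriction step: keep only videos labeled with the requested attribute.
    eligible = [v for v in corpus if required_attribute in v.attributes]
    # Personalization step: rank the restricted pool by user-video affinity.
    ranked = sorted(
        eligible,
        key=lambda v: dot(user_embedding, v.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus, invented purely for illustration.
corpus = [
    Video("a", [0.9, 0.1], frozenset({"cozy"})),
    Video("b", [0.2, 0.8], frozenset({"cozy"})),
    Video("c", [1.0, 0.0], frozenset()),  # best raw match, but not annotated
]
top = restricted_retrieve([1.0, 0.0], corpus, "cozy", k=2)
result_ids = [v.video_id for v in top]
```

In this toy example, video `c` has the highest raw affinity but is excluded because it lacks the attribute, which is the point of restricting before ranking. A production system would presumably use approximate nearest-neighbor search over a pre-filtered index rather than a full sort.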

Paper Structure

This paper contains 10 sections, 1 figure.

Figures (1)

  • Figure 1: Workflow Diagram. The workflow consists of three stages. Offline Evaluation (yellow) assesses LLM quality using a "Golden Set" of examples validated by aligned expert raters; the evaluated LLM is then used to create a larger "Silver Set" for distilling student models. Offline Inference (blue) processes the video corpus and its features to build the LLM-annotated corpus. Online Recommendation (green) then leverages these annotations to enhance recommendations through systems such as personalized retrieval. Rounded rectangles are models/systems; trapezoids are datasets/processes.
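
The Offline Evaluation stage in the caption can be sketched roughly as follows: score a candidate annotator against the Golden Set, and only if it is good enough, use it to label a larger pool that becomes the Silver Set. This is an illustrative sketch, not the paper's implementation. The `toy_annotator` function stands in for an LLM call, and the `MIN_F1` gate and all data are hypothetical.

```python
from typing import Callable

MIN_F1 = 0.6  # hypothetical quality gate, not a value from the paper


def f1_against_golden(golden: dict[str, bool], annotate: Callable[[str], bool]) -> float:
    """Binary F1 of an annotator measured against expert-validated golden labels."""
    tp = fp = fn = 0
    for video_id, truth in golden.items():
        predicted = annotate(video_id)
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R.
    return 2 * tp / denom if denom else 0.0


def build_silver_set(unlabeled: list[str], annotate: Callable[[str], bool]) -> dict[str, bool]:
    """Label a larger pool with the validated annotator to train student models."""
    return {video_id: annotate(video_id) for video_id in unlabeled}


def toy_annotator(video_id: str) -> bool:
    """Stand-in for an LLM call; the rule is arbitrary and for illustration only."""
    return video_id.endswith("x")


# Toy Golden Set, invented purely for illustration.
golden = {"1x": True, "2x": True, "3": False, "4x": False, "5": True}
score = f1_against_golden(golden, toy_annotator)

# Only an annotator that clears the gate is trusted to produce silver labels.
silver_set = build_silver_set(["6x", "7"], toy_annotator) if score >= MIN_F1 else {}
```

The silver labels would then serve as training targets for a smaller student model, which is the distillation step the caption refers to; that training step is omitted here.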