LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation

Junchen Fu, Xuri Ge, Kaiwen Zheng, Ioannis Arapakis, Xin Xin, Joemon M. Jose

TL;DR

This work introduces LLMPopcorn, an end-to-end pipeline that couples large language models with diffusion-based video generators and an offline popularity predictor to create popular micro-videos. It formalizes the problem, details the pipeline and a prompt-enhancement strategy based on retrieval-augmented generation and chain-of-thought prompting, and benchmarks multiple LLMs and video generators. The empirical study finds that DeepSeek-V3 and DeepSeek-R1 are the strongest LLMs and that prompt enhancement consistently boosts performance across configurations, with LTX-Video and HunyuanVideo leading among video generators. Results approach human baselines in some settings, but absolute engagement scores remain modest, which points to both the potential of the approach and the need for further alignment and real-user validation. The work lays a foundation for popularity-driven, AI-assisted micro-video creation, and the authors state that they will release the code and datasets for future research.

Abstract

Popular micro-videos, dominant on platforms like TikTok and YouTube, hold significant commercial value. The rise of high-quality AI-generated content has spurred interest in AI-driven micro-video creation. However, despite the advanced capabilities of large language models (LLMs) like ChatGPT and DeepSeek in text generation and reasoning, their potential to assist in the creation of popular micro-videos remains largely unexplored. In this paper, we conduct an empirical study on LLM-assisted popular micro-video generation (LLMPopcorn). Specifically, we investigate the following research questions: (i) How can LLMs be effectively utilized to assist popular micro-video generation? (ii) To what extent can prompt-based enhancements optimize the LLM-generated content for higher popularity? (iii) How well do various LLMs and video generators perform in the popular micro-video generation task? By exploring these questions, we show that advanced LLMs like DeepSeek-V3 enable micro-video generation to achieve popularity comparable to human-created content. Prompt enhancements further boost popularity, and benchmarking highlights DeepSeek-V3 and DeepSeek-R1 among LLMs, while LTX-Video and HunyuanVideo lead in video generation. This pioneering work advances AI-assisted micro-video creation, uncovering new research opportunities. We will release the code and datasets to support future studies.
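
The flow described above (user query, LLM-written title and video prompt, video generation, popularity evaluation, with optional retrieval and chain-of-thought enhancement) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: every name here (Draft, build_prompt, run_pipeline, and the toy_* stand-ins) is hypothetical, and the prompt wording, the retrieval method, and the predictor are placeholders for components this page does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Draft:
    title: str         # video title proposed by the LLM
    video_prompt: str  # text prompt handed to the video generator


def build_prompt(query: str, references: Optional[List[str]] = None) -> str:
    """Compose the LLM instruction. With references, add retrieved examples
    and a chain-of-thought cue (hypothetical wording, not the paper's prompt)."""
    parts = [f"User query: {query}"]
    if references:
        parts.append("Related popular micro-videos:")
        parts.extend(f"- {ref}" for ref in references)
        parts.append("Think step by step about why these videos are popular.")
    parts.append("Return a title and a video prompt.")
    return "\n".join(parts)


def run_pipeline(
    query: str,
    llm: Callable[[str], Draft],
    generate_video: Callable[[str], bytes],
    predict_popularity: Callable[[str, bytes], float],
    retrieve: Optional[Callable[[str], List[str]]] = None,
) -> Tuple[Draft, float]:
    """Query -> (optional retrieval) -> LLM draft -> video -> predicted score."""
    references = retrieve(query) if retrieve else None
    draft = llm(build_prompt(query, references))
    video = generate_video(draft.video_prompt)
    score = predict_popularity(draft.title, video)
    return draft, score


# Toy stand-ins so the sketch runs without any model; all are assumptions.
def toy_llm(prompt: str) -> Draft:
    enhanced = "Think step by step" in prompt
    title = "Cat learns to skate" + (" (enhanced)" if enhanced else "")
    return Draft(title=title, video_prompt="A cat on a skateboard in a sunny park")


def toy_generator(video_prompt: str) -> bytes:
    return video_prompt.encode("utf-8")  # placeholder for rendered video data


def toy_predictor(title: str, video: bytes) -> float:
    return float(len(title) + len(video))  # placeholder, not a real popularity score


def toy_retriever(query: str) -> List[str]:
    return ["Dog surfing compilation", "Parrot sings pop song"]
```

Calling run_pipeline("a funny cat video", toy_llm, toy_generator, toy_predictor) runs the basic path; passing retrieve=toy_retriever switches on the enhanced prompt. Keeping each stage as a plain callable mirrors the benchmarking setup, where different LLMs and video generators are swapped in and compared under the same predictor.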

Paper Structure

This paper contains 21 sections, 5 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: An overview of the LLMPopcorn pipeline. The numbered steps indicate the sequential implementation of the pipeline. Given a user query as input, the pipeline generates video titles and micro-videos, followed by an evaluation process.
  • Figure 2: An overview of the Prompt Enhancement (PE) process. PE enables the LLM to review relevant micro-videos from the database and engage in chain-of-thought reasoning.
  • Figure 3: Predicted popularity distributions for three LLMs: Llama-3.3-70B, DeepSeek-V3, and ChatGPT-4o. Concrete and Abstract denote the two types of user prompt datasets.
  • Figure 4: Video examples from LLMPopcorn for a concrete user query.
  • Figure 5: Video examples from LLMPopcorn for an abstract user query.
  • ...and 5 more figures