Smaller is Better: Generative Models Can Power Short Video Preloading

Liming Liu; Jiangkai Wu; Xinggong Zhang


TL;DR

This work tackles the stall–waste dilemma in short video preloading by introducing PromptPream, which shifts bandwidth from pixel data to compact semantic prompts that are decoded by diffusion models on the client. It combines gradient-based prompt inversion to produce compact token embeddings, a computation-aware scheduler that accounts for decoding latency when making download decisions, and a tree search (Monte Carlo Tree Search, MCTS, with pruning) over the large space of per-chunk encoding choices and download orders. The approach enables out-of-order downloads and parallel decoding across CPU, GPU, and NPU, reducing both stalls and data waste by over 31% and increasing QoE by 45% compared with traditional preloading strategies. These results show practical gains in transmission efficiency and user experience and highlight the potential of computation-assisted video delivery on commodity devices.
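To see why the codec-and-order space is large, it helps to count it under a simple model of our own: each of n chunks picks one of k encodings (for example, H.265 or a prompt), and the chunks may be downloaded in any order, giving k^n · n! candidate plans. The function name `schedule_space` is hypothetical, and the paper's actual space may be larger still if decoder backends or several bitrates are also decision variables.

```python
from math import factorial


def schedule_space(num_chunks: int, options_per_chunk: int) -> int:
    """Count distinct (encoding assignment, download order) plans.

    Simplified model (an assumption, not the paper's exact formulation):
    each chunk independently picks one of `options_per_chunk` encodings,
    and the chunks may be fetched in any of num_chunks! orders.
    """
    return options_per_chunk ** num_chunks * factorial(num_chunks)


for n in (3, 5, 10):
    print(n, schedule_space(n, 2))
```

Even ten chunks with two options each yield 3,715,891,200 plans, which makes exhaustive enumeration impractical and motivates a pruned tree search such as MCTS.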

Abstract

Preloading is widely used on short video platforms to minimize playback stalls by downloading future content in advance. However, existing strategies face a tradeoff: aggressive preloading reduces stalls but wastes bandwidth, while conservative strategies save data but increase the risk of playback stalls. This paper presents PromptPream, a computation-powered preloading paradigm that breaks this tradeoff by using local computation to reduce bandwidth demand. Instead of transmitting pixel-level video chunks, PromptPream sends compact semantic prompts that are decoded into high-quality frames using generative models such as Stable Diffusion. We propose three core techniques to enable this paradigm: (1) a gradient-based prompt inversion method that compresses frames into small sets of compact token embeddings; (2) a computation-aware scheduling strategy that jointly optimizes network and compute resource usage; and (3) a scalable search algorithm that addresses the enlarged scheduling space introduced by the scheduler. Evaluations show that PromptPream reduces both stalls and bandwidth waste by over 31% and improves Quality of Experience (QoE) by 45% compared to traditional strategies.
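Gradient-based prompt inversion, technique (1) above, amounts to optimizing an embedding through a frozen decoder until the decoded output matches the target frame. The sketch below is a toy illustration under a strong assumption: a fixed linear map stands in for the diffusion model (the paper inverts through a generative model such as Stable Diffusion), and the names `decode`, `invert`, and `mse` are hypothetical. A 4-value "frame" is recovered from a 2-value embedding, mirroring the compression the abstract describes.

```python
import random


def decode(W, e):
    """Toy frozen 'generator': a fixed linear map from embedding e to pixels."""
    return [sum(w_ij * e_j for w_ij, e_j in zip(row, e)) for row in W]


def invert(W, target, steps=200, lr=0.1, seed=0):
    """Gradient descent on e to minimise ||decode(W, e) - target||^2.

    W stays frozen; only the embedding e is optimised, which is the core
    idea of gradient-based inversion.
    """
    rng = random.Random(seed)
    e = [rng.uniform(-1.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(steps):
        residual = [y - t for y, t in zip(decode(W, e), target)]
        # dL/de_j = 2 * sum_i W[i][j] * residual[i]
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(e))]
        e = [e_j - lr * g_j for e_j, g_j in zip(e, grad)]
    return e


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # 4 "pixels", 2-dim embedding
frame = decode(W, [0.5, -1.0])                          # a frame the toy decoder can produce
e = invert(W, frame)
print(e, mse(decode(W, e), frame))
```

With a real diffusion model the gradient would come from automatic differentiation through the frozen network rather than from the closed-form expression used here, and the reconstruction would generally be approximate rather than exact.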


Paper Structure

This paper contains 25 sections, 8 equations, 7 figures, and 1 table.

Introduction
Motivation and Challenges
Stalls and Waste Are Still Inevitable
Can We Reduce Both Stalls and Waste?
Use Computation to Break the Tradeoff
Challenges
Challenge 1: Constructing Compact and Effective Semantic Representations.
Challenge 2: Scheduling Under Computation and Bandwidth Constraints.
Challenge 3: Searching Over a Large and Flexible Decision Space.
System Overview
Encoding Pipeline
Client Scheduler and Decoder
Prompt Inversion
Gradient-based Prompt Inversion
What to Invert? Sentence Or Token?
...and 10 more sections

Figures (7)

Figure 1: Impact of data budget on stall and bandwidth waste.
Figure 2: System Overview. Our system extends traditional short video preloading by introducing prompt-based encoding and computation-aware scheduling. The system includes three roles: an encoder that generates both conventional (H.265) and prompt-based representations; a server that stores multiple options per chunk; and a viewer that uses a computation-aware scheduler to decide which chunks to download and decode. The viewer exploits parallelism across the CPU, GPU, NPU, and Video Decoder (VD), with a Decoder Dispatcher routing each chunk to the appropriate backend.
Figure 3: Overview of the encoding pipeline.
Figure 4: Reconstructions from low-rank approximations of sentence embeddings fail to preserve fine details, indicating that the low-rank assumption does not hold.
Figure 5: Comparison of sequential and out-of-order chunk loading strategies. Out-of-order scheduling enables earlier and more efficient prompt chunk decoding, allowing more content to be served in the prompt format, reducing bandwidth usage and improving compute utilization.
...and 2 more figures
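Figure 5's point, that out-of-order loading lets a prompt chunk start decoding while earlier H.265 chunks are still downloading, can be reproduced with a small timing model. This is a minimal sketch under our own simplifying assumptions, not the paper's scheduler: chunks have a fixed playback duration, one download runs at a time at constant bandwidth, H.265 chunks are assumed to decode in real time on the hardware decoder, and prompt chunks go to identical compute backends standing in for CPU, GPU, and NPU. The names `Option` and `simulate` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    kind: str          # "h265" or "prompt" (illustrative labels)
    size_bytes: int    # bytes that must be downloaded
    decode_s: float    # decode latency in seconds


def simulate(plan, order, bandwidth_bps, chunk_s, num_units=1):
    """Return (stall_s, total_bytes) for one preloading schedule.

    plan[i]:   Option chosen for chunk i (chunks play in index order)
    order:     chunk indices in download order; need not be sequential
    num_units: identical compute backends available for prompt decoding
    Playback of chunk 0 is requested at t = 0, so startup delay counts as stall.
    """
    ready = {}                      # chunk index -> time it becomes playable
    link_free = 0.0                 # time the current download finishes
    unit_free = [0.0] * num_units   # time each compute backend becomes idle
    for i in order:
        opt = plan[i]
        link_free += opt.size_bytes * 8 / bandwidth_bps
        if opt.kind == "prompt":
            # Greedy dispatch: send the prompt to the earliest-idle backend.
            u = min(range(num_units), key=lambda k: unit_free[k])
            unit_free[u] = max(link_free, unit_free[u]) + opt.decode_s
            ready[i] = unit_free[u]
        else:
            ready[i] = link_free + opt.decode_s
    stall, clock = 0.0, 0.0
    for i in range(len(plan)):      # playback is strictly in order
        if ready[i] > clock:
            stall += ready[i] - clock
            clock = ready[i]
        clock += chunk_s
    return stall, sum(opt.size_bytes for opt in plan)


# Hypothetical numbers: 8 Mbit/s link, 2 s chunks, a 1 MB H.265 chunk,
# and a 10 kB prompt chunk that needs 4 s of generative decoding.
h265 = Option("h265", 1_000_000, 0.0)
prompt = Option("prompt", 10_000, 4.0)
plan = [h265, h265, prompt]
seq_stall, total_bytes = simulate(plan, [0, 1, 2], 8e6, 2.0)
ooo_stall, _ = simulate(plan, [2, 0, 1], 8e6, 2.0)
print(f"sequential: {seq_stall:.2f} s stall, out-of-order: {ooo_stall:.2f} s stall")
```

With these numbers the sequential plan stalls about 2.01 s and the out-of-order plan about 1.01 s: fetching the 10 kB prompt first lets its 4 s decode overlap with the two H.265 downloads. Setting `num_units` above 1 models parallel decoding, although real CPU, GPU, and NPU backends would differ in speed, which this sketch ignores.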
