Smaller is Better: Generative Models Can Power Short Video Preloading
Liming Liu, Jiangkai Wu, Xinggong Zhang
TL;DR
This work tackles the stall–waste dilemma in short video preloading by introducing PromptPream, which shifts bandwidth from pixel data to compact semantic prompts that are decoded locally by diffusion models. It combines gradient-based prompt inversion, which compresses frames into token embeddings; a computation-aware scheduler that accounts for decoding latency when making download decisions; and a tree-based search (MCTS with pruning) over the enlarged space of codec choices and download orders. The approach enables out-of-order downloads and parallel decoding across CPU, GPU, and NPU, reducing both stalls and data waste by over 31% and improving QoE by 45% compared with traditional preloading strategies. The results demonstrate practical gains in transmission efficiency and user experience, highlighting the potential of computation-assisted video delivery on commodity devices.
Abstract
Preloading is widely used in short video platforms to minimize playback stalls by downloading future content in advance. However, existing strategies face a tradeoff: aggressive preloading reduces stalls but wastes bandwidth, while conservative preloading saves data but increases the risk of stalls. This paper presents PromptPream, a computation-powered preloading paradigm that breaks this tradeoff by using local computation to reduce bandwidth demand. Instead of transmitting pixel-level video chunks, PromptPream sends compact semantic prompts that are decoded into high-quality frames using generative models such as Stable Diffusion. We propose three core techniques to enable this paradigm: (1) a gradient-based prompt inversion method that compresses frames into small sets of compact token embeddings; (2) a computation-aware scheduling strategy that jointly optimizes network and compute resource usage; and (3) a scalable search algorithm that handles the enlarged scheduling space introduced by the scheduler. Evaluations show that PromptPream reduces both stalls and bandwidth waste by over 31% and improves Quality of Experience (QoE) by 45% compared to traditional strategies.
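To make the idea of computation-aware scheduling concrete, the following is a minimal sketch, not the paper's scheduler. It assumes that a chunk becomes playable only after both its download and its local decode finish, and that the scheduler prefers the smallest representation that still meets the playback deadline. The names `Option`, `ready_time`, and `pick`, as well as all sizes, latencies, and bandwidths, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way to deliver a chunk (all numbers below are hypothetical)."""
    name: str
    size_kb: float   # data to download, in kilobytes
    decode_s: float  # local decode latency, in seconds


def ready_time(opt: Option, bandwidth_kbps: float, start_s: float = 0.0) -> float:
    """Time at which the chunk is playable: download time plus decode latency."""
    download_s = opt.size_kb * 8.0 / bandwidth_kbps
    return start_s + download_s + opt.decode_s


def pick(options: list[Option], bandwidth_kbps: float, deadline_s: float) -> Option:
    """Prefer the smallest option that meets the deadline.

    If no option meets the deadline, fall back to the one that is ready
    earliest, which minimizes the stall duration.
    """
    feasible = [o for o in options if ready_time(o, bandwidth_kbps) <= deadline_s]
    if feasible:
        return min(feasible, key=lambda o: o.size_kb)
    return min(options, key=lambda o: ready_time(o, bandwidth_kbps))


# Assumed example: a pixel-level chunk versus a compact prompt that needs
# a slow generative decode on the device.
pixel = Option("pixel", size_kb=500.0, decode_s=0.01)
prompt = Option("prompt", size_kb=8.0, decode_s=1.5)

slow_link = pick([pixel, prompt], bandwidth_kbps=1000.0, deadline_s=2.0)
fast_link_tight = pick([pixel, prompt], bandwidth_kbps=8000.0, deadline_s=1.0)
print(slow_link.name, fast_link_tight.name)
```

Under these assumed numbers, the prompt wins on the slow link because its download is almost free, while the pixel chunk wins when the deadline is shorter than the decode latency. A scheduler that ignored decode time would choose the prompt in both cases and stall in the second.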
