Smaller is Better: Generative Models Can Power Short Video Preloading

Liming Liu, Jiangkai Wu, Xinggong Zhang

TL;DR

This work tackles the stall–waste dilemma in short video preloading by introducing PromptPream, which shifts bandwidth from pixel data to compact semantic prompts decoded by diffusion models. It combines gradient-based prompt inversion, which produces compact token embeddings; a computation-aware scheduler that factors decoding latency into download decisions; and a tree-based search (MCTS with pruning) over the enlarged space of codec choices and download orders. The approach enables out-of-order downloads and parallel decoding across CPU, GPU, and NPU, reducing both stalls and data waste by over 31% and increasing QoE by 45% compared with traditional preloading strategies. The results demonstrate practical gains in transmission efficiency and user experience, highlighting the potential of computation-assisted video delivery on commodity devices.

Abstract

Preloading is widely used in short video platforms to minimize playback stalls by downloading future content in advance. However, existing strategies face a tradeoff: aggressive preloading reduces stalls but wastes bandwidth, while conservative preloading saves data but increases the risk of stalls. This paper presents PromptPream, a computation-powered preloading paradigm that breaks this tradeoff by using local computation to reduce bandwidth demand. Instead of transmitting pixel-level video chunks, PromptPream sends compact semantic prompts that are decoded into high-quality frames using generative models such as Stable Diffusion. We propose three core techniques to enable this paradigm: (1) a gradient-based prompt inversion method that compresses frames into small sets of compact token embeddings; (2) a computation-aware scheduling strategy that jointly optimizes network and compute resource usage; and (3) a scalable search algorithm that addresses the enlarged scheduling space introduced by the scheduler. Evaluations show that PromptPream reduces both stalls and bandwidth waste by over 31%, and improves Quality of Experience (QoE) by 45%, compared to traditional strategies.
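To make technique (1) concrete, the following is a minimal toy sketch of gradient-based inversion, not the paper's method. It assumes a fixed linear map (the hypothetical `WEIGHTS`) standing in for the generative decoder, whereas the paper uses a diffusion model such as Stable Diffusion. The function names, learning rate, and step count are all illustrative assumptions. The idea it shows is only this: hold the decoder fixed, and run gradient descent on the embedding until the decoded output matches the target frame.

```python
def decode(weights, embedding):
    """Toy linear 'decoder' (assumption): maps an embedding to pixel values."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def mse(pred, target):
    """Mean squared error between two equally sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def invert(weights, target, dim, lr=0.5, steps=50):
    """Gradient descent on the embedding so that decode(embedding) ~ target."""
    embedding = [0.0] * dim
    n = len(target)
    for _ in range(steps):
        residual = [p - t for p, t in zip(decode(weights, embedding), target)]
        # d(MSE)/d(e_j) = (2/n) * sum_i residual_i * W[i][j]
        grad = [
            2.0 / n * sum(r * row[j] for r, row in zip(residual, weights))
            for j in range(dim)
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


# Hypothetical 4-pixel "frame" produced from a 2-value embedding.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target_frame = decode(WEIGHTS, [0.7, -0.2])

recovered = invert(WEIGHTS, target_frame, dim=2)
final_error = mse(decode(WEIGHTS, recovered), target_frame)
```

In this toy setting the sender would transmit the two values in `recovered` rather than the four pixel values, and the receiver would call `decode` locally. With a real diffusion decoder the reconstruction is approximate and the optimization is far more expensive, which is why it runs on the encoder side.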

Paper Structure (25 sections, 8 equations, 7 figures, 1 table)


Figures (7)

  • Figure 1: Impact of data budget on stall and bandwidth waste.
  • Figure 2: System Overview. Our system extends traditional short video preloading by introducing prompt-based encoding and computation-aware scheduling. The system includes three roles: an encoder that generates both conventional (H.265) and prompt-based representations; a server that stores multiple options per chunk; and a viewer that uses a computation-aware scheduler to decide which chunks to download and decode. The viewer exploits parallelism across CPU, GPU, NPU, and Video Decoder (VD), with a Decoder Dispatcher routing each chunk to the appropriate backend.
  • Figure 3: Overview of the encoding pipeline.
  • Figure 4: Reconstructions from low-rank approximations of sentence embeddings fail to preserve fine details, indicating that the low-rank assumption does not hold.
  • Figure 5: Comparison of sequential and out-of-order chunk loading strategies. Out-of-order scheduling enables earlier and more efficient prompt chunk decoding, allowing more content to be served in the prompt format, reducing bandwidth usage and improving compute utilization.
  • ...and 2 more figures
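
The effect described in the Figure 5 caption can be illustrated with a small timing model. This is a hedged sketch under simplifying assumptions, not the paper's scheduler: one serial network link, one first-in-first-out decode queue per format, zero H.265 decode latency, and made-up chunk sizes and latencies. The `Chunk` fields and the `total_wait` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int         # position in playback order
    fmt: str           # "h265" or "prompt"
    size_kb: float     # bytes on the wire, in KB
    decode_s: float    # local decode latency, in seconds
    duration_s: float  # playback duration, in seconds


def total_wait(chunks, download_order, bandwidth_kb_per_s):
    """Seconds the player spends waiting (startup delay plus stalls)."""
    by_index = {c.index: c for c in chunks}
    net_free = 0.0   # time at which the network link is next idle
    unit_free = {}   # per-format decoder: time at which it is next idle
    ready = {}       # chunk index -> time at which it is playable
    for idx in download_order:
        c = by_index[idx]
        net_free += c.size_kb / bandwidth_kb_per_s
        start = max(net_free, unit_free.get(c.fmt, 0.0))
        unit_free[c.fmt] = start + c.decode_s
        ready[idx] = unit_free[c.fmt]

    clock, wait = 0.0, 0.0
    for c in sorted(chunks, key=lambda ch: ch.index):
        if ready[c.index] > clock:   # player must wait for this chunk
            wait += ready[c.index] - clock
            clock = ready[c.index]
        clock += c.duration_s
    return wait


# Illustrative numbers only: the prompt chunk is tiny to download, slow to decode.
chunks = [
    Chunk(0, "h265", 2000.0, 0.0, 2.0),
    Chunk(1, "prompt", 100.0, 3.0, 2.0),
    Chunk(2, "h265", 2000.0, 0.0, 2.0),
]
sequential = total_wait(chunks, [0, 1, 2], 1000.0)
out_of_order = total_wait(chunks, [1, 0, 2], 1000.0)
print(round(sequential, 3), round(out_of_order, 3))  # 3.1 2.1
```

Fetching the prompt chunk first lets its 3-second decode overlap with the download of chunk 0, so the mid-playback stall disappears and only the startup delay remains. Under sequential loading the same decode cannot start until chunk 0 has finished downloading, which is the situation that would push a scheduler back toward the larger H.265 format.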