PromptMobile: Efficient Promptus for Low Bandwidth Mobile Video Streaming

Liming Liu, Jiangkai Wu, Haoyang Wang, Peiheng Wang, Zongming Guo, Xinggong Zhang

TL;DR

The paper tackles the challenge of real-time, low-bandwidth mobile video streaming with Promptus, a diffusion-based streaming method that normally requires desktop-grade compute. It introduces PromptMobile, an on-device acceleration framework that combines a two-stage generation pathway, fine-grained inter-frame caching, and system-level optimizations to make Promptus efficient on mobile hardware. The key results are an $8.1\times$ reduction in generation cost from the two-stage framework, a $16.6\%$ reduction in redundant computation from inter-frame caching, and an overall $13.6\times$ speedup over the original Promptus. In quality terms, PromptMobile improves LPIPS by $0.016$ on average over H.265 at 280 kbps and reduces the number of severely distorted frames by 60% relative to VQGAN. The approach shows practical value for bandwidth-constrained mobile video streaming and demonstrates that on-device diffusion pipelines become viable with hardware-aware acceleration.
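The three headline numbers are reported separately, and the summary does not say how they combine. As a rough consistency check, the sketch below assumes (an assumption, not something the paper states) that compute cost maps directly to latency, that the optimizations compose multiplicatively, and that the 16.6% caching saving applies to the cost left after the two-stage framework. Under those assumptions, caching contributes about $1.2\times$, the first two optimizations together about $9.7\times$, and roughly $1.4\times$ of the $13.6\times$ total would be left for the system-level optimizations. The function name `implied_system_factor` is hypothetical.

```python
# Hypothetical decomposition of PromptMobile's reported speedups.
# ASSUMPTIONS (not stated by the paper): compute cost maps directly to
# latency, the three optimizations compose multiplicatively, and the 16.6%
# caching saving applies to the cost remaining after two-stage generation.

TWO_STAGE_FACTOR = 8.1   # reported cost reduction of the two-stage framework
CACHING_SAVING = 0.166   # reported fraction of computation removed by caching
OVERALL_SPEEDUP = 13.6   # reported speedup over the original Promptus


def implied_system_factor(two_stage: float, caching_saving: float,
                          overall: float) -> float:
    """Return the speedup not explained by the first two optimizations."""
    caching_factor = 1.0 / (1.0 - caching_saving)  # 16.6% less work -> ~1.20x
    algorithmic = two_stage * caching_factor       # ~9.71x combined
    return overall / algorithmic


if __name__ == "__main__":
    factor = implied_system_factor(TWO_STAGE_FACTOR, CACHING_SAVING,
                                   OVERALL_SPEEDUP)
    print(f"implied system-level factor: {factor:.2f}x")
```

If the caching saving is instead measured against the original model, or the factors overlap, the implied residual changes accordingly.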

Abstract

Traditional video compression algorithms suffer significant quality degradation at extremely low bitrates. Promptus has emerged as a new paradigm for video streaming, substantially reducing the bandwidth that streaming requires. However, Promptus is computationally intensive and cannot run in real time on mobile devices. This paper presents PromptMobile, an efficient acceleration framework tailored for on-device Promptus. Specifically, we propose (1) a two-stage efficient generation framework that reduces computational cost by 8.1x, (2) fine-grained inter-frame caching that reduces redundant computation by 16.6%, and (3) system-level optimizations that further improve efficiency. Evaluations show that PromptMobile generates images 13.6x faster than the original Promptus. Compared with other streaming methods, PromptMobile achieves an average LPIPS improvement of 0.016 over H.265 and reduces the number of severely distorted frames by 60% compared with VQGAN.
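The abstract does not specify the caching rule. One common way to exploit highly similar intermediate features across consecutive frames is to cache each block's output and reuse it when the block's input has changed little since it was last computed. The sketch below does that with a relative L2 threshold; the class `BlockFeatureCache`, the 0.05 threshold, the per-block granularity, and `expensive_block` are illustrative assumptions, not PromptMobile's published policy.

```python
from typing import Callable, Dict

import numpy as np


class BlockFeatureCache:
    """Per-block output cache keyed by block name (hypothetical sketch)."""

    def __init__(self, threshold: float = 0.05) -> None:
        self.threshold = threshold  # max relative input change allowed for reuse
        self.inputs: Dict[str, np.ndarray] = {}
        self.outputs: Dict[str, np.ndarray] = {}
        self.hits = 0
        self.misses = 0

    def run(self, name: str, block: Callable[[np.ndarray], np.ndarray],
            x: np.ndarray) -> np.ndarray:
        prev = self.inputs.get(name)
        if prev is not None and prev.shape == x.shape:
            # Relative L2 change of the block input since the last computed frame.
            change = np.linalg.norm(x - prev) / (np.linalg.norm(prev) + 1e-8)
            if change < self.threshold:
                self.hits += 1
                return self.outputs[name]  # skip the block, reuse cached features
        # Cache miss: run the block and remember its input and output.
        # The stored input is refreshed only on a miss, so slow drift across
        # many frames eventually exceeds the threshold and forces recomputation.
        self.misses += 1
        y = block(x)
        self.inputs[name] = x.copy()
        self.outputs[name] = y
        return y


def expensive_block(x: np.ndarray) -> np.ndarray:
    """Stand-in for a costly UNet block (hypothetical)."""
    return np.tanh(x) * 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cache = BlockFeatureCache(threshold=0.05)
    frame0 = rng.standard_normal((4, 16, 16))
    frame1 = frame0 + 0.001 * rng.standard_normal(frame0.shape)  # near-duplicate
    frame2 = rng.standard_normal((4, 16, 16))                     # unrelated frame
    for frame in (frame0, frame1, frame2):
        cache.run("down_block_0", expensive_block, frame)
    print(cache.hits, cache.misses)  # 1 2
```

Comparing against the last computed input rather than the previous frame keeps small per-frame differences from accumulating unnoticed, at the cost of storing one input tensor per cached block.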

Paper Structure

This paper contains 18 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: System overview of PromptMobile. Prompts are first transmitted from the server to the mobile client. The client uses a UNet network, optimized for the Apple Neural Engine (ANE), to perform single-step denoising and generate a low-resolution latent representation. This UNet also integrates an inter-frame caching strategy to reduce redundant computation. A TinyDecoder then reconstructs a stitched low-resolution image from the latent representation, which is subsequently unstitched and upsampled by a two-stage generation module to produce high-resolution frames. An optional residual stream can be applied to further enhance visual quality when needed.
  • Figure 2: Lower resolutions lead to faster inference; 128×128 runs about 9 times faster than 512×512.
  • Figure 3: Prompt interpolation results in highly similar intermediate features across frames.
  • Figure 4: Generated images become more abstract as the resolution decreases.
  • Figure 5: The quality of the fitted image decreases as the resolution decreases.
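Figure 1 describes a stitched low-resolution image that is later unstitched and upsampled, but the summary does not say what is stitched. Assuming it means several low-resolution frames tiled into one image so that a single decode covers all of them, the bookkeeping looks like the sketch below. The 2×2 grid, the 128×128 to 512×512 sizes (chosen to match Figure 2), and the helper names are illustrative assumptions, and nearest-neighbour repetition is only a stand-in for the paper's learned upsampling stage.

```python
import numpy as np


def stitch(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile rows*cols frames of shape (H, W, C) into one (rows*H, cols*W, C) image."""
    n, h, w, c = frames.shape
    if n != rows * cols:
        raise ValueError("frame count must exactly fill the grid")
    grid = frames.reshape(rows, cols, h, w, c)
    # Interleave axes so each grid row becomes a band of h pixel rows.
    return grid.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)


def unstitch(image: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Inverse of stitch: split a tiled image back into rows*cols frames."""
    big_h, big_w, c = image.shape
    h, w = big_h // rows, big_w // cols
    grid = image.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, h, w, c)


def upsample_nearest(frames: np.ndarray, scale: int) -> np.ndarray:
    """Placeholder for the learned upsampling stage: nearest-neighbour repeat."""
    return frames.repeat(scale, axis=1).repeat(scale, axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_res = rng.random((4, 128, 128, 3)).astype(np.float32)  # four 128x128 RGB frames
    tiled = stitch(low_res, rows=2, cols=2)                     # one 256x256 image
    frames = unstitch(tiled, rows=2, cols=2)
    high_res = upsample_nearest(frames, scale=4)                # 512x512 per frame
    print(tiled.shape, high_res.shape)  # (256, 256, 3) (4, 512, 512, 3)
    assert np.array_equal(frames, low_res)
```

Because `stitch` and `unstitch` are pure reshapes and transposes, the round trip is lossless; any quality loss comes only from the low-resolution generation and the upsampling stage.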