Mirage: Transmitting a Video as a Perceptual Illusion for 50,000X Speedup
Junjie Wu, Tianrui Li, Yi Zhang, Ziyuan Yang
TL;DR
This work rethinks video transmission by discarding pixel-level data in favor of compact semantic cues that drive receiver-side generative synthesis. Mirage splits video into temporal captions and spatial keyframes, transmits these via a semantic communication channel, and reconstructs video with a diffusion-based generator guided by personalized prompts and anchors. The approach achieves large data and latency reductions (up to a $5.18\times 10^4$ data-level speedup in the reported scenarios) while preserving semantic consistency, enabling privacy-preserving and customizable video delivery. By integrating sender, network, and receiver personalization with end-to-end semantic representations and generation, Mirage offers a scalable path toward efficient, privacy-respecting video transmission in future networks.
Abstract
Existing communication frameworks mainly aim to reconstruct source signals accurately to ensure reliable transmission. However, this signal-level, fidelity-oriented design often incurs high communication overhead and system complexity, particularly in video communication, where mainstream frameworks transmit the visual data itself and therefore consume significant bandwidth. To address this issue, we propose Mirage, a visual-data-free communication framework for extremely efficient video transmission that preserves semantic information. Mirage decomposes video content into two complementary components: temporal sequence information capturing motion dynamics, and spatial appearance representations describing overall visual structure. Temporal information is preserved through video captioning, while key frames are encoded into compact semantic representations of spatial appearance. These representations are transmitted to the receiver, where videos are synthesized using generative video models. Since no raw visual data is transmitted, Mirage is inherently privacy-preserving. Mirage also supports personalized adaptation across deployment scenarios: the sender, network, and receiver can independently impose constraints on semantic representation, transmission, and generation, enabling flexible trade-offs between efficiency, privacy, control, and perceptual quality. Experimental results in video transmission demonstrate that Mirage achieves up to a 50,000X data-level compression speedup over raw video transmission, with gains expected to scale with larger video content sizes.
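To make the data-level comparison concrete, the following is a minimal sketch of how the ratio between raw video size and a caption-plus-keyframe payload could be computed. The `SemanticPayload` class, the helper names, and all sizes in the usage lines are hypothetical and chosen only for illustration; they are not the paper's implementation or its reported measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SemanticPayload:
    """Hypothetical container for what a sender transmits instead of pixels."""
    caption: str                 # temporal description of the clip
    keyframe_codes: List[bytes]  # compact encodings of selected key frames

    def size_bytes(self) -> int:
        # Total payload = UTF-8 caption bytes + all key frame code bytes.
        caption_bytes = len(self.caption.encode("utf-8"))
        return caption_bytes + sum(len(code) for code in self.keyframe_codes)


def raw_video_bytes(width: int, height: int, fps: int,
                    seconds: float, channels: int = 3) -> int:
    """Size of uncompressed video with one byte per channel per pixel."""
    return int(width * height * channels * fps * seconds)


def data_reduction(raw_bytes: int, payload: SemanticPayload) -> float:
    """Ratio of raw video size to transmitted semantic payload size."""
    return raw_bytes / payload.size_bytes()


# Illustrative values only (assumed, not taken from the paper).
payload = SemanticPayload(caption="a" * 200,
                          keyframe_codes=[bytes(4096), bytes(4096)])
raw = raw_video_bytes(width=1280, height=720, fps=30, seconds=10)
ratio = data_reduction(raw, payload)
```

Because the payload size is largely independent of resolution and frame rate while the raw size grows with both, the ratio in this sketch increases as the source video gets larger, which is consistent with the scaling behavior the abstract describes.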
