Enhance-A-Video: Better Generated Video for Free

Yang Luo; Xuanlei Zhao; Mengzhao Chen; Kaipeng Zhang; Wenqi Shao; Kai Wang; Zhangyang Wang; Yang You

TL;DR

Enhance-A-Video is a training-free, plug-in method that improves temporal coherence and visual fidelity in diffusion-transformer (DiT) based video generation by exploiting cross-frame information in temporal attention. A dedicated Enhance Block computes a Cross-Frame Intensity (CFI) from the non-diagonal entries of the temporal attention map and combines it with an enhanced temperature parameter. Placed in a residual path, the block modestly amplifies cross-frame signals while preserving intra-frame detail. The approach is model-agnostic and is demonstrated on both 3D full-attention and spatial-temporal DiT models (e.g., HunyuanVideo, CogVideoX, LTX-Video, Open-Sora), improving temporal consistency and visual quality with minimal inference overhead. User studies and VBench evaluations corroborate the qualitative gains, and ablations show that moderate temperature values and clipping are key to stable, high-quality enhancement. The work points to adaptive temperature control and joint attention enhancement as future directions, with potential applications in real-time video generation and editing workflows.
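To make the Enhance Block concrete, here is a minimal pure-Python sketch of one plausible reading of the mechanism. CFI is taken as the mean of the off-diagonal entries of an F x F temporal attention map. The temporal-attention output is then multiplied by CFI times a temperature `tau`, clipped from below at 1 (the clipping the ablations refer to). The function names, the `tau` value, and the exact form of the scaling factor are assumptions for illustration only; the authors' implementation may combine the temperature differently.

```python
from typing import List

Matrix = List[List[float]]


def cross_frame_intensity(attn: Matrix) -> float:
    """Mean of the off-diagonal entries of an F x F temporal attention map.

    attn[i][j] is the softmax weight that frame i assigns to frame j, so the
    diagonal holds intra-frame weights and everything else is cross-frame.
    """
    f = len(attn)
    if f < 2:
        return 0.0  # a single frame has no cross-frame entries
    off_diag = sum(attn[i][j] for i in range(f) for j in range(f) if i != j)
    return off_diag / (f * (f - 1))


def enhance_temporal_output(out: Matrix, attn: Matrix, tau: float) -> Matrix:
    """Scale the temporal-attention output by max(tau * CFI, 1).

    ASSUMPTION (hypothetical form): the factor is tau * CFI clipped from
    below at 1, so the block can amplify cross-frame signal but never
    attenuate it. The authors' exact temperature term may differ.
    """
    scale = max(tau * cross_frame_intensity(attn), 1.0)
    return [[v * scale for v in row] for row in out]


if __name__ == "__main__":
    frames = 4
    uniform = [[1.0 / frames] * frames for _ in range(frames)]
    identity = [[1.0 if i == j else 0.0 for j in range(frames)] for i in range(frames)]
    out = [[1.0, 2.0] for _ in range(frames)]  # toy per-frame features
    print(cross_frame_intensity(uniform))                  # 0.25
    print(enhance_temporal_output(out, uniform, 8.0)[0])   # [2.0, 4.0]
    print(enhance_temporal_output(out, identity, 8.0)[0])  # [1.0, 2.0]
```

For a uniform map CFI = 1/F, so under this assumed form amplification only occurs once `tau` exceeds F. With identity attention (no cross-frame interaction) the output passes through unchanged.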

Abstract

DiT-based video generation has achieved remarkable results, but enhancing existing models remains relatively unexplored. In this work, we introduce Enhance-A-Video, a training-free approach to enhance the coherence and quality of videos generated by DiT-based models. The core idea is to enhance cross-frame correlations based on the non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.
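The core idea above rests on non-diagonal temporal attention distributions. As a rough illustration, assume a hypothetical toy layout in which `q[f][s]` and `k[f][s]` are the query and key vectors of frame `f` at spatial position `s`. The sketch below builds one F x F softmax attention map per spatial position (each frame attending over all frames at that position) and measures how much of each frame's attention falls off the diagonal, i.e. on other frames. The layout and function names are illustrative assumptions, not the authors' code. Queries and keys are passed in directly rather than produced by learned projections, and a real DiT would compute this on batched tensors.

```python
import math
from typing import List

# Hypothetical toy layout: x[f][s] is a C-dim vector for frame f, position s.
Feats = List[List[List[float]]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_maps(q: Feats, k: Feats) -> List[List[List[float]]]:
    """Return one F x F map per spatial position; row i is frame i's
    scaled dot-product attention distribution over all frames there."""
    n_frames, n_pos, dim = len(q), len(q[0]), len(q[0][0])
    maps = []
    for s in range(n_pos):
        rows = []
        for i in range(n_frames):
            logits = [
                sum(a * b for a, b in zip(q[i][s], k[j][s])) / math.sqrt(dim)
                for j in range(n_frames)
            ]
            rows.append(softmax(logits))
        maps.append(rows)
    return maps


def mean_off_diagonal_mass(attn: List[List[float]]) -> float:
    """Average share of each frame's attention that goes to other frames.
    Rows sum to 1, so the off-diagonal mass of row i is 1 - attn[i][i]."""
    return sum(1.0 - attn[i][i] for i in range(len(attn))) / len(attn)


if __name__ == "__main__":
    F, S = 4, 2
    # Identical features in every frame make attention uniform across frames.
    same = [[[0.0, 0.0] for _ in range(S)] for _ in range(F)]
    print(mean_off_diagonal_mass(temporal_attention_maps(same, same)[0]))  # 0.75
```

When every frame carries the same features, attention is uniform and 1 - 1/F of each frame's weight (0.75 for F = 4) lands on other frames; strongly frame-specific features push that share toward zero.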
