UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer

Delong Liu; Zhaohui Hou; Mingjie Zhan; Shihao Han; Zhicheng Zhao; Fei Su

TL;DR

Quality and consistency in diffusion-based video generation remain challenging as sequences grow longer. The Uniform Frame Organizer (UFO) is a set of lightweight adapters that attach to a diffusion backbone and improve consistency non-destructively: the original model parameters stay untouched, and an adjustable intensity parameter controls how strongly the adapters act. Training is fast and resource-efficient (≈3000 steps on a single GPU) because only the UFO weights are updated, and a trained UFO can be transferred directly to other models of the same specification. UFO improves temporal and frame-wise quality as measured by VBench and supports stylization while leaving the original model's outputs recoverable. The approach is modular, transferable, and practical for building personalized, high-quality video generation models with minimal retraining. The paper also discusses limitations and future work on automatic intensity adjustment.

Abstract

Recently, diffusion-based video generation models have achieved significant success. However, existing models often suffer from issues such as weak consistency and image quality that declines over time. To overcome these challenges, inspired by aesthetic principles, we propose a non-invasive plug-in called Uniform Frame Organizer (UFO), which is compatible with any diffusion-based video generation model. The UFO comprises a series of adaptive adapters with adjustable intensities. When integrated, these adapters can significantly enhance the consistency between the foreground and background of videos and improve image quality without altering the original model parameters. Training UFO is simple and efficient, requires minimal resources, and supports stylized training. Its modular design allows multiple UFOs to be combined, enabling the customization of personalized video generation models. Furthermore, the UFO supports direct transfer across different models of the same specification without specific retraining. The experimental results indicate that UFO effectively enhances video generation quality and demonstrate its superiority on public video generation benchmarks. The code will be publicly available at https://github.com/Delong-liu-bupt/UFO.
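
The idea of an adapter with adjustable intensity that leaves the backbone untouched can be illustrated with a small sketch. The abstract does not specify the adapter's internal design, so the code below assumes a generic residual, low-rank form, `base(x) + intensity * up(down(x))`. The names `FrozenLinear`, `AdapterWrapped`, `down`, `up`, and `intensity` are hypothetical and are not taken from the UFO code base.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class FrozenLinear:
    """Stand-in for one pretrained backbone layer; its weights are never updated."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return matvec(self.weight, x)


class AdapterWrapped:
    """Hypothetical adapter (an assumption, not the paper's exact design).

    Computes base(x) + intensity * up(down(x)). Only `down` and `up` would be
    trained; the wrapped base layer is read but never modified.
    """

    def __init__(self, base, down, up, intensity=1.0):
        self.base = base            # frozen backbone layer
        self.down = down            # rank x d_in projection
        self.up = up                # d_out x rank projection
        self.intensity = intensity  # user-adjustable strength

    def __call__(self, x):
        base_out = self.base(x)
        delta = matvec(self.up, matvec(self.down, x))
        return [b + self.intensity * d for b, d in zip(base_out, delta)]


# Toy usage: a 2-dimensional identity "backbone" layer and a rank-1 adapter.
base = FrozenLinear([[1.0, 0.0], [0.0, 1.0]])
layer = AdapterWrapped(base, down=[[1.0, 0.0]], up=[[1.0], [1.0]], intensity=0.5)
x = [2.0, 3.0]
half_strength = layer(x)

# Setting the intensity to zero recovers the backbone's original output.
layer.intensity = 0.0
disabled = layer(x)
```

Under this assumed form, an intensity of zero reproduces the backbone's output exactly, which is one way to read "non-invasive". Because the adapter weights live outside the base layer, the same `down` and `up` matrices could in principle be attached to any other layer with matching shapes, which loosely mirrors the transfer across models of the same specification described above.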
