EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

Shiyuan Yang; Ruihuang Li; Jiale Tao; Shuai Shao; Qinglin Lu; Jing Liao

TL;DR

EffectMaker is a unified reasoning-generation framework for reference-based VFX customization. It achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation.

Abstract

Visual effects (VFX) are essential for enhancing the expressiveness and creativity of video content, yet producing high-quality effects typically requires expert knowledge and costly production pipelines. Existing AI-generated content (AIGC) systems face significant challenges in VFX generation due to the scarcity of effect-specific data and the inherent difficulty of modeling supernatural or stylized effects. Moreover, these approaches often require per-effect fine-tuning, which severely limits their scalability and generalization to novel VFX. In this work, we present EffectMaker, a unified reasoning-generation framework that enables reference-based VFX customization. EffectMaker employs a multimodal large language model (MLLM) to interpret high-level effect semantics and reason about how they should adapt to a target subject, while a diffusion transformer (DiT) leverages in-context learning to capture fine-grained visual cues from reference videos. These two components form a semantic-visual dual-path guidance mechanism that enables accurate, controllable, and effect-consistent synthesis without per-effect fine-tuning. Furthermore, we construct EffectData, the largest high-quality synthetic VFX dataset, containing 130k videos across 3k VFX categories, to improve generalization and scalability. Experiments show that EffectMaker achieves superior visual quality and effect consistency over state-of-the-art baselines, offering a scalable and flexible paradigm for customized VFX generation. Project page: https://effectmaker.github.io
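The semantic path feeds MLLM-derived effect descriptions into the DiT, and the paper's outline names "decoupled cross-attention" as the mechanism. The following is a minimal NumPy sketch, assuming an IP-Adapter-style formulation: a shared query from the video tokens attends separately to text tokens and to MLLM semantic tokens through separate key/value projections, and the two outputs are summed. The weight dictionary `W`, the `scale` factor, and all shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (Nq, d), k and v (Nk, d) -> (Nq, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def decoupled_cross_attention(x, text_tokens, sem_tokens, W, scale=1.0):
    """Hypothetical decoupled cross-attention.

    x:           video latent tokens, shape (N, d)
    text_tokens: text-encoder tokens, shape (Lt, d)
    sem_tokens:  MLLM semantic tokens, assumed already projected to width d, shape (Ls, d)
    W:           dict of (d, d) projections; the key names are made up for this sketch
    """
    q = x @ W["q"]  # one query shared by both condition streams
    out_text = attention(q, text_tokens @ W["k_text"], text_tokens @ W["v_text"])
    out_sem = attention(q, sem_tokens @ W["k_sem"], sem_tokens @ W["v_sem"])
    return out_text + scale * out_sem  # the streams are combined only after attention
```

Setting `scale=0.0` recovers plain text cross-attention. In IP-Adapter-style designs this is the usual reason for keeping the streams decoupled: a new condition stream can be added without disturbing the pretrained text path.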

Paper Structure

This paper contains 50 sections, 2 equations, 22 figures, and 4 tables.

Introduction
Related work
  General video generation.
  Generation with understanding.
  Visual effect generation.
Method
  Overview
  Effect understanding
  Effect generation
    Semantic conditioning via decoupled cross-attention.
    Visual conditioning via in-context learning.
    Biased RoPE.
  Data construction
Experiment
  Experiment setup
...and 35 more sections
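The outline lists a "Biased RoPE" component, but this page does not describe it. As a hedged illustration only, the sketch assumes one plausible reading: standard 1D rotary position embedding (RoPE), with the positions of the reference-video tokens shifted by a constant `bias` so that they do not coincide with the target tokens' positions. The helper `biased_positions` and the `bias` value are hypothetical, and the real model likely uses multi-axis (time, height, width) RoPE.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply 1D rotary position embedding to x of shape (N, d); d must be even."""
    _, d = x.shape
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,) rotation frequencies
    angles = np.asarray(positions, dtype=float)[:, None] * inv_freq[None, :]  # (N, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    # Rotate each (even, odd) feature pair by its position-dependent angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def biased_positions(num_ref, num_tgt, bias):
    """Hypothetical layout: reference tokens first, offset by `bias`; target tokens at 0..num_tgt-1."""
    ref = np.arange(num_ref, dtype=float) + bias
    tgt = np.arange(num_tgt, dtype=float)
    return np.concatenate([ref, tgt])
```

RoPE attention scores depend only on the relative position of two tokens, so shifting every token by the same amount changes nothing. A bias applied only to the reference tokens, however, changes the relative offset between reference and target tokens, which is what lets the model tell the two apart under this assumed scheme.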

Figures (22)

Figure 1: Given a reference video with a visual effect (top row of each grid) and a user-specified target image (outlined by a shadow box), EffectMaker transfers the reference effect to the user image, producing a vivid video (bottom row of each grid) with the same effect pattern.
Figure 2: Overview of our model architecture. Given a reference VFX video and a target image, on the reasoning side, an MLLM extracts high-level semantic cues from the reference video, providing abstract effect descriptions that serve as semantic guidance. On the generation side, a video DiT model leverages in-context generation to capture fine-grained visual details from the reference and generates a target video with a consistent visual effect.
Figure 3: (a) Illustration of our EffectData construction pipeline. (b) Some examples from the EffectData dataset.
Figure 4: Qualitative comparison with related baselines on the OpenVFX dataset.
Figure 5: Qualitative comparison on unseen visual effects.
...and 17 more figures
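Figure 2 describes the visual path as in-context generation inside the video DiT. The following is a minimal sketch, assuming the common approach of concatenating clean reference-video tokens and noisy target tokens into one sequence for joint self-attention and keeping only the target outputs. Single-head attention, the random projection weights, and the function names are illustrative assumptions; how the target image is injected is not shown, and this is not the paper's code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over the full concatenated sequence of shape (N, d)."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def in_context_block(ref_tokens, noisy_tgt_tokens, Wq, Wk, Wv):
    """Hypothetical in-context step: clean reference tokens and noisy target tokens share
    one sequence, so every target token can attend to every reference token."""
    n_ref = ref_tokens.shape[0]
    seq = np.concatenate([ref_tokens, noisy_tgt_tokens], axis=0)
    out = joint_self_attention(seq, Wq, Wk, Wv)
    # Keep only the target positions; in a full model these would feed the denoising output.
    return out[n_ref:]
```

Because the reference tokens sit in the same attention sequence, each target token can draw fine-grained appearance cues directly from the reference frames, which is consistent with the abstract's claim that no per-effect fine-tuning is needed.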
