Pathwise Test-Time Correction for Autoregressive Long Video Generation
Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, Chunchao Guo
TL;DR
This work tackles error accumulation in long-horizon video generation with autoregressive diffusion models that use distilled few-step samplers. It introduces Test-Time Correction (TTC), a training-free, path-aware intervention that anchors intermediate states to the initial frame and combines this with on-path re-noising, enabling stable 30-second video generation without model retraining. In ablations and comparisons against baselines and training-based methods, TTC substantially reduces temporal drift and improves temporal coherence while incurring negligible overhead. The approach is validated across multiple distilled architectures, offering a practical route to reliable real-time long-video synthesis.
Abstract
Distilled autoregressive diffusion models facilitate real-time short video synthesis but suffer from severe error accumulation during long-sequence generation. While existing Test-Time Optimization (TTO) methods prove effective for images or short clips, we identify that they fail to mitigate drift in extended sequences due to unstable reward landscapes and the hypersensitivity of distilled parameters. To overcome these limitations, we introduce Test-Time Correction (TTC), a training-free alternative. Specifically, TTC utilizes the initial frame as a stable reference anchor to calibrate intermediate stochastic states along the sampling trajectory. Extensive experiments demonstrate that our method seamlessly integrates with various distilled models, extending generation lengths with negligible overhead while matching the quality of resource-intensive training-based methods on 30-second benchmarks.
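To make the anchoring idea concrete, the following is a toy sketch only, not the paper's algorithm. It assumes a hypothetical denoiser with a small systematic bias (standing in for drift), a simple linear blend of each intermediate clean prediction toward the first frame, and Gaussian re-noising to the next noise level. Every name, the blend rule, the noise schedule, and all constants are illustrative assumptions.

```python
import random


def toy_denoise(x_t, context, sigma, sigma_max, drift):
    """Hypothetical stand-in for a distilled denoiser (not a real model).

    At high noise it relies on the previous frame (context); at low noise it
    relies on the current noisy state. A constant bias models drift.
    """
    w = sigma / sigma_max
    return [w * c + (1.0 - w) * x + drift for x, c in zip(x_t, context)]


def renoise(x0, sigma, rng):
    """Re-inject Gaussian noise at level sigma: x_t = x0 + sigma * eps."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def sample_frame(context, anchor, sigmas, rng, anchor_weight, drift):
    """Few-step sampling of one frame with optional anchoring.

    Intermediate clean predictions are blended toward the anchor (assumed
    rule) and then re-noised to the next level; the final step is untouched.
    """
    x_t = renoise([0.0] * len(context), sigmas[0], rng)
    x0 = list(context)
    for i, sigma in enumerate(sigmas):
        x0 = toy_denoise(x_t, context, sigma, sigmas[0], drift)
        if i + 1 < len(sigmas):
            x0 = [(1.0 - anchor_weight) * v + anchor_weight * a
                  for v, a in zip(x0, anchor)]
            x_t = renoise(x0, sigmas[i + 1], rng)
    return x0


def generate(anchor, num_frames, rng, anchor_weight=0.0,
             sigmas=(1.0, 0.5, 0.25), drift=0.05):
    """Autoregressive rollout: each frame is conditioned on the previous one."""
    frames = [list(anchor)]
    for _ in range(num_frames):
        frames.append(sample_frame(frames[-1], anchor, sigmas, rng,
                                   anchor_weight, drift))
    return frames


def mean_abs_drift(frame, anchor):
    """Average absolute deviation of a frame from the anchor frame."""
    return sum(abs(v - a) for v, a in zip(frame, anchor)) / len(anchor)


if __name__ == "__main__":
    anchor = [0.0] * 4
    for weight in (0.0, 0.5):
        frames = generate(anchor, 20, random.Random(0), anchor_weight=weight)
        print(weight, mean_abs_drift(frames[-1], anchor))
```

In this toy setting, with `anchor_weight=0.0` the bias compounds from frame to frame, while a positive weight keeps the rollout's systematic deviation from the first frame bounded. The real method operates on a video diffusion model's latent states and may differ substantially in how the reference is applied.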
