Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun, Yifan Wang, Hanwen Zhang, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You
TL;DR
Ouroboros addresses the cycle inconsistency and inefficiency of sequential forward and inverse diffusion rendering. It introduces two single-step diffusion models for RGB→X (inverse rendering) and X→RGB (forward rendering) that are trained with cycle-consistency losses and end-to-end fine-tuning, enabling fast, coherent bidirectional rendering. The approach leverages heterogeneous indoor/outdoor datasets (Hypersim, InteriorVerse, MatrixCity), uses channel dropout, and extends to training-free video inference with temporal patching and pseudo-3D kernels. It achieves state-of-the-art or competitive results on both inverse and forward rendering tasks, delivering up to a 50× speedup over prior diffusion methods and enabling temporally consistent video decomposition without video-specific training.
Abstract
While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes and substantially faster inference than other diffusion-based methods. We also demonstrate that Ouroboros can transfer to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.
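To make the cycle consistency idea concrete, the sketch below computes a generic two-way cycle loss: an image is decomposed and re-rendered (RGB→X→RGB), and a set of intrinsic maps is rendered and re-decomposed (X→RGB→X). This is only an illustration under strong assumptions, not the paper's implementation. The names `inverse_render`, `forward_render`, and `cycle_loss` are hypothetical, the two "models" are toy closed-form functions standing in for the single-step diffusion networks, the intrinsic maps are reduced to albedo and shading, and the L1 distance is an assumed choice of metric.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length lists of floats."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def forward_render(albedo, shading):
    """Hypothetical X->RGB stand-in: per-pixel product of albedo and shading."""
    return [a * s for a, s in zip(albedo, shading)]


def inverse_render(rgb, albedo_guess=0.5):
    """Hypothetical RGB->X stand-in: assume a constant albedo, solve for shading."""
    albedo = [albedo_guess] * len(rgb)
    shading = [p / albedo_guess for p in rgb]
    return albedo, shading


def cycle_loss(rgb, albedo, shading, inverse_fn, forward_fn):
    """Return (rgb_cycle, x_cycle), the two directions of a generic cycle loss."""
    # RGB -> X -> RGB: re-render the predicted maps and compare with the input image.
    rgb_cycle = l1(forward_fn(*inverse_fn(rgb)), rgb)
    # X -> RGB -> X: decompose the rendered image and compare with the input maps.
    albedo_hat, shading_hat = inverse_fn(forward_fn(albedo, shading))
    x_cycle = l1(albedo_hat, albedo) + l1(shading_hat, shading)
    return rgb_cycle, x_cycle


# Maps that match the toy inverse model's assumption give a (near) zero loss.
rgb = [0.2, 0.4, 0.1]
consistent = cycle_loss(rgb, [0.5] * 3, [0.4, 0.8, 0.2], inverse_render, forward_render)

# Maps with a different albedo render to the same image but are not recovered,
# so the X -> RGB -> X term becomes positive.
inconsistent = cycle_loss(rgb, [0.25] * 3, [0.8, 1.6, 0.4], inverse_render, forward_render)
```

In this toy setting the RGB→X→RGB term is always small, because the stand-in forward model exactly undoes the stand-in inverse model. Only the X→RGB→X term exposes the ambiguity of the decomposition, which is one reason to penalize both directions rather than just one.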
