FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain

YuAn Wang; Xiaofan Li; Chi Huang; Wenhao Zhang; Hao Li; Bosheng Wang; Xun Sun; Jun Wang

TL;DR

FaithFusion addresses the problem of reconciling geometric fidelity in 3DGS-based driving-scene reconstruction with plausible appearance generation under large viewpoint shifts. It introduces pixel-wise Expected Information Gain (EIG) as a unified policy with two roles: a spatial prior that guides diffusion toward high-uncertainty regions, and a loss weight that distills the resulting edits back into 3DGS. Using the Laplace approximation, the method derives a tractable upper bound, $\text{EIG} \le \frac{1}{2} \operatorname{tr}\left(H''[Y_{NVS}|X_{NVS},\boldsymbol{\omega}^*](H''[\boldsymbol{\omega}^*])^{-1}\right)$, and distributes the information across pixels along rays. The approach comprises a dual-branch EIGent generator and a progressive diffusion-to-3DGS integration that operates without extra priors. Experiments on Waymo show FaithFusion achieving state-of-the-art results across NTA-IoU, NTL-IoU, and FID, and it remains robust to lane shifts of up to $6$ meters, where FID is $107.47$. The work offers a general, plug-and-play framework for unified, controllable 4D driving-scene modeling, with potential extensions to active mapping.
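To make the trace bound concrete, the following is a minimal numerical sketch of the expression $\frac{1}{2}\operatorname{tr}(A B^{-1})$, where $A$ stands in for $H''[Y_{NVS}|X_{NVS},\boldsymbol{\omega}^*]$ and $B$ for $H''[\boldsymbol{\omega}^*]$. The function names, the dense-matrix representation, and the diagonal variant are assumptions made for illustration; this page does not state how the Hessians are actually computed or approximated in FaithFusion.

```python
import numpy as np

def eig_upper_bound(h_view, h_params):
    """Evaluate 0.5 * tr(h_view @ inv(h_params)).

    h_view   : (n, n) Hessian term for the novel view (hypothetical input).
    h_params : (n, n) invertible Hessian term at the fitted parameters.
    """
    h_view = np.asarray(h_view, dtype=float)
    h_params = np.asarray(h_params, dtype=float)
    # Solve h_params @ X = h_view rather than forming the inverse explicitly.
    # tr(inv(B) @ A) equals tr(A @ inv(B)) by cyclicity of the trace.
    x = np.linalg.solve(h_params, h_view)
    return 0.5 * float(np.trace(x))

def eig_upper_bound_diag(h_view_diag, h_params_diag):
    """Same quantity when both Hessians are assumed diagonal (an assumption)."""
    h_view_diag = np.asarray(h_view_diag, dtype=float)
    h_params_diag = np.asarray(h_params_diag, dtype=float)
    return 0.5 * float(np.sum(h_view_diag / h_params_diag))

if __name__ == "__main__":
    a = np.diag([2.0, 4.0])
    b = np.diag([1.0, 2.0])
    print(eig_upper_bound(a, b))
    print(eig_upper_bound_diag([2.0, 4.0], [1.0, 2.0]))
```

For large parameter vectors, a diagonal approximation reduces the bound to an elementwise ratio, which is the kind of quantity that can be accumulated per pixel; whether the paper uses such an approximation is not stated here.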

Abstract

In controllable driving-scene reconstruction and 3D scene generation, maintaining geometric fidelity while synthesizing visually plausible appearance under large viewpoint shifts is crucial. However, effective fusion of geometry-based 3DGS and appearance-driven diffusion models faces inherent challenges, as the absence of pixel-wise, 3D-consistent editing criteria often leads to over-restoration and geometric drift. To address these issues, we introduce FaithFusion, a 3DGS-diffusion fusion framework driven by pixel-wise Expected Information Gain (EIG). EIG acts as a unified policy for coherent spatio-temporal synthesis: it guides diffusion as a spatial prior to refine high-uncertainty regions, while its pixel-level weighting distills the edits back into 3DGS. The resulting plug-and-play system is free from extra prior conditions and structural modifications. Extensive experiments on the Waymo dataset demonstrate that our approach attains SOTA performance across NTA-IoU, NTL-IoU, and FID, maintaining an FID of 107.47 even at a 6-meter lane shift. Our code is available at https://github.com/wangyuanbiubiubiu/FaithFusion.
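The pixel-level weighting described above can be pictured as a per-pixel weighted reconstruction loss. The sketch below is one plausible form, not the authors' loss: the function name `eig_weighted_l1`, the min-max normalization of the EIG map, and the choice of an L1 error are all assumptions introduced for illustration.

```python
import numpy as np

def eig_weighted_l1(rendered, refined, eig_map, eps=1e-8):
    """Weighted mean L1 error between two (H, W, 3) images.

    rendered : image rendered from the 3DGS scene (hypothetical input).
    refined  : diffusion-refined target image (hypothetical input).
    eig_map  : (H, W) per-pixel information-gain map used as weights.
    """
    rendered = np.asarray(rendered, dtype=float)
    refined = np.asarray(refined, dtype=float)
    eig_map = np.asarray(eig_map, dtype=float)
    # Assumption: min-max normalize the map to [0, 1] before weighting.
    span = eig_map.max() - eig_map.min()
    weights = (eig_map - eig_map.min()) / (span + eps)
    # Per-pixel L1 error, averaged over the colour channels.
    per_pixel = np.abs(rendered - refined).mean(axis=-1)
    # High-EIG pixels contribute more; low-EIG pixels are largely ignored.
    return float((weights * per_pixel).sum() / (weights.sum() + eps))
```

Under this form, edits in regions the map marks as uninformative contribute little to the loss, so the existing 3DGS geometry there is mostly left untouched, which matches the stated goal of avoiding over-restoration.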
