LoGoColor: Local-Global 3D Colorization for 360° Scenes

Yeonjin Chang, Juhwan Cho, Seunghyeon Seo, Wonsik Shin, Nojun Kwak

TL;DR

The paper tackles the problem of colorizing geometry-only 3D reconstructions, especially for 360° scenes, where prior methods rely on averaging 2D color model outputs and lose color diversity. It introduces LoGoColor, a Local-Global pipeline that first reconstructs single-channel geometry, decomposes the scene into subscenes, and then uses a fine-tuned multi-view diffusion model to enforce both inter-subscene and intra-subscene color consistency, followed by local color propagation. A novel Color Diversity Index (CDI) is proposed to quantify color richness, and extensive experiments on LLFF, Mip-NeRF 360, and Tanks and Temples show improved color diversity with competitive consistency compared to ColorNeRF and ChromaDistill, including applications to thermal data. The approach demonstrates robust multi-view colorization for complex 360° scenes, enabling richer visualizations for VR/AR and related applications.

Abstract

Single-channel 3D reconstruction is widely used in fields such as robotics and medical imaging. While this line of work excels at reconstructing 3D geometry, the outputs are not colored 3D models, so 3D colorization is required for visualization. Recent 3D colorization studies address this problem by distilling 2D image colorization models. However, these approaches suffer from the inherent multi-view inconsistency of 2D image models. This results in colors being averaged during training, leading to monotonous and oversimplified results, particularly in complex 360° scenes. In contrast, we aim to preserve color diversity by generating a new set of consistently colorized training views, thereby bypassing the averaging process. Nevertheless, eliminating the averaging process introduces a new challenge: ensuring strict multi-view consistency across these colorized views. To achieve this, we propose LoGoColor, a pipeline designed to preserve color diversity by eliminating this guidance-averaging process with a 'Local-Global' approach: we partition the scene into subscenes and explicitly tackle both inter-subscene and intra-subscene consistency using a fine-tuned multi-view diffusion model. We demonstrate that our method achieves quantitatively and qualitatively more consistent and plausible 3D colorization on complex 360° scenes than existing methods, and validate its superior color diversity using a novel Color Diversity Index.
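To illustrate why averaging inconsistent 2D predictions produces monotonous colors, here is a minimal sketch. It is not the paper's training procedure: it assumes a plain summed squared-error objective, whose minimizer for a single surface point is the per-channel mean of the per-view colors. The three per-view predictions and the helper names `mean_color` and `saturation` are hypothetical.

```python
import colorsys

def mean_color(colors):
    """Per-channel mean of RGB triples (the minimizer of a summed squared-error loss)."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

def saturation(rgb):
    """HSV saturation of an RGB triple with channels in [0, 1]."""
    return colorsys.rgb_to_hsv(*rgb)[1]

# Hypothetical colors that an inconsistent 2D model assigns to the SAME
# surface point when it is seen from three different views.
view_predictions = [
    (0.9, 0.1, 0.1),  # reddish in view 1
    (0.1, 0.9, 0.1),  # greenish in view 2
    (0.1, 0.1, 0.9),  # bluish in view 3
]

averaged = mean_color(view_predictions)
input_saturations = [saturation(c) for c in view_predictions]

print("per-view saturation:", [round(s, 3) for s in input_saturations])
print("averaged color:", tuple(round(x, 3) for x in averaged))
print("averaged saturation:", round(saturation(averaged), 3))
```

Each individual prediction is strongly saturated, yet their mean is close to gray. Supervising with one consistent color per point, as the abstract proposes, avoids this collapse.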

Paper Structure

This paper contains 43 sections, 7 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Teaser -- We propose LoGoColor to achieve color-rich 3D colorization by minimizing guidance from image models. This avoids the guidance-averaging of prior works, which relied on inconsistent image model outputs and led to monotonous results. To do so, we explicitly handle consistency with a Local-Global approach, ensuring both intra- and inter-subscene consistency.
  • Figure 2: Our View-based Subscene Decomposition. -- Starting from the base view $\mathbf{W}_{b_1}$ that observes the largest number of Gaussians, we use a greedy algorithm to iteratively select subsequent base views that maximize coverage while minimizing overlap.
  • Figure 3: Overview of LoGoColor -- We first reconstruct single-channel 3D Gaussians from multi-view grayscale images to recover scene geometry. Using this geometry, we decompose the scene into subscenes and select their corresponding base views. In parallel, we fine-tune a multi-view diffusion model to transfer color from reference views. We then calibrate global consistency among the base views and propagate color across all training views, ultimately producing a fully colorized 3D Gaussian model.
  • Figure 4: Qualitative comparison on the LLFF, Tanks and Temples (TnT), and Mip-NeRF 360 datasets -- While other methods perform reasonably well on the LLFF dataset, they tend to color the leaves in the 'flower' scene uniformly. In the 360° scenes from the TnT and Mip-NeRF 360 datasets, other methods produce monotonous colors. In contrast, our method yields plausible and diverse colors across all regions.
  • Figure 5: Consistency ablation -- Our global calibration step is essential for mitigating the continuous color shifting artifact observed in uncalibrated views, ensuring global consistency.
  • ...and 8 more figures
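
The greedy base-view selection described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the caption does not give the exact objective, so the score used here (newly covered Gaussians minus a weighted count of already covered ones), the `overlap_weight` parameter, the `visibility` input format, and the function name `select_base_views` are all assumptions.

```python
def select_base_views(visibility, num_subscenes, overlap_weight=1.0):
    """Greedily pick base views from per-view visibility sets.

    visibility: dict mapping a view id to the set of Gaussian ids it observes.
    Returns the selected view ids in selection order.
    """
    if not visibility:
        return []

    # First base view: the one that observes the largest number of Gaussians.
    first = max(visibility, key=lambda v: len(visibility[v]))
    selected = [first]
    covered = set(visibility[first])

    while len(selected) < num_subscenes:
        best_view, best_score = None, 0.0
        for view, seen in visibility.items():
            if view in selected:
                continue
            gain = len(seen - covered)      # Gaussians not yet covered
            overlap = len(seen & covered)   # Gaussians already covered
            # Assumed score: reward new coverage, penalize overlap.
            score = gain - overlap_weight * overlap
            if gain > 0 and (best_view is None or score > best_score):
                best_view, best_score = view, score
        if best_view is None:
            break  # no remaining view adds coverage
        selected.append(best_view)
        covered |= visibility[best_view]

    return selected


# Toy example with four hypothetical views and seven Gaussians.
toy_visibility = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6, 7},
    "d": {1, 2},
}
base_views = select_base_views(toy_visibility, num_subscenes=3)
print(base_views)
```

In the toy example, view "a" is chosen first because it sees the most Gaussians, view "c" follows because it adds three new Gaussians with no overlap, and selection then stops early since neither remaining view adds coverage.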