Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer

Yecong Wan, Mingwen Shao, Renlong Wu, Wangmeng Zuo

TL;DR

Color3D tackles monochrome 3D colorization for both static and dynamic scenes by learning a per-scene personalized colorizer from a single colorized key view and propagating its mapping to all other views and time steps. It reframes 3D colorization as a color-information propagation task and introduces a Lab Gaussian representation that optimizes luminance and chrominance separately, improving stability and fidelity. The method combines entropy-based key-view selection, single-view augmentation, and stage-wise fine-tuning to achieve strong cross-view and cross-time consistency with rich, controllable colors, demonstrated on static and dynamic benchmarks and on real-world content. The results show substantial improvements over prior approaches, enabling practical applications in art, cultural heritage, and legacy content restoration.
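To make the entropy-based key-view selection concrete, here is a minimal sketch. It assumes the score is the Shannon entropy of each view's grayscale intensity histogram and that the highest-scoring view is chosen; the summary does not specify the exact entropy measure, so `shannon_entropy` and `select_key_view` are hypothetical names for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(gray_pixels):
    """Shannon entropy (in bits) of a flat sequence of integer intensities."""
    counts = Counter(int(p) for p in gray_pixels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_key_view(views):
    """Return the index of the view whose intensity histogram has the highest entropy.

    Assumption: "most informative" is approximated by histogram entropy.
    """
    scores = [shannon_entropy(v) for v in views]
    return max(range(len(views)), key=scores.__getitem__)


# Toy views given as flat lists of grayscale intensities.
flat_view = [128] * 16               # a single intensity
two_tone_view = [0] * 8 + [255] * 8  # two equally likely intensities
varied_view = list(range(16))        # sixteen equally likely intensities

key_index = select_key_view([flat_view, two_tone_view, varied_view])
```

In this toy setup the third view is selected because its intensities are spread most evenly, which is the intuition behind choosing an informative key view.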

Abstract

In this work, we present Color3D, a highly adaptable framework for colorizing both static and dynamic 3D scenes from monochromatic inputs, delivering visually diverse and chromatically vibrant reconstructions with flexible user-guided control. In contrast to existing methods that focus solely on static scenarios and enforce multi-view consistency by averaging color variations, which inevitably sacrifices both chromatic richness and controllability, our approach preserves color diversity and steerability while ensuring cross-view and cross-time consistency. In particular, the core insight of our method is to colorize only a single key view and then fine-tune a personalized colorizer to propagate its color to novel views and time steps. Through personalization, the colorizer learns a scene-specific deterministic color mapping underlying the reference view, enabling it to consistently project corresponding colors to the content in novel views and video frames via its inherent inductive bias. Once trained, the personalized colorizer can be applied to infer consistent chrominance for all other images, enabling direct reconstruction of colorful 3D scenes with a dedicated Lab color space Gaussian splatting representation. The proposed framework recasts complicated 3D colorization as a more tractable single-image paradigm, allowing seamless integration of arbitrary image colorization models with enhanced flexibility and controllability. Extensive experiments across diverse static and dynamic 3D colorization benchmarks substantiate that our method can deliver more consistent and chromatically rich renderings with precise user control. Project page: https://yecongwan.github.io/Color3D/.
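The split between luminance and chrominance can be illustrated with a small sketch. It assumes the L channel comes from the monochrome input and the a and b channels come from a colorizer, then converts the combined CIELAB value to sRGB with the standard D65 formulas. The function name `lab_to_srgb` and the sample a, b values are hypothetical; the actual colorizer and its outputs are not reproduced here.

```python
def lab_to_srgb(l_star, a_star, b_star):
    """Convert one CIELAB value (D65 white point) to sRGB components in [0, 1]."""
    # Lab -> XYZ
    fy = (l_star + 16.0) / 116.0
    fx = fy + a_star / 500.0
    fz = fy - b_star / 200.0
    delta = 6.0 / 29.0

    def f_inv(t):
        return t ** 3 if t > delta else 3.0 * delta ** 2 * (t - 4.0 / 29.0)

    x = 0.95047 * f_inv(fx)
    y = 1.00000 * f_inv(fy)
    z = 1.08883 * f_inv(fz)

    # XYZ -> linear sRGB
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    # Clamp to the displayable range, then apply the sRGB transfer function.
    def encode(c):
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055

    return tuple(encode(c) for c in linear)


# Assumption: luminance is taken from the monochrome input pixel,
# chrominance (a, b) is whatever the colorizer predicts for that pixel.
input_luminance = 50.0
predicted_ab = (60.0, 0.0)  # hypothetical prediction, reddish

gray_pixel = lab_to_srgb(input_luminance, 0.0, 0.0)
colored_pixel = lab_to_srgb(input_luminance, *predicted_ab)
```

With zero chrominance the pixel stays neutral gray, and changing only a and b shifts the hue while the input luminance is kept, which is why predicting chrominance alone is enough to colorize a monochrome view.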

Paper Structure

This paper contains 18 sections, 9 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Exemplary Visual Results of Color3D. Color3D is a unified controllable 3D colorization framework for both static and dynamic scenes, producing vivid and chromatically rich renderings with strong cross-view and cross-time consistency. Our method supports diverse colorization controls, including language-guided (left), automatic inference (middle), and reference-based (right), showcasing its versatility and practical value.
  • Figure 2: The overall pipeline of Color3D. Our framework comprises two primary stages. In the first stage, we first identify the most informative key view from the given monochromatic images or video frames, and employ an off-the-shelf image colorization model to generate a colorized single view. Then, a single-view augmentation scheme is used to expand the data, and the augmented samples are subsequently used to fine-tune a per-scene personalized colorizer. In the second stage, this personalized colorizer is used to infer consistent chromatic content for the remaining views or frames, and the colorful 3D scene is directly reconstructed with Lab color space 3DGS or 4DGS.
  • Figure 3: (a) Illustration of the proposed single-view augmentation scheme, which combines generative augmentations and traditional augmentations to enrich the single colored view with a consistent color distribution. (b) Architecture of the colorizer, consisting of a frozen DDColor encoder alongside a trainable adapter and CNN decoder. (c) Lab Gaussian, which first warms up with three $L$ channels and then switches to full $Lab$ channels for color optimization.
  • Figure 4: Qualitative comparisons on static 3D scene colorization benchmarks. Our method produces more color-accurate and color-rich results while maintaining multi-view consistency.
  • Figure 5: Qualitative comparisons on dynamic 3D scene colorization benchmarks. Our method consistently yields spatio-temporally coherent results with vivid and perceptually realistic colors.
  • ...and 10 more figures
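
The warm-up described in the Figure 3(c) caption can be sketched as a target schedule. This assumes each Gaussian carries three color channels that are supervised with (L, L, L) during warm-up and with (L, a, b) afterwards; the step threshold `warmup_steps` and the function name `lab_supervision_target` are hypothetical, since the summary gives no schedule details.

```python
def lab_supervision_target(l_value, a_value, b_value, step, warmup_steps):
    """Return the 3-channel supervision target for one pixel at a training step.

    Assumption: during warm-up all three channels are supervised with the
    luminance L; afterwards the full (L, a, b) triplet is used.
    """
    if step < warmup_steps:
        return (l_value, l_value, l_value)
    return (l_value, a_value, b_value)


# Hypothetical pixel and schedule.
pixel_lab = (50.0, 10.0, -5.0)
warmup_steps = 100

target_during_warmup = lab_supervision_target(*pixel_lab, step=0, warmup_steps=warmup_steps)
target_after_warmup = lab_supervision_target(*pixel_lab, step=100, warmup_steps=warmup_steps)
```

Under this assumption the warm-up phase fits the scene from luminance alone, and the switch only changes the targets of the last two channels.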