
ChromaDistill: Colorizing Monochrome Radiance Fields with Knowledge Distillation

Ankit Dhiman, R Srinath, Srinjay Sarkar, Lokesh R Boregowda, R Venkatesh Babu

TL;DR

ChromaDistill tackles colorizing monochrome radiance-field representations from grayscale multi-view images by transferring color from pretrained image colorizers through a two-stage distillation framework. It first learns a Luma Radiance Field to capture geometry and luminance, then distills chroma information from a colorization teacher into the radiance field without increasing inference cost, aided by a multi-scale self-regularization to prevent color desaturation. The method is agnostic to the underlying 3D representation (e.g., NeRF, 3DGS) and demonstrates strong cross-view consistency improvements over baselines, with applications to Infra-Red data and legacy grayscale content. These results indicate practical benefits for 3D colorization tasks and downstream perception tasks.
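The TL;DR notes that a multi-scale self-regularization is needed to prevent color desaturation. One plausible source of desaturation can be shown with a toy example. This is our illustration, not the paper's loss. Suppose a single, view-independent chroma value for one surface point is fitted with a plain L2 loss to teacher predictions whose hue disagrees across views. The optimum is then their mean, and the mean of differently hued chroma vectors has a smaller magnitude than each individual prediction. The sketch assumes a Lab-style split into luminance L and chroma (a, b). The teacher values and the helper names `l2_distilled_chroma` and `saturation` are hypothetical.

```python
import numpy as np

# Hypothetical teacher outputs: (a, b) chroma predicted for the SAME surface
# point in two views. Each view is colorized independently, so the hue differs
# (reddish vs. yellowish) although both predictions are equally saturated.
teacher_ab = np.array([
    [30.0, 0.0],   # view 0
    [0.0, 30.0],   # view 1
])


def l2_distilled_chroma(preds: np.ndarray) -> np.ndarray:
    """Closed-form minimizer of sum_v ||c - preds[v]||^2 over one shared chroma c.

    A view-independent 3D color has a single value per point, so plain L2
    distillation pulls it toward the per-view mean.
    """
    return preds.mean(axis=0)


def saturation(ab: np.ndarray) -> np.ndarray:
    """Chroma magnitude sqrt(a^2 + b^2), a common saturation proxy in Lab space."""
    return np.linalg.norm(ab, axis=-1)


distilled = l2_distilled_chroma(teacher_ab)
print(distilled)                 # [15. 15.]
print(saturation(teacher_ab))    # [30. 30.]
print(saturation(distilled))     # about 21.2, less saturated than either view
```

The paper's multi-scale self-regularization targets this kind of desaturation. Its exact formulation is not given in this summary and is not reproduced here.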

Abstract

Colorization is a well-explored problem in image and video processing, but extending it to 3D scenes presents significant challenges. Recent Neural Radiance Field (NeRF) and Gaussian Splatting (3DGS) methods enable high-quality novel-view synthesis from multi-view images, which raises the question: how can we colorize these 3D representations? This work presents a method for synthesizing colorized novel views from grayscale multi-view input images. Naively applying image or video colorization methods to novel views rendered from these 3D representations yields outputs with severe cross-view inconsistencies. We introduce a novel method that leverages powerful image colorization models to colorize 3D representations: a distillation-based approach that transfers color from networks trained on natural images to the target 3D representation. Notably, this strategy adds no weights and no computational overhead to the original representation at inference time. Extensive experiments demonstrate that our method produces high-quality colorized views for indoor and outdoor scenes, with significant cross-view consistency advantages over baseline approaches. Our method is agnostic to the underlying 3D representation and generalizes easily to both NeRF and 3DGS methods. We further validate our approach in two applications: (1) Infra-Red (IR) multi-view images and (2) legacy grayscale multi-view image sequences. Project Webpage: https://val.cds.iisc.ac.in/chroma-distill.github.io/
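The abstract's central argument is that distilling a per-view teacher into a single 3D representation yields cross-view consistency without extra inference cost. A toy sketch can illustrate this, under assumptions that are ours rather than the paper's. Every view observes the same set of points. Luminance per point comes from an already-trained stage-1 luma field. The student stores one view-independent (a, b) chroma pair per point. A hypothetical teacher colorizes each view independently, adding random per-view jitter to the chroma. Distillation then minimizes the mean squared error between student and teacher chroma over all views by gradient descent. `render_all_views` and `cross_view_std` are hypothetical helpers, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_POINTS, NUM_VIEWS = 5, 3

# Stage 1 (assumed already trained): luminance per point from the luma field.
luma = rng.uniform(20.0, 80.0, size=NUM_POINTS)

# Hypothetical teacher: colorizes each view independently, so the chroma it
# assigns to the same point jitters from view to view.
base_ab = rng.uniform(-30.0, 30.0, size=(NUM_POINTS, 2))
teacher_ab = base_ab[None] + rng.normal(0.0, 8.0, size=(NUM_VIEWS, NUM_POINTS, 2))

# Stage 2 student: one (a, b) pair per point, shared by all views.
student_ab = np.zeros((NUM_POINTS, 2))

lr = 0.1
for _ in range(500):
    # Gradient of the view-averaged squared error (summed over points).
    grad = 2.0 * (student_ab[None] - teacher_ab).mean(axis=0)
    student_ab -= lr * grad


def render_all_views() -> np.ndarray:
    """Toy renderer: returns (L, a, b) per point for every view.

    Luminance is taken unchanged from stage 1, and chroma is view-independent,
    so every view receives the same colors.
    """
    lab = np.concatenate([luma[:, None], student_ab], axis=1)
    return np.repeat(lab[None], NUM_VIEWS, axis=0)


def cross_view_std(ab_per_view: np.ndarray) -> float:
    """Std. dev. across views, averaged over points and chroma channels."""
    return float(ab_per_view.std(axis=0).mean())


rendered = render_all_views()
print(cross_view_std(teacher_ab))         # clearly > 0: teacher is view-inconsistent
print(cross_view_std(rendered[..., 1:]))  # ~0: distilled chroma is shared across views
```

Once distillation finishes, the teacher is no longer used. Rendering reads only the student's own parameters, which loosely mirrors the abstract's point that inference adds no weights or computation. In the paper, the student is the radiance field itself rather than a per-point table.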

Paper Structure

This paper contains 30 sections, 4 equations, 22 figures, 8 tables, and 2 algorithms.

Figures (22)

  • Figure 1: (a) Overview of our method. Given multi-view grayscale input images, the proposed approach, "ChromaDistill", generates 3D-consistent colorized views. (b) and (e): two colorized novel views of the "playground" scene from the LLFF mildenhall2019llff dataset, produced by (I) an image-colorization baseline, (II) a video-colorization baseline, and (III) our approach. State-of-the-art colorization baselines generate 3D-inconsistent novel views, as shown in the zoomed-in regions in (c) and (d).
  • Figure 2: Overall architecture of our method. First, we train a radiance field network from input multi-view grayscale images in the "Luma Radiance Field Stage". Next, we distill knowledge from a teacher colorization network trained on natural images to the radiance field network trained in the previous stage.
  • Figure 3: Qualitative comparison of our method with the baselines on the "Pasta" and "Truck" scenes. We display two novel views rendered from different viewpoints; rows 1 and 3 are at the original resolution, and rows 2 and 4 zoom in on the highlighted regions. Even the video-based baselines (columns 2 and 3) exhibit inconsistencies; note the color change in the highlighted regions of the "Truck" scene.
  • Figure 4: (Left to right) Results from ARF zhang2022arf, Stylized-NeRF huang2022stylizednerf, Ref-NPR zhang2023ref, and our method. Bottom row: zoomed-in view of the highlighted region. Note the artifacts in the results of the stylization methods.
  • Figure 5: (a) and (b) Novel views from Color-NeRF cheng2024colorizing; (c) and (d) novel views from our method. The bottom row for each scene shows zoomed-in regions. Note the cross-view inconsistency in Color-NeRF.
  • ...and 17 more figures