Fast and accurate neural reflectance transformation imaging through knowledge distillation

Tinsae G. Dulecha, Leonardo Righetto, Ruggero Pintus, Enrico Gobbetti, Andrea Giachetti

TL;DR

This work tackles the bottleneck of neural RTI decoding by introducing DisK-NeuralRTI, a knowledge-distillation-based method to train a compact student decoder that imitates a larger teacher. The approach achieves real-time, high-resolution interactive relighting on standard hardware without sacrificing relighting quality, outperforming classical PTM/HSH baselines and prior neural methods. It introduces an enhanced teacher architecture and the RealRTIHR benchmark to comprehensively evaluate both quality and interactive performance on high-resolution MLICs. Overall, the method provides a practical path to deploy neural RTI encodings in cultural heritage and related domains, combining high visual fidelity with scalable rendering performance.
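The student-imitates-teacher idea above can be illustrated with a generic response-based distillation objective. This is a minimal sketch, not the loss used by DisK-NeuralRTI (which this summary does not reproduce): the function name `distillation_loss`, the blending weight `alpha`, and the choice of mean squared error for both terms are assumptions made for illustration.

```python
import numpy as np

def distillation_loss(student_rgb, teacher_rgb, target_rgb, alpha=0.5):
    """Hypothetical distillation objective for a compact relighting decoder.

    Blends two mean-squared-error terms:
      - error of the student against the ground-truth pixels,
      - error of the student against the teacher's predictions.
    `alpha` (an assumed hyperparameter) weights the ground-truth term.
    """
    student_rgb = np.asarray(student_rgb, dtype=float)
    teacher_rgb = np.asarray(teacher_rgb, dtype=float)
    target_rgb = np.asarray(target_rgb, dtype=float)
    gt_term = np.mean((student_rgb - target_rgb) ** 2)
    kd_term = np.mean((student_rgb - teacher_rgb) ** 2)
    return alpha * gt_term + (1.0 - alpha) * kd_term

# Usage: a toy batch of 4 RGB pixels.
rng = np.random.default_rng(0)
target = rng.uniform(size=(4, 3))
teacher = target + 0.01 * rng.normal(size=(4, 3))  # teacher is close to ground truth
student = target + 0.10 * rng.normal(size=(4, 3))  # student is noisier
print(distillation_loss(student, teacher, target, alpha=0.5))
```

With `alpha=1.0` the objective reduces to ordinary supervised training of the small decoder; lowering `alpha` gives the teacher's output more influence.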

Abstract

Reflectance Transformation Imaging (RTI) is very popular for its ability to visually analyze surfaces by enhancing surface details through interactive relighting, starting from only a few tens of photographs taken with a fixed camera and variable illumination. Traditional methods like Polynomial Texture Maps (PTM) and Hemispherical Harmonics (HSH) are compact and fast, but struggle to accurately capture complex reflectance fields using few per-pixel coefficients and fixed bases, leading to artifacts, especially in highly reflective or shadowed areas. The NeuralRTI approach, which exploits a neural autoencoder to learn a compact function that better approximates the local reflectance as a function of light directions, has been shown to produce superior quality at comparable storage cost. However, as it performs interactive relighting with custom decoder networks with many parameters, the rendering step is computationally expensive and not feasible at full resolution for large images on limited hardware. Earlier attempts to reduce costs by directly training smaller networks have failed to produce valid results. For this reason, we propose to reduce its computational cost through a novel solution based on Knowledge Distillation (DisK-NeuralRTI). ...
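To make "few per-pixel coefficients and fixed bases" concrete, the sketch below fits a classical PTM with the standard six-term biquadratic basis in the projected light direction (l_u, l_v) by per-pixel least squares, then relights from a new direction. It assumes a single-channel (luminance) image stack and unit light vectors; the function names `ptm_basis`, `fit_ptm`, and `relight_ptm` are illustrative and do not come from the paper.

```python
import numpy as np

def ptm_basis(light_dirs):
    """Biquadratic PTM basis; only the x, y components of each light are used."""
    lu, lv = light_dirs[:, 0], light_dirs[:, 1]
    return np.stack([lu**2, lv**2, lu * lv, lu, lv, np.ones_like(lu)], axis=1)

def fit_ptm(images, light_dirs):
    """Fit six coefficients per pixel.

    images:     (N, H, W) luminance stack, one image per light
    light_dirs: (N, 3) unit light directions
    returns:    (H, W, 6) coefficient maps
    """
    n, h, w = images.shape
    basis = ptm_basis(light_dirs)                 # (N, 6)
    pixels = images.reshape(n, h * w)             # (N, H*W)
    coeffs, *_ = np.linalg.lstsq(basis, pixels, rcond=None)  # (6, H*W)
    return coeffs.T.reshape(h, w, 6)

def relight_ptm(coeffs, light_dir):
    """Evaluate the fitted polynomial for one light direction; returns (H, W)."""
    basis = ptm_basis(np.asarray(light_dir, dtype=float)[None, :])[0]  # (6,)
    return coeffs @ basis

# Usage: 20 random lights in the upper hemisphere, a tiny 4x5 synthetic image stack.
rng = np.random.default_rng(0)
xy = rng.uniform(-0.7, 0.7, size=(20, 2))
lights = np.column_stack([xy, np.sqrt(1.0 - (xy**2).sum(axis=1))])
stack = rng.uniform(size=(20, 4, 5))
coeffs = fit_ptm(stack, lights)
print(relight_ptm(coeffs, [0.0, 0.0, 1.0]).shape)
```

Because the basis is fixed and smooth, sharp highlights and hard shadow boundaries cannot be represented by the six coefficients, which is the kind of limitation the abstract attributes to PTM and HSH.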

Paper Structure

This paper contains 22 sections, 1 equation, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Network architecture for original NeuralRTI (top) and DisK-NeuralRTI (bottom).
  • Figure 2: Line charts showing the relighting quality (PSNR) as a function of the number of decoder parameters for the SynthRTI Single-Material benchmark (a), the SynthRTI Multi-Material benchmark (b), and the RealRTI benchmark (c). Training with DisK-NeuralRTI yields metrics close to or better than the teacher's for the 723-parameter version.
  • Figure 3: (a) Relighting of the SynthRTI multi-material set from a test light direction using the NeuralRTI(20) model. (b) Relighting from the same light direction obtained with the DisK-NeuralRTI(20) compressed model. (c) Ground-truth image for the test direction. As the arrows indicate, reducing the layer size with the original training (a) loses accuracy in the specular reflections and shadows, whereas the image in (b) shows fewer artifacts with respect to the ground truth (c). From drp24.
  • Figure 4: (a) Relighting of the SynthRTI multi-material set from a test light direction using the NeuralRTI(50) model. (b) Relighting from the same light direction obtained with the NeuralRTI(20) compressed model with standard training. (c) Relighting from the same light direction obtained with the NeuralRTI(20) compressed model trained with the improved teacher and Knowledge Distillation. Only the last result avoids artifacts in the shadows (yellow arrow) and unrealistic highlights (cyan arrow) with respect to the ground truth (d).
  • Figure 5: Relighting of a challenging object from the RealRTI benchmark. The result obtained with the original NeuralRTI method (a) reproduces the metallic behavior, but the golden part appears dark, the highlights are exaggerated with respect to the ground truth (d), and the cast shadow shows blending artifacts. Using this model to train a compressed decoder (b), we lose most of the highlights while the shadow artifacts remain. Training the lightweight decoder with the improved teacher, however, results in a relit image (c) with highlights and colors quite close to the ground truth and with reduced artifacts.
  • ...and 4 more figures