Geometry-Aware Global Feature Aggregation for Real-Time Indirect Illumination

Meng Gai, Guoping Wang, Sheng Li

TL;DR

This work tackles real-time global illumination by learning a screen-space estimator for diffuse indirect illumination that is combined with direct lighting to produce HDR results. It introduces a geometry-aware feature aggregation module (GFA) built on a modified multi-head attention mechanism and a monochromatic shading generator that processes RGB channels independently, guided by a geometry encoder. A novel HDR synthetic indoor dataset and an adversarial training framework with a perceptual loss are developed to train the network stably in linear HDR space. The approach achieves real-time performance (~12 ms at 768×512) while delivering improved PSNR, LPIPS, and color fidelity over prior methods, demonstrates strong generalization to new scenes and colored lighting, and provides a robust, geometry-guided path toward practical neural rendering for VR/AR workflows.

Abstract

Real-time rendering with global illumination is crucial for giving users a realistic experience in virtual environments. We present a learning-based estimator that predicts diffuse indirect illumination in screen space, which is then combined with direct illumination to synthesize globally illuminated high dynamic range (HDR) results. Our approach tackles the challenge of capturing long-range indirect illumination with neural networks and generalizes to complex lighting and scenarios. Treating the neural network as a solver of the rendering equation, we present a novel network architecture to predict indirect illumination. Our network is equipped with a modified attention mechanism that aggregates global information guided by spatial geometry features, as well as a monochromatic design that encodes each color channel individually. We conducted extensive evaluations, and the experimental results show that our method outperforms previous learning-based techniques. Our approach excels at handling complex lighting such as varying-colored lighting and environment lighting. It successfully captures distant indirect illumination and simulates the interreflections between textured surfaces well (i.e., color bleeding effects); it also effectively handles new scenes that are not present in the training dataset.
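
The monochromatic design and the final composition step described in the abstract can be illustrated with a minimal Python sketch. It runs one single-channel estimator on R, G, and B independently and adds the result to the direct lighting in linear HDR space. The function `toy_indirect_estimator` is a hypothetical placeholder, not the paper's learned generator; the names `shade_monochromatic`, `direct_rgb`, and `albedo_rgb` are likewise assumptions made for illustration.

```python
from typing import Callable, List

Channel = List[List[float]]  # one color channel: rows of linear HDR values


def toy_indirect_estimator(direct: Channel, albedo: Channel) -> Channel:
    """Hypothetical stand-in for a learned single-channel predictor.

    It spreads the mean reflected direct light over the whole image,
    which is only a crude placeholder for a trained network.
    """
    pixel_count = sum(len(row) for row in direct)
    reflected = sum(
        d * a
        for direct_row, albedo_row in zip(direct, albedo)
        for d, a in zip(direct_row, albedo_row)
    )
    bounce = reflected / pixel_count
    return [[bounce * a for a in row] for row in albedo]


def shade_monochromatic(
    direct_rgb: List[Channel],
    albedo_rgb: List[Channel],
    estimator: Callable[[Channel, Channel], Channel] = toy_indirect_estimator,
) -> List[Channel]:
    """Apply the same estimator to each color channel independently,
    then add direct and indirect lighting in linear HDR space."""
    result = []
    for direct, albedo in zip(direct_rgb, albedo_rgb):
        indirect = estimator(direct, albedo)
        result.append(
            [
                [d + i for d, i in zip(direct_row, indirect_row)]
                for direct_row, indirect_row in zip(direct, indirect)
            ]
        )
    return result
```

Because each channel sees only its own lighting and reflectance, a light with energy in a single channel produces indirect light in that channel alone. The sum is left unclamped, so values above 1.0 survive until tone mapping.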

Paper Structure

This paper contains 18 sections, 7 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Our model efficiently synthesizes HDR results with global illumination, predicted in around 12 milliseconds. The model generalizes well to complex dynamic lighting, such as varying-colored lighting (left) and environment lighting (right), in new scenes that are not present in the training dataset.
  • Figure 2: Overview of our model architecture. GCM: geometry-aware conditional modulation; GFA: geometry-aware feature aggregation. The geometry encoder takes in the auxiliary geometry information of each frame as input and provides geometry conditional features (GCM) and attention weights (GFA) to the generator. The monochromatic shading generator then independently predicts each color channel of the indirect illumination, given the lighting and reflectance of the corresponding channel as input.
  • Figure 3: Geometry-aware feature aggregation operation. $\mathrm{p}_i$ and $\mathrm{p}_j$ can be any two 2D locations in the input features, so the operation effectively models global dependencies within screen space.
  • Figure 4: Our adversarial training framework. We use a PatchGAN as the discriminator for adversarial training, and the perceptual loss is computed with a pre-trained VGG-19. The adversarial loss is computed per color channel, whereas the perceptual loss is computed on the final RGB image.
  • Figure 5: Some indoor scenes rendered in our synthetic dataset.
  • ...and 9 more figures
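
The captions of Figures 2 and 3 describe attention weights that come from geometry features and connect any two screen-space locations. A minimal single-head sketch of that idea follows, assuming plain scaled dot-product attention with no learned projections. The paper's modified multi-head formulation is not given in this excerpt, so the function `aggregate` and its arguments are illustrative assumptions rather than the authors' operator.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(geometry: List[Vector], shading: List[Vector]) -> List[Vector]:
    """Aggregate shading features with weights derived from geometry only.

    geometry[i] and shading[i] are the feature vectors at flattened
    screen-space location p_i. The weight between p_i and p_j depends
    on their geometry features, not on their 2D distance, so any pair
    of locations can exchange information.
    """
    scale = 1.0 / math.sqrt(len(geometry[0]))
    channels = len(shading[0])
    output = []
    for g_i in geometry:
        # Similarity of location i to every location j (assumed dot product).
        scores = [
            scale * sum(a * b for a, b in zip(g_i, g_j)) for g_j in geometry
        ]
        weights = softmax(scores)
        # Weighted sum of shading features over all locations.
        output.append(
            [
                sum(w * feat[c] for w, feat in zip(weights, shading))
                for c in range(channels)
            ]
        )
    return output
```

This version builds the full N×N weight matrix for N locations, so it is meant for reading rather than for real-time use at image resolution.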