MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling

Yunpeng Tan, Junlin Hao, Jiangkai Wu, Liming Liu, Qingyang Li, Xinggong Zhang

TL;DR

MCBlock introduces a dynamic-resolution ray-sampling method for NeRF training based on Monte Carlo Tree Search (MCTS), so that pixel blocks of different sizes are trained in parallel according to image texture. It initializes a texture-informed block hierarchy, then dynamically expands and prunes blocks while a redefined UCT guides selection, achieving up to 2.33x training acceleration with minimal overhead. The approach combines active sampling with multi-resolution block training and shows speedups on real-world (Mip-NeRF 360) and synthetic (Blender) datasets while maintaining competitive rendering quality. Its block-wise, cone-tracing-compatible design suggests it may apply broadly to cone-tracing NeRF variants and real-time multimedia scenarios.

Abstract

Neural Radiance Field (NeRF) is widely known for high-fidelity novel view synthesis. However, even the state-of-the-art NeRF model, Gaussian Splatting, requires minutes for training, far from the real-time performance required by multimedia scenarios like telemedicine. One of the obstacles is inefficient sampling, which existing works address only partially. Existing point-sampling algorithms sample simple-texture regions (easy to fit) and complex-texture regions (hard to fit) uniformly, while existing ray-sampling algorithms sample all of these regions at the finest granularity (i.e., the pixel level); both waste GPU training resources. In fact, regions with different texture intensities require different sampling granularities. To this end, we propose a novel dynamic-resolution ray-sampling algorithm, MCBlock, which employs Monte Carlo Tree Search (MCTS) to partition each training image into pixel blocks of different sizes for active block-wise training. Specifically, the trees are initialized according to the texture of the training images to speed up initialization, and an expansion/pruning module dynamically optimizes the block partition. MCBlock is implemented in Nerfstudio, an open-source toolset, and achieves a training acceleration of up to 2.33x, surpassing other ray-sampling algorithms. We believe MCBlock can be applied to any cone-tracing NeRF model and can contribute to the multimedia community.
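
The texture-driven partition described in the abstract can be pictured with a small quadtree sketch. This is an illustration only, not the authors' implementation: the abstract does not say how texture intensity is measured, so the per-block pixel variance, the `var_threshold` and `min_size` parameters, and the `partition` function name are all assumptions made for this example.

```python
from statistics import pvariance


def partition(image, x, y, size, var_threshold=0.01, min_size=1):
    """Split a square region into blocks until each block is flat or minimal.

    Assumption: texture intensity is approximated by the population variance
    of the grayscale values in the block. `size` must be a power of two.
    Returns a list of (x, y, size) blocks that tile the region.
    """
    if size <= min_size:
        return [(x, y, size)]
    pixels = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if pvariance(pixels) <= var_threshold:
        # Flat region (e.g. sky): keep one large block.
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += partition(image, x + dx, y + dy, half, var_threshold, min_size)
    return blocks


# 4x4 grayscale toy image: flat left half, checkerboard right half.
image = [
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
]
blocks = partition(image, 0, 0, 4)
```

On this toy image the flat left half stays as two 2x2 blocks, while the textured right half is split down to single pixels, which mirrors the idea that simple regions need coarser sampling than complex ones.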

Paper Structure

This paper contains 27 sections, 3 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Different ray-sampling algorithms. Random ray sampling (a) samples rays uniformly at random. Active ray sampling (b) assigns larger sampling probabilities to hard-to-fit pixels. (a) and (b) together illustrate single-resolution ray sampling. Coarse-to-fine multi-resolution ray sampling (c) samples blocks of the same size at any one time, and gradually shrinks the blocks (from coarse to fine) as training proceeds. Dynamic-resolution ray sampling (Ours) (d) samples blocks of different sizes according to the image texture and the training process.
  • Figure 2: Training pipeline of MCBlock. We first initialize the tree structures from the training images and render all the leaf nodes to get their loss values. These loss values are backpropagated from each leaf node to its root node, and some leaf nodes are pruned along the way. With the updated trees, we select some leaf nodes, expand them, and choose a subset of the expanded nodes as a training batch. "Selection & expansion", "Leaf node rendering", and "Backpropagation & pruning" are repeated until the model converges.
  • Figure 3: Tree structure initialization. We initialize the structures of trees according to the texture intensity of training images. Regions like the sky have fewer and larger blocks, and regions like the trees have more and smaller blocks.
  • Figure 4: Backpropagation & pruning. On the path from each leaf node of the batch to its root node, we backpropagate the UCT values and the loss values to nodes on the path and decide whether to prune their child nodes.
  • Figure 5: Selection & expansion. Using UCT values as sampling probabilities, we choose a root node and walk from it down to a leaf node. We then expand this leaf node and randomly add one of the expanded nodes to the training batch.
  • ...and 5 more figures
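
The loop described in the captions for Figures 2, 4, and 5 (selection & expansion, leaf rendering, backpropagation & pruning) can be sketched on a single toy tree. This is a rough illustration under stated assumptions, not the paper's algorithm: the `uct` method uses the classic UCB1-style formula rather than the redefined UCT mentioned in the TL;DR, `fake_loss` stands in for rendering a block with a NeRF model, and the `Node` class, the pruning rule, and `prune_threshold` are hypothetical.

```python
import math
import random


class Node:
    """A square pixel block in a quadtree over one training image."""

    def __init__(self, x, y, size, parent=None):
        self.x, self.y, self.size = x, y, size
        self.parent = parent
        self.children = []
        self.visits = 0
        self.mean_loss = 0.0

    def uct(self, c=1.0):
        # Classic UCT with mean loss as reward (assumption, not the paper's variant).
        # Only called on visited children, so visits >= 1 and parent.visits >= 1.
        explore = math.sqrt(math.log(self.parent.visits + 1) / self.visits)
        return self.mean_loss + c * explore


def select_leaf(root, rng):
    """Walk from the root to a leaf, sampling children in proportion to UCT."""
    node = root
    while node.children:
        unvisited = [ch for ch in node.children if ch.visits == 0]
        if unvisited:
            node = rng.choice(unvisited)
        else:
            weights = [ch.uct() for ch in node.children]
            node = rng.choices(node.children, weights=weights, k=1)[0]
    return node


def expand(node):
    """Split a leaf into four half-size blocks; single pixels are not split."""
    if node.size <= 1:
        return [node]
    half = node.size // 2
    node.children = [Node(node.x + dx, node.y + dy, half, parent=node)
                     for dy in (0, half) for dx in (0, half)]
    return node.children


def backpropagate(leaf, loss, prune_threshold=1e-3):
    """Update running mean losses up to the root; collapse well-fitted children."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.mean_loss += (loss - node.mean_loss) / node.visits
        # Hypothetical pruning rule: merge children that are all leaves,
        # all visited, and all already fitted well.
        if node.children and all(
            ch.visits > 0 and not ch.children and ch.mean_loss < prune_threshold
            for ch in node.children
        ):
            node.children = []
        node = node.parent


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for ch in node.children for leaf in leaves(ch)]


def fake_loss(node):
    # Stand-in for rendering: pretend the right half of the image is hard to fit.
    return 0.5 if node.x >= 4 else 0.0


rng = random.Random(0)
root = Node(0, 0, 8)
for _ in range(50):
    leaf = select_leaf(root, rng)
    chosen = rng.choice(expand(leaf))
    backpropagate(chosen, fake_loss(chosen))
```

Whatever the random draws, the leaves always tile the full 8x8 image, because expansion replaces one block with four that cover the same area and pruning merges them back. A real pipeline would select a whole batch of blocks across many trees per step and would take the loss from the rendered output.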