
LiteVoxel: Low-memory Intelligent Thresholding for Efficient Voxel Rasterization

Jee Won Lee, Jongseong Brad Choi

TL;DR

LiteVoxel tackles memory growth and low-frequency underfitting in sparse-voxel rasterization for view synthesis by introducing a self-tuning pipeline with three voxel-native mechanisms: low-frequency-aware photometric reweighting, depth-aware quantile pruning with EMA-hysteresis and keep-halo guards, and priority-driven subdivision under a growth budget. The approach redirects gradients to flat regions after geometry stabilizes, balances sparsity across depth, and refines only where the image formation process can resolve detail, all while maintaining perceptual quality and reducing peak VRAM by about 40–60%. On Mip-NeRF 360 and Tanks & Temples, LiteVoxel matches SVRaster and explicit-splatting baselines in PSNR/SSIM/LPIPS with substantially lower memory footprints and similar throughput, demonstrating predictable memory usage without sacrificing fidelity. Ablations confirm that each component (LF curriculum, depth-aware pruning, and footprint-guided subdivision) contributes to improved low-frequency fidelity, stable topology, and bounded model growth, paving the way for memory-efficient explicit voxel representations in large scenes.
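To make the depth-aware quantile pruning idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the depth binning, the `release` factor, and all default values are assumptions chosen for illustration, and the keep-halo guard is omitted. Each voxel carries an EMA of its maximum blending weight, the pruning threshold is a quantile computed separately per depth bin, and a previously flagged voxel stays flagged until it is clearly above the threshold.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def update_ema(ema, observed, decay=0.9):
    """Blend newly observed per-voxel max blending weights into the running EMA."""
    for vid, w in observed.items():
        ema[vid] = w if vid not in ema else decay * ema[vid] + (1.0 - decay) * w
    return ema


def depth_bin(depth, near, far, num_bins):
    """Map a depth in [near, far] to a bin index in [0, num_bins - 1]."""
    t = (depth - near) / (far - near)
    return min(max(int(t * num_bins), 0), num_bins - 1)


def select_prunable(ema, depths, flagged, near, far,
                    num_bins=4, q=0.25, release=1.5):
    """Return the new set of voxel ids flagged for pruning.

    Assumed rule (hypothetical): a voxel is flagged when its EMA is below
    the q-quantile of its own depth bin; a voxel that was already flagged
    stays flagged until its EMA reaches release * threshold (hysteresis).
    """
    bins = {}
    for vid in ema:
        bins.setdefault(depth_bin(depths[vid], near, far, num_bins), []).append(vid)

    new_flagged = set()
    for vids in bins.values():
        thr = quantile([ema[v] for v in vids], q)
        for v in vids:
            if ema[v] < thr:
                new_flagged.add(v)
            elif v in flagged and ema[v] < release * thr:
                new_flagged.add(v)  # not yet clearly above the threshold
    return new_flagged


# Toy scene: far voxels have uniformly smaller weights than near ones.
ema = {0: 0.9, 1: 0.8, 2: 0.7, 3: 0.1, 4: 0.09, 5: 0.08, 6: 0.07, 7: 0.01}
depths = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 9.0, 5: 9.0, 6: 9.0, 7: 9.0}
flagged = select_prunable(ema, depths, set(), near=0.0, far=10.0, num_bins=2)
```

In this toy scene a single global threshold near the front bin's quantile would remove every far voxel, whereas the per-bin quantile flags only the weakest voxel in each depth range. That is one way to read "balancing sparsity across depth".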

Abstract

Sparse-voxel rasterization is a fast, differentiable alternative for optimization-based scene reconstruction, but it tends to underfit low-frequency content, depends on brittle pruning heuristics, and can overgrow in ways that inflate VRAM. We introduce LiteVoxel, a self-tuning training pipeline that makes SV rasterization both steadier and lighter. Our loss is made low-frequency aware via an inverse-Sobel reweighting with a mid-training gamma-ramp, shifting gradient budget to flat regions only after geometry stabilizes. Adaptation replaces fixed thresholds with a depth-quantile pruning logic on maximum blending weight, stabilized by EMA-hysteresis guards, and refines structure through ray-footprint-based, priority-driven subdivision under an explicit growth budget. Ablations and full-system results across the Mip-NeRF 360 (6 scenes) and Tanks & Temples (3 scenes) datasets show mitigation of errors in low-frequency regions and boundary instability while keeping PSNR/SSIM, training time, and FPS comparable to a strong SVRaster pipeline. Crucially, LiteVoxel reduces peak VRAM by ~40–60% and preserves low-frequency detail that prior setups miss, enabling more predictable, memory-efficient training without sacrificing perceptual quality.
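The inverse-Sobel reweighting with a gamma ramp can be illustrated with a small pure-Python sketch. The abstract does not give the exact formula, so the form below is an assumption: per-pixel weights proportional to `(1 / (eps + |grad|)) ** gamma`, normalized to mean 1, with `gamma` ramping linearly from 0 to `gamma_max`. The names `lf_weights`, `gamma_at`, `ramp_start`, `ramp_end`, and `eps` are hypothetical.

```python
import math


def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2D grayscale image (list of rows),
    using edge clamping at the borders."""
    h, w = len(img), len(img[0])
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    gx += kx[dy + 1][dx + 1] * img[yy][xx]
                    gy += ky[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = math.hypot(gx, gy)
    return out


def gamma_at(step, ramp_start, ramp_end, gamma_max):
    """Linear ramp: 0 up to ramp_start, gamma_max from ramp_end onward."""
    if step <= ramp_start:
        return 0.0
    if step >= ramp_end:
        return gamma_max
    return gamma_max * (step - ramp_start) / (ramp_end - ramp_start)


def lf_weights(img, gamma, eps=1e-3):
    """Per-pixel weights (1 / (eps + |grad|)) ** gamma, normalized to mean 1.
    gamma = 0 gives uniform weights; larger gamma favors flat regions."""
    mag = sobel_magnitude(img)
    raw = [[(1.0 / (eps + m)) ** gamma for m in row] for row in mag]
    mean = sum(map(sum, raw)) / (len(raw) * len(raw[0]))
    return [[r / mean for r in row] for row in raw]


def weighted_l1(pred, target, weights):
    """Mean of weight * |pred - target| over all pixels."""
    total = sum(
        w * abs(p - t)
        for pr, tr, wr in zip(pred, target, weights)
        for p, t, w in zip(pr, tr, wr)
    )
    return total / (len(pred) * len(pred[0]))


# Toy target: a vertical step edge between a dark and a bright flat region.
target = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
early = lf_weights(target, gamma_at(0, 100, 200, 1.0))    # before the ramp
late = lf_weights(target, gamma_at(300, 100, 200, 1.0))   # after the ramp
```

Before the ramp starts, every pixel contributes equally, so edges and geometry dominate the gradient as in a plain photometric loss. Once `gamma` has ramped up, pixels in flat regions (column 0 here) receive much larger weights than pixels on the step edge (columns 2 and 3), which is the sense in which gradient budget shifts to low-frequency content.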


Paper Structure

This paper contains 15 sections, 16 equations, 7 figures, 2 tables.
