Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering

Zhongpai Gao, Meng Zheng, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Ziyan Wu

TL;DR

Render-FM removes the lengthy per-scan optimization step in real-time CT volumetric rendering by learning a foundation model that directly regresses 6D Gaussian Splatting (6DGS) parameters from a CT volume $V \in \mathbb{R}^{C \times D \times H \times W}$ with $C=6$. An nnU-Net–inspired encoder–decoder produces a dense 6DGS parameter volume, and a differentiable 6DGS renderer with view-dependent covariance slicing turns it into images, so the whole pipeline can be trained end to end with the rendering loss $\mathcal{L} = \lambda_{L1}\mathcal{L}_{L1} + \lambda_{SSIM}\mathcal{L}_{SSIM}$. Key contributions include anatomy-guided priming (AGP), end-to-end training on large public CT datasets, and a pipeline that renders high-fidelity visuals in real time without per-scan optimization, with optional fine-tuning offering further gains. The results show Render-FM matching or surpassing per-scan optimized methods while cutting preparation time from nearly an hour to seconds, which makes it practical to integrate into real-time surgical planning and diagnostic workflows.
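To make the weighted rendering loss concrete, here is a minimal pure-Python sketch of $\mathcal{L} = \lambda_{L1}\mathcal{L}_{L1} + \lambda_{SSIM}\mathcal{L}_{SSIM}$. Several things in it are assumptions rather than details from the paper: the SSIM term is taken to be $1 - \mathrm{SSIM}$ (a common convention in Gaussian-splatting code), SSIM is computed over the whole image as a single window instead of with local windows, images are flat lists of intensities in $[0, 1]$, and the default weights `lambda_l1=0.8` and `lambda_ssim=0.2` are placeholder values. All function names are hypothetical.

```python
from statistics import fmean


def l1_loss(pred, target):
    """Mean absolute error between two equally sized flat pixel lists."""
    return fmean(abs(p - t) for p, t in zip(pred, target))


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over the whole image treated as one window (a simplification).

    c1 and c2 are the usual stabilizing constants for intensities in [0, 1].
    """
    mu_p, mu_t = fmean(pred), fmean(target)
    var_p = fmean((p - mu_p) ** 2 for p in pred)
    var_t = fmean((t - mu_t) ** 2 for t in target)
    cov = fmean((p - mu_p) * (t - mu_t) for p, t in zip(pred, target))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def rendering_loss(pred, target, lambda_l1=0.8, lambda_ssim=0.2):
    """Weighted sum of the L1 term and an assumed (1 - SSIM) term.

    The weights are placeholders, not values reported by the paper.
    """
    ssim_term = 1.0 - global_ssim(pred, target)
    return lambda_l1 * l1_loss(pred, target) + lambda_ssim * ssim_term
```

With this definition a rendering that exactly matches its target has zero loss, and any deviation raises both terms. A real training setup would use a windowed, differentiable SSIM inside an autodiff framework.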

Abstract

Volumetric rendering of Computed Tomography (CT) scans is crucial for visualizing complex 3D anatomical structures in medical imaging. Current high-fidelity approaches, especially neural rendering techniques, require time-consuming per-scene optimization, limiting clinical applicability due to computational demands and poor generalizability. We propose Render-FM, a novel foundation model for direct, real-time volumetric rendering of CT scans. Render-FM employs an encoder-decoder architecture that directly regresses 6D Gaussian Splatting (6DGS) parameters from CT volumes, eliminating per-scan optimization through large-scale pre-training on diverse medical data. By integrating robust feature extraction with the expressive power of 6DGS, our approach efficiently generates high-quality, real-time interactive 3D visualizations across diverse clinical CT data. Experiments demonstrate that Render-FM achieves visual fidelity comparable or superior to specialized per-scan methods while drastically reducing preparation time from nearly an hour to seconds for a single inference step. This advancement enables seamless integration into real-time surgical planning and diagnostic workflows. The project page is: https://gaozhongpai.github.io/renderfm/.

Paper Structure

This paper contains 14 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Pipeline comparison of a) 6DGS with per-scan optimization and b) Our Render-FM with feed-forward prediction.
  • Figure 2: Overview of the Render-FM pipeline. A 3D U-Net encoder-decoder network processes a 6-channel input volume and regresses 37-channel 6D Gaussian Splatting (6DGS) parameters per voxel. Foreground voxels (mask) instantiate Gaussians. A differentiable 6DGS renderer, incorporating view-dependent slicing, produces the final image, enabling end-to-end training via rendering loss.
  • Figure 3: Qualitative comparison of rendering methods (zoom in for more details).
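
The Figure 2 caption says that the network regresses 37 parameters per voxel and that only foreground voxels (selected by a mask) instantiate Gaussians. The sketch below illustrates that selection step in plain Python. It assumes a channel-last nested-list layout `params[d][h][w]` and a boolean `mask[d][h][w]`, both chosen here for readability. The function name is hypothetical, and the 37-value vector is kept opaque because the caption does not say how the channels divide into individual 6DGS parameters.

```python
NUM_PARAM_CHANNELS = 37  # per-voxel 6DGS parameter count from the Figure 2 caption


def instantiate_gaussians(params, mask):
    """Collect one Gaussian per foreground voxel.

    params[d][h][w] is a sequence of NUM_PARAM_CHANNELS floats (assumed layout).
    mask[d][h][w] is truthy for foreground voxels.
    Returns a list of ((d, h, w), parameter_vector) pairs.
    """
    gaussians = []
    for d, plane in enumerate(mask):
        for h, row in enumerate(plane):
            for w, is_foreground in enumerate(row):
                if not is_foreground:
                    continue  # background voxels produce no Gaussian
                vector = params[d][h][w]
                if len(vector) != NUM_PARAM_CHANNELS:
                    raise ValueError("expected 37 parameters per voxel")
                gaussians.append(((d, h, w), vector))
    return gaussians
```

Because background voxels are skipped, the number of Gaussians passed to the renderer scales with the amount of foreground in the scan, not with the full $D \times H \times W$ grid.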