SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training

Yang Zheng, Hao Tan, Kai Zhang, Peng Wang, Leonidas Guibas, Gordon Wetzstein, Wang Yifan

TL;DR

SplatPainter targets the gap in interactive editing of 3D Gaussian Splatting assets by introducing a state-aware, feedforward framework that learns to edit Gaussian attributes directly from 2D edits. It combines a compact, feature-rich 3DGS representation derived from a Gaussian LRM with a local voxel Transformer and a Test-Time Training refinement module to adapt on the fly. The approach enables precise local refinements and consistent global recoloring/relighting at interactive speeds, outperforming diffusion- and optimization-based baselines in quality and efficiency. This work paves the way for end-to-end, real-time 3D content authoring that preserves original identity while supporting fine-grained edits.

Abstract

The rise of 3D Gaussian Splatting has revolutionized photorealistic 3D asset creation, yet a critical gap remains in the interactive refinement and editing of these assets. Existing approaches based on diffusion or optimization are ill-suited for this task, as they are often prohibitively slow, destructive to the original asset's identity, or lack the precision for fine-grained control. To address this, we introduce SplatPainter, a state-aware feedforward model that enables continuous editing of 3D Gaussian assets from user-provided 2D view(s). Our method directly predicts updates to the attributes of a compact, feature-rich Gaussian representation and leverages Test-Time Training to create a state-aware, iterative workflow. The versatility of our approach allows a single architecture to perform diverse tasks, including high-fidelity local detail refinement, local paint-over, and consistent global recoloring, all at interactive speeds, paving the way for fluid and intuitive 3D content authoring.

Paper Structure

This paper contains 23 sections, 9 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: We present SplatPainter, a feedforward method that supports interactive and continuous authoring of 3D Gaussian assets through intuitive 2D edits, e.g., high-fidelity local detail refinement, local paint-over, and consistent global recoloring, while maintaining 3D consistency and the original texture and structure details.
  • Figure 2: Overview. Given an input 3DGS asset, the framework first performs a one-time preprocessing step. The asset is rendered from multiple views to generate feature-rich inputs from a Gaussian LRM. Stage I compresses this into a compact latent representation via a local transformer. The interactive editing loop in Stage II then iteratively refines this latent representation using new 2D user edits (New input(s)) and a Test-Time Training (TTT) module to produce the final edited 3DGS.
  • Figure 3: TTT operations. Fast weights $W$ are iteratively updated using new input views and subsequently applied to the voxel GS latents after seeing all the new inputs. The residual connections are omitted for clarity.
  • Figure 4: Qualitative evaluation for local refinement. Given a zoomed-in input view (left), we compare the refined results of the baseline methods and ours from a novel view close to the zoomed-in view. Our method recovers the details provided by the zoomed-in input views and produces much sharper features than direct reconstruction (GS-LRM [gslrm2024]) or upsampling (GenDen [nam2025generative]). It is on par with or better than optimization-based approaches while using a fraction of the time (see the local refinement table).
  • Figure 5: Global relighting comparison. ReLitLRM takes an environment map as input to synthesize new lighting, while our method leverages a few relit views as direct input. Though not strictly relighting, our method offers a practical path for appearance transfer, achieving consistent shadows, accurate color propagation, and closer alignment with ground truth.
  • ...and 3 more figures
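
The TTT operations described in the Figure 3 caption (fast weights $W$ updated once per new input view, then applied to the voxel GS latents only after all views have been seen) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the linear fast-weight layer, the squared-error self-supervised loss, the fixed random projections `proj_q`, `proj_k`, `proj_v`, the learning rate, and all function names are assumptions chosen for illustration, in the style of a generic linear TTT layer.

```python
import numpy as np


def ttt_loss(W, k, v):
    """Assumed self-supervised loss for one view: 0.5 * mean_n ||k_n W - v_n||^2."""
    return 0.5 * np.sum((k @ W - v) ** 2) / len(k)


def update_fast_weights(W, tokens, proj_k, proj_v, lr=0.05):
    """One gradient-descent step on ttt_loss using tokens from one new input view."""
    k, v = tokens @ proj_k, tokens @ proj_v
    grad = k.T @ (k @ W - v) / len(k)  # d(ttt_loss)/dW
    return W - lr * grad


def edit_latents(latents, views, proj_q, proj_k, proj_v, lr=0.05):
    """Update W over all views first, then apply it once to the voxel latents."""
    d = latents.shape[1]
    W = np.zeros((d, d))  # fast weights start at zero: no edit yet
    for tokens in views:
        W = update_fast_weights(W, tokens, proj_k, proj_v, lr)
    # Residual connection (omitted in Figure 3): with W = 0 the latents are unchanged.
    return latents + (latents @ proj_q) @ W, W


# Toy data (hypothetical sizes): 32 voxel latents, 2 edited views of 16 tokens each.
rng = np.random.default_rng(0)
d = 8
proj_q, proj_k, proj_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
latents = rng.normal(size=(32, d))
views = [rng.normal(size=(16, d)) for _ in range(2)]
edited, W = edit_latents(latents, views, proj_q, proj_k, proj_v)
print(edited.shape, W.shape)
```

In this sketch the state of the editing session lives entirely in `W`, so a further edit could continue from the current `W` rather than starting from zero, which is one plausible reading of the "state-aware" workflow described in the abstract.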