
TrackerSplat: Exploiting Point Tracking for Fast and Robust Dynamic 3D Gaussians Reconstruction

Daheng Yin, Isaac Ding, Yili Jin, Jianxin Shi, Jiangchuan Liu

Abstract

Recent advancements in 3D Gaussian Splatting (3DGS) have demonstrated its potential for efficient and photorealistic 3D reconstruction, which is crucial for diverse applications such as robotics and immersive media. However, current Gaussian-based methods for dynamic scene reconstruction struggle with large inter-frame displacements, leading to artifacts and temporal inconsistencies under fast object motions. To address this, we introduce TrackerSplat, a novel method that integrates advanced point tracking methods to enhance the robustness and scalability of 3DGS for dynamic scene reconstruction. TrackerSplat utilizes off-the-shelf point tracking models to extract pixel trajectories and triangulates per-view pixel trajectories onto 3D Gaussians to guide the relocation, rotation, and scaling of Gaussians before training. This strategy effectively handles large displacements between frames, dramatically reducing the fading and recoloring artifacts prevalent in prior methods. By accurately positioning Gaussians prior to gradient-based optimization, TrackerSplat overcomes the quality degradation associated with large frame gaps when processing multiple adjacent frames in parallel across multiple devices, thereby boosting reconstruction throughput while preserving rendering quality. Experiments on real-world datasets confirm the robustness of TrackerSplat in challenging scenarios with significant displacements, achieving superior throughput under parallel settings and maintaining visual quality compared to baselines. The code is available at https://github.com/yindaheng98/TrackerSplat.
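The core idea above, lifting per-view pixel trajectories to 3D via triangulation and using the resulting 3D motion to relocate Gaussians before optimization, can be sketched as follows. This is not the authors' implementation: the linear (DLT) triangulation and the nearest-tracked-point association used to move Gaussians are illustrative assumptions.

```python
import numpy as np

def triangulate_point(proj_mats, pixels):
    """Linear (DLT) triangulation of one tracked point from multiple views.

    Each view contributes two homogeneous equations from x ~ P X;
    the stacked system is solved by SVD (last right singular vector).
    """
    A = []
    for P, (u, v) in zip(proj_mats, pixels):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

def update_gaussian_positions(positions, tracked_prev, tracked_next):
    """Hypothetical relocation step: move each Gaussian by the
    displacement of its nearest triangulated track point between
    consecutive frames (a stand-in for the paper's motion transfer)."""
    disp = tracked_next - tracked_prev  # (M, 3) per-track displacement
    # Associate each Gaussian with its nearest track point in frame t-1.
    dists = np.linalg.norm(
        positions[:, None, :] - tracked_prev[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return positions + disp[nearest]
```

After this coarse relocation (and the corresponding rotation/scale updates), the per-frame Gaussian parameters would be refined by the usual gradient-based 3DGS training, as the abstract describes.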

Paper Structure

This paper contains 55 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: DOT point tracking on a video sequence. Colored lines show pixel trajectories over time.
  • Figure 2: TrackerSplat overview. Our method processes video clips captured from multiple fixed viewpoints. It begins by applying existing reconstruction techniques to initialize a set of 3D Gaussians for the first frame. For subsequent frames, the position, rotation, and scale of these Gaussians are updated based on point tracking across views, with their motions regularized by neighboring Gaussians. Finally, the Gaussian parameters of each frame are refined by training on input frames.
  • Figure 3: TrackerSplat parallel pipeline.
  • Figure 4: Average visual quality (PSNR↑ / SSIM↑ / LPIPS↓) over long-video sequences using our parallel pipeline with 8 GPUs (long-video experiments). Our method achieves higher and more stable visual quality than baselines in most cases, demonstrating its robustness. Lines ending prematurely for 4DGS and ST-4DGS indicate training failures due to GPU memory overflow (exceeding the 40GB limit of the A100 GPU) or numerical instabilities (NaN gradients). Corresponding rendered videos are provided in the supplementary material.
  • Figure 5: Qualitative comparison of rendered results from the final frame of representative 9-frame clips processed in parallel using 8 GPUs (short-clip experiments). Our method generates fewer artifacts and better preserves visual details compared to baselines, particularly in highly dynamic regions.
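The parallel pipeline of Figure 3, dispatching adjacent frame clips to separate GPUs while keeping a shared boundary frame so each worker can start from its predecessor's reconstruction, rests on a simple clip partition. The sketch below is an assumption about that partitioning (the clip length, overlap, and function name are illustrative, not from the paper):

```python
def split_into_clips(num_frames, clip_len, overlap=1):
    """Partition frame indices [0, num_frames) into clips of at most
    clip_len frames, where consecutive clips share `overlap` boundary
    frames. Each clip could then be reconstructed on its own GPU."""
    clips, start = [], 0
    while start < num_frames - overlap:
        end = min(start + clip_len, num_frames)
        clips.append(list(range(start, end)))
        if end == num_frames:
            break
        start = end - overlap  # next clip re-uses the boundary frame
    return clips
```

With, say, 9-frame clips (matching the short-clip experiments), a 25-frame video splits into three clips whose boundary frames coincide, so per-clip reconstructions can be stitched into one temporally consistent sequence.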