TRASE: Tracking-free 4D Segmentation and Editing
Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers
TL;DR
TRASE addresses dynamic scene understanding by learning a tracking-free 4D semantic field. It combines dynamic geometry reconstruction with learned $32$-dimensional Gaussian features, which are trained with a soft-mined contrastive objective guided by 2D SAM masks and then clustered with DBSCAN to yield temporally and spatially consistent object segments. The approach enables interactive editing tasks such as object removal, scene composition, and style transfer directly in 3D, and achieves state-of-the-art segmentation across five dynamic benchmarks with robust novel-view generalization. Overall, TRASE offers a principled, efficient framework for dynamic scene segmentation and editing that scales to multi-view data and real-time interaction.
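To make the clustering step concrete, the following is a minimal, dependency-free sketch of DBSCAN applied to a handful of feature vectors. It is an illustration only and not the authors' implementation: the function name `dbscan`, the toy 4-dimensional features (standing in for the $32$-dimensional ones), and the values of `eps` and `min_samples` are all assumptions chosen for readability.

```python
import math
from collections import deque


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Hypothetical toy features: two well-separated groups plus one outlier.
features = [
    (1.00, 0.00, 0.0, 0.0), (0.98, 0.05, 0.0, 0.0), (0.99, -0.03, 0.0, 0.0),
    (0.00, 1.00, 0.0, 0.0), (0.04, 0.97, 0.0, 0.0), (-0.02, 0.99, 0.0, 0.0),
    (0.00, 0.00, 5.0, 5.0),
]
labels = dbscan(features, eps=0.2, min_samples=2)
print(labels)
```

Because DBSCAN does not need the number of clusters in advance, a well-separated feature space can be grouped into object segments without specifying how many objects the scene contains. Points that fit no dense group are marked as noise (`-1`).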
Abstract
Understanding dynamic 3D scenes is crucial for extended reality (XR) and autonomous driving. Incorporating semantic information into 3D reconstruction enables holistic scene representations, unlocking immersive and interactive applications. To this end, we introduce TRASE, a novel tracking-free 4D segmentation method for dynamic scene understanding. TRASE learns a 4D segmentation feature field in a weakly-supervised manner, leveraging a soft-mined contrastive learning objective guided by SAM masks. The resulting feature space is semantically coherent and well-separated, and final object-level segmentation is obtained via unsupervised clustering. This enables fast editing, such as object removal, composition, and style transfer, by directly manipulating the scene's Gaussians. We evaluate TRASE on five dynamic benchmarks, demonstrating state-of-the-art segmentation performance from unseen viewpoints and its effectiveness across various interactive editing tasks. Our project page is available at: https://yunjinli.github.io/project-sadg/
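As a rough illustration of why editing becomes fast once every Gaussian carries an object label, the sketch below removes an object by filtering the Gaussian set. The `Gaussian` record, its fields, and `remove_objects` are hypothetical names introduced here; the actual scene representation in TRASE holds more attributes (for example covariance, opacity, and color) and is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical, stripped-down Gaussian: a 3D position and a segment label."""
    position: tuple
    label: int


def remove_objects(gaussians, labels_to_remove):
    """Return the scene without the Gaussians that belong to the given segments."""
    drop = set(labels_to_remove)
    return [g for g in gaussians if g.label not in drop]


# Toy scene with two segments (labels 0 and 1).
scene = [
    Gaussian((0.0, 0.0, 0.0), 0),
    Gaussian((0.1, 0.0, 0.0), 0),
    Gaussian((2.0, 1.0, 0.0), 1),
]
edited = remove_objects(scene, [0])
print(len(scene), len(edited))
```

Under this view, object removal is a single pass over the Gaussians and needs no re-optimization of the scene. Composition would amount to merging filtered sets from different scenes.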
