Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing

Ri-Zhao Qiu; Ge Yang; Weijia Zeng; Xiaolong Wang

TL;DR

Feature Splatting addresses the problem of synthesizing and editing physics-aware dynamic scenes from static 3D captures by unifying appearance, geometry, semantics, and material properties into explicit 3D Gaussians. It distills vision-language features into Gaussians, enables open-vocabulary scene decomposition, and grounds material properties for physics-based dynamics through a Taichi-based MPM, with volume-preserving and deformation-aware mechanisms. The framework supports language-driven editing of both appearance and geometry and demonstrates dynamic scene synthesis with elastic, granular, and volume-preserving interactions, aided by regularization across SAM, CLIP, and DINOv2 features. With optimized rendering and a staged training pipeline, Feature Splatting offers efficient, interpretable control for open-vocabulary, physics-informed scene editing in an explicit 3D representation.

Abstract

Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision-language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, which enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline to illustrate the challenges and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties, and semantics grounded in natural language. Project website: https://feature-splatting.github.io/
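To make the idea of text-query scene decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes each Gaussian carries a feature vector in the same embedding space as the text encoder, and that a query is scored against a few negative phrases with a softmax. The function name `select_gaussians`, the `temperature` value, the 0.5 threshold, and the toy embeddings are all hypothetical choices made for illustration.

```python
import numpy as np


def select_gaussians(gaussian_feats, positive_emb, negative_embs, temperature=0.05):
    """Return a boolean mask over Gaussians that match the positive text query.

    gaussian_feats: (N, D) per-Gaussian features (assumed to share the text embedding space).
    positive_emb:   (D,) embedding of the query phrase.
    negative_embs:  (K, D) embeddings of background phrases (an assumption of this sketch).
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    feats = normalize(np.asarray(gaussian_feats, dtype=np.float64))
    texts = normalize(np.vstack([positive_emb[None, :], negative_embs]).astype(np.float64))

    # Cosine similarity between every Gaussian and every phrase, scaled by temperature.
    logits = feats @ texts.T / temperature          # (N, 1 + K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Keep Gaussians whose probability mass sits on the positive phrase.
    return probs[:, 0] > 0.5


# Toy data: 4-dimensional "embeddings" standing in for real text and Gaussian features.
positive = np.array([1.0, 0.0, 0.0, 0.0])           # e.g. the query phrase
negatives = np.array([[0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])        # e.g. background phrases
feats = np.array([[1.0, 0.1, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.9, 0.0, 0.1, 0.0]])
mask = select_gaussians(feats, positive, negatives)
```

In a real pipeline the embeddings would come from a text encoder and the features from the optimized Gaussians; the resulting mask is what downstream editing or simulation steps would consume.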

Paper Structure (28 sections, 6 equations, 8 figures, 4 tables)

This paper contains 28 sections, 6 equations, 8 figures, 4 tables.

Introduction
Related Work
Novel View Synthesis.
Scene Editing with Distilled Feature Fields.
Concurrent Work.
Language-Driven Physics-Based Synthesis and Editing
Differentiable Feature Splatting
Feature Splatting.
Systems Considerations.
Improving Reference Feature Quality Using Part-Priors.
Language-guided Scene Decomposition
Basic Editing Primitives.
Language-Driven Physics Synthesis.
Decoupling Objects for Simulation.
Language-grounded Collision Surface Estimation.
...and 13 more sections

Figures (8)

Figure 1: Feature Splatting. An overview of the language-grounded scene physics editing pipeline. Given input images, feature splatting optimizes for a unified Gaussian representation that contains the geometry, texture, and semantics of the scene using features from large-scale 2D vision models (radford2021-CLIP, oquab2023-dinov2, kirillov2023-segmentanything). With open-vocabulary scene decomposition, feature splatting segments an object and automatically determines the physical properties of components within the object. In this example, a user gives a query 'a vase with flowers'. Feature splatting extracts the vase with flowers in the scene, and further decomposes it into rigid and non-rigid parts, creating a dynamic scene of flowers swaying in the wind. (Best viewed in videos on the project website.)
Figure 2: Feature Splatting. Raw CLIP features are noisy and low-resolution. We improve the quality of the feature maps by pooling within part-level masks produced by the Segment Anything Model (SAM kirillov2023-segmentanything). Jointly modeling features from DINOv2 oquab2023-dinov2 and CLIP is an optional regularization that offers minor additional improvements.
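The mask-pooling step described in this caption can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the paper's code: the feature map is assumed to be already upsampled to image resolution, the masks are assumed to be boolean arrays, and the handling of uncovered pixels and overlapping masks (keep original, average pooled values) is a choice made here. The name `mask_pool_features` is hypothetical.

```python
import numpy as np


def mask_pool_features(feat_map, masks):
    """Replace each pixel's feature with the mean feature of the mask(s) covering it.

    feat_map: (H, W, D) dense feature map (e.g. upsampled CLIP features).
    masks:    (M, H, W) boolean part-level masks (e.g. produced by SAM).
    Pixels covered by no mask keep their original feature; where masks overlap,
    the pooled features of the overlapping masks are averaged.
    """
    feat_map = np.asarray(feat_map, dtype=np.float64)
    H, W, D = feat_map.shape
    pooled_sum = np.zeros((H, W, D))
    cover = np.zeros((H, W))

    for m in masks:
        if not m.any():
            continue                                 # skip empty masks
        mean_feat = feat_map[m].mean(axis=0)         # (D,) average inside the mask
        pooled_sum[m] += mean_feat
        cover[m] += 1.0

    out = feat_map.copy()
    covered = cover > 0
    out[covered] = pooled_sum[covered] / cover[covered][:, None]
    return out


# Toy example: a 2x2 map with one feature channel and one mask over the top row.
toy_feats = np.array([[[1.0], [3.0]],
                      [[5.0], [7.0]]])
toy_masks = np.array([[[True, True],
                       [False, False]]])
pooled = mask_pool_features(toy_feats, toy_masks)
```

Pooling trades spatial detail inside a part for a cleaner, part-consistent feature, which matches the caption's motivation that raw features are noisy and low-resolution.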
Figure 3: Raw and Enhanced Feature Maps. CLIP features radford2021-CLIP contain view-dependent noise that degrades the feature splats. To improve their quality, we mask-pool the features with masks produced by SAM kirillov2023-segmentanything and regularize them through joint modeling of DINOv2 features oquab2023-dinov2. Colors correspond to the top three PCA vectors.
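The PCA coloring mentioned in this caption is a common way to visualize high-dimensional feature maps. The sketch below shows one way to do it with NumPy, assuming a feature map with at least three channels; the per-channel min-max normalization and the function name `pca_to_rgb` are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np


def pca_to_rgb(feat_map):
    """Project an (H, W, D) feature map onto its top three principal directions.

    Returns an (H, W, 3) array in [0, 1] that can be displayed as an RGB image.
    Assumes D >= 3 and H * W >= 3.
    """
    H, W, D = feat_map.shape
    X = feat_map.reshape(-1, D).astype(np.float64)
    X = X - X.mean(axis=0, keepdims=True)            # center the features

    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[:3].T                              # (H*W, 3)

    # Min-max normalize each component independently for display.
    lo = proj.min(axis=0, keepdims=True)
    hi = proj.max(axis=0, keepdims=True)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)
    return rgb.reshape(H, W, 3)


# Toy example with random features standing in for a rendered feature map.
rng = np.random.default_rng(0)
demo_rgb = pca_to_rgb(rng.normal(size=(4, 5, 8)))
```

To compare raw and enhanced maps side by side, the same projection matrix would normally be fitted once and reused for both, so that colors are comparable across images.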
Figure 4: Physics-based Dynamic Scene Synthesis. Rich semantic features in Feature Splatting enable semi-automatic assignment of material properties for synthesizing dynamic scenes from a single static 3D capture. We can use simple text queries to manipulate the physical properties of specific objects and materials. From top to bottom: changing the elasticity, turning a solid into granular material, and modeling volume-dependent deformation in a falling volleyball. For the best illustration with animations and moving cameras, please refer to the videos on the project website.
Figure 5: Feature Splatting Editing Primitives. We can remove, scale, rotate, translate, and clone objects in the scene using language.
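Once a language query has produced a selection mask over Gaussians, the five primitives named in this caption reduce to simple array operations. The sketch below is a simplified illustration that edits Gaussian centers only; a full implementation would also need to update per-Gaussian covariances (rotations and scales) and any view-dependent color terms, which are omitted here. All function names, the choice of the selection centroid as pivot, and the restriction to rotation about the z-axis are assumptions of this sketch.

```python
import numpy as np


def remove(centers, mask):
    """Drop the selected Gaussians."""
    return centers[~mask]


def translate(centers, mask, offset):
    """Shift the selected Gaussians by a 3D offset."""
    out = centers.copy()
    out[mask] += np.asarray(offset, dtype=out.dtype)
    return out


def scale(centers, mask, factor):
    """Scale the selected Gaussians about their centroid."""
    out = centers.copy()
    pivot = out[mask].mean(axis=0)
    out[mask] = pivot + factor * (out[mask] - pivot)
    return out


def rotate_z(centers, mask, angle_rad):
    """Rotate the selected Gaussians about the z-axis through their centroid."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s, 0.0],
                  [s, c, 0.0],
                  [0.0, 0.0, 1.0]])
    out = centers.copy()
    pivot = out[mask].mean(axis=0)
    out[mask] = pivot + (out[mask] - pivot) @ R.T
    return out


def clone(centers, mask, offset):
    """Append a shifted copy of the selected Gaussians."""
    copy = centers[mask] + np.asarray(offset, dtype=centers.dtype)
    return np.vstack([centers, copy])


# Toy scene: two selected Gaussians (the "object") and one background Gaussian.
centers = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [5.0, 5.0, 5.0]])
selected = np.array([True, True, False])

removed = remove(centers, selected)
moved = translate(centers, selected, [0.0, 0.0, 1.0])
scaled = scale(centers, selected, 2.0)
rotated = rotate_z(centers, selected, np.pi / 2)
cloned = clone(centers, selected, [0.0, 2.0, 0.0])
```

Each function returns a new array and leaves the input untouched, which keeps edits composable: the output of one primitive can be fed to the next with an updated mask.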
...and 3 more figures
