
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field

Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He

TL;DR

LoopGaussian addresses the challenge of generating authentic, loopable 3D cinemagraphs from multi-view images of static scenes by combining 3D Gaussian Splatting with an Eulerian motion field. The method projects Gaussians into a learned feature space, clusters them with SuperGaussian to exploit local self-similarity, derives a sparse velocity field that is densified via Kriging and refined with an MLP, and produces loopable 3D motion through bidirectional animation. Key contributions include an eccentricity-based shape regularization that keeps the representation free of artifacts, a two-stage Eulerian motion estimation that does not rely on large-scale pretraining, and rendering from novel viewpoints with realistic deformation of soft objects. The approach outperforms 2D baselines in perceptual quality and on quantitative metrics, enabling high-fidelity, view-consistent cinemagraphs in 3D space.
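The TL;DR credits an eccentricity-based shape regularization with keeping Gaussians artifact-free once they are deformed. The exact form of that term is not given in this summary, so the sketch below is only one plausible reading: it assumes an ellipse-style eccentricity computed from each Gaussian's largest and smallest scale, plus a hinge penalty above a hypothetical `threshold`. The function names are illustrative, and NumPy stands in for the autodiff framework a real training loop would use.

```python
import numpy as np


def eccentricity(scales: np.ndarray) -> np.ndarray:
    """Per-Gaussian eccentricity from principal-axis scales (hypothetical definition).

    scales: (N, 3) positive standard deviations along each Gaussian's axes.
    Uses the ellipse-style formula e = sqrt(1 - (s_min / s_max)^2):
    0 for a sphere, approaching 1 for strongly elongated or flattened Gaussians.
    """
    s_max = scales.max(axis=1)
    s_min = scales.min(axis=1)
    return np.sqrt(1.0 - (s_min / s_max) ** 2)


def eccentricity_loss(scales: np.ndarray, threshold: float = 0.9) -> float:
    """Mean hinge penalty on eccentricity above an assumed threshold."""
    e = eccentricity(scales)
    return float(np.mean(np.maximum(e - threshold, 0.0)))


# Usage: the sphere is not penalized; the flattened Gaussian (e ~ 0.995) is.
scales = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 0.1]])
print(eccentricity(scales))
print(eccentricity_loss(scales))
```

Penalizing only the excess over a threshold leaves moderately anisotropic Gaussians untouched and targets the needle-like ones that tend to show up as burrs when the scene moves.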

Abstract

A cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, most videos generated by recent works lack depth information and are confined to 2D image space. In this paper, inspired by the significant progress in novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraphs from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ 3D-GS to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussians to project them into a feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field that describes velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D cinemagraph that exhibits natural and seamlessly loopable dynamics. Experimental results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. The project is available at https://pokerlishao.github.io/LoopGaussian/.
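The abstract states that the Gaussians move within a static Eulerian motion field and that bidirectional animation makes the result loop. The following is a minimal sketch of one way to realize this, assuming the symmetric forward/backward scheme commonly used for 2D Eulerian-motion animation carried over to Gaussian centers: each Gaussian gets a forward copy advected along the field and a backward copy advected against it, and their opacities are cross-faded over a loop of `num_frames` frames. Explicit Euler integration, the linear cross-fade, and all names here are assumptions, not details confirmed by the paper.

```python
from typing import Callable

import numpy as np

# Maps (N, 3) positions to (N, 3) velocities; an Eulerian field does not change over time.
Velocity = Callable[[np.ndarray], np.ndarray]


def advect(x0: np.ndarray, velocity: Velocity, steps: int, dt: float = 1.0) -> np.ndarray:
    """Move points through a static Eulerian field with explicit Euler steps."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * velocity(x)
    return x


def bidirectional_frame(x0, velocity, t, num_frames, dt=1.0):
    """Positions and opacity weights of the two copies of each Gaussian at frame t.

    Forward copy: t steps along +v from the rest pose x0.
    Backward copy: (num_frames - t) steps along -v from x0.
    At t = 0 only the forward copy (equal to x0) is visible; at t = num_frames only
    the backward copy (also equal to x0) is visible, so the last frame matches the first.
    """
    x_fwd = advect(x0, velocity, t, dt)
    x_bwd = advect(x0, lambda x: -velocity(x), num_frames - t, dt)
    w_bwd = t / num_frames   # scales the backward copy's opacity
    w_fwd = 1.0 - w_bwd      # scales the forward copy's opacity
    return x_fwd, x_bwd, w_fwd, w_bwd


# Usage: a slow rotation about the z-axis as a stand-in velocity field.
def swirl(x: np.ndarray) -> np.ndarray:
    return 0.05 * np.stack([-x[:, 1], x[:, 0], np.zeros(len(x))], axis=1)


centers = np.random.default_rng(0).normal(size=(100, 3))
xf, xb, wf, wb = bidirectional_frame(centers, swirl, t=20, num_frames=60)
# A renderer would splat both copies, with opacities multiplied by wf and wb.
```

Blending two copies instead of integrating a single one hides the drift that plain forward advection accumulates, because the sequence is anchored to the rest pose at both ends of the loop.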

Paper Structure

This paper contains 19 sections, 13 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Overview of our framework. Given multi-view images of a static scene, we first create a 3D Gaussian point cloud using 3D-GS with an eccentricity regularization term. Next, we identify the region of the point cloud that the user wishes to deform using a 2D mask. The 3D Gaussians are then projected into the feature space via an autoencoder and clustered using SuperGaussian. Subsequently, we derive a sparse velocity field based on self-similarity, interpolate it to obtain a dense velocity field, and refine the final Eulerian motion field with an MLP. Finally, we generate a seamlessly loopable video by applying bidirectional animation techniques in 3D space with specified camera parameters.
  • Figure 2: Comparison of visual results. From top to bottom, each column contains multiple key frames extracted from the videos, each screenshot accompanied by zoomed-in details. The bottom row visualizes the average optical flow map of the corresponding video, with different colors denoting different motion directions. (a) is our method, and (b) is the method proposed by li20233d.
  • Figure 3: Comparison with and without eccentricity regularization. The regularization term significantly reduces the occurrence of burrs (spiky artifacts) in the scene.
  • Figure 4: Comparison of different interpolation methods. We compare the dense velocity fields obtained without interpolation (a), with RBF interpolation (b), and with Kriging interpolation (c).
  • Figure 5: Clustering results at various voxel resolutions. Distinct colors indicate different clusters. We aim to ensure that each individual object (e.g., a leaf) is encompassed within a single cluster (middle), rather than having multiple objects grouped into one cluster (left) or a single object fragmented across multiple clusters (right).
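Figure 4 contrasts no interpolation, RBF interpolation, and Kriging for turning the sparse velocity field into a dense one. The summary does not specify the variogram or solver, so the following is a minimal ordinary-Kriging sketch under stated assumptions: an exponential variogram with hypothetical `sill` and `range_` parameters, sparse samples placed at cluster locations, Gaussian centers as query points, and one shared set of weights for all three velocity components. The MLP refinement that follows in the described pipeline is not shown.

```python
import numpy as np


def exponential_variogram(h: np.ndarray, sill: float = 1.0, range_: float = 1.0) -> np.ndarray:
    """gamma(h) = sill * (1 - exp(-h / range_)); gamma(0) = 0, i.e. no nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / range_))


def pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (n, 3) and rows of b (m, 3) -> (n, m)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def ordinary_kriging(sample_xyz, sample_vel, query_xyz, sill=1.0, range_=1.0):
    """Interpolate sparse velocities onto query points with ordinary Kriging.

    sample_xyz: (n, 3) locations with known velocity (e.g. one per cluster)
    sample_vel: (n, 3) sparse velocities
    query_xyz:  (m, 3) locations needing a velocity (e.g. Gaussian centers)

    For every query q, solves  [Gamma 1; 1^T 0] [w; mu] = [gamma(|x_i - q|); 1].
    The last row is the unbiasedness constraint sum(w) = 1.
    """
    n, m = len(sample_xyz), len(query_xyz)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_variogram(pairwise_dist(sample_xyz, sample_xyz), sill, range_)
    A[n, n] = 0.0
    B = np.ones((n + 1, m))
    B[:n] = exponential_variogram(pairwise_dist(sample_xyz, query_xyz), sill, range_)
    W = np.linalg.solve(A, B)      # (n + 1, m): weights plus one Lagrange multiplier per query
    return W[:n].T @ sample_vel    # (m, 3); the same weights serve all three components


# Usage with synthetic data: 20 sparse samples densified onto 1000 query points.
rng = np.random.default_rng(0)
cluster_centers = rng.uniform(-1.0, 1.0, size=(20, 3))
cluster_vel = rng.normal(scale=0.1, size=(20, 3))
gaussian_centers = rng.uniform(-1.0, 1.0, size=(1000, 3))
dense_vel = ordinary_kriging(cluster_centers, cluster_vel, gaussian_centers)
```

Because the weights are constrained to sum to one, a uniform sparse field is reproduced exactly everywhere, and with no nugget each sample velocity is honored at its own location; plain RBF interpolation without a polynomial term does not guarantee the first property.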