TwoSquared: 4D Generation from 2D Image Pairs
Lu Sang, Zehranaz Canfes, Dongliang Cao, Riccardo Marin, Florian Bernard, Daniel Cremers
TL;DR
TwoSquared addresses 4D generation from only two 2D frames by splitting the task into an image-to-3D generation step for the two endpoints and a physically grounded velocity-field deformation that interpolates between them. It combines a flexible 3D generation backbone with a Vertex Registration module, which establishes robust correspondences via a functional map, and a Shape Deformation module, which optimizes a velocity field under physical constraints to produce a continuous 4D sequence with consistent texture and geometry. The method is template-free, works on in-the-wild inputs, supports arbitrary frame rates without retraining, and demonstrates superior geometry and texture quality on 4D-DRESS and web-image scenarios. This work enables practical, controllable 4D generation from minimal input and could augment generative AI pipelines with dynamic, physically plausible content.
Abstract
Despite the astonishing progress in generative AI, 4D dynamic object generation remains an open challenge. Given limited high-quality training data and heavy computing requirements, hallucinating unseen geometry together with unseen movement is very difficult for generative models. In this work, we propose TwoSquared, a method that obtains a physically plausible 4D sequence from only two 2D RGB images corresponding to the beginning and end of an action. Instead of directly solving the 4D generation problem, TwoSquared decomposes it into two steps: 1) an image-to-3D generation module based on an existing generative model trained on high-quality 3D assets, and 2) a physically inspired deformation module that predicts intermediate movements. As a result, our method does not require templates or object-class-specific prior knowledge and can take in-the-wild images as input. In our experiments, we demonstrate that TwoSquared is capable of producing texture-consistent and geometry-consistent 4D sequences given only 2D images.
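The second step can be illustrated with a minimal sketch of how a velocity field yields intermediate frames at any frame rate. This is not the authors' implementation: `toy_velocity` is a hypothetical, hand-written field (a rigid rotation about the z axis) standing in for the optimized, physically constrained field, and `integrate` and `sample_sequence` are illustrative names. The sketch only shows that once a velocity field v(x, t) is available, vertices from the start shape can be advected to any timestamp by numerical integration.

```python
import numpy as np

def toy_velocity(points, t):
    """Hypothetical stand-in for an optimized velocity field v(x, t).

    Rotates points about the z axis at pi/2 radians per unit time.
    The real method would optimize this field; here it is fixed by hand.
    """
    omega = np.pi / 2
    vx = -omega * points[:, 1]
    vy = omega * points[:, 0]
    vz = np.zeros(len(points))
    return np.stack([vx, vy, vz], axis=1)

def integrate(points, velocity, t0, t1, steps=100):
    """Advect points along dx/dt = v(x, t) from t0 to t1 with classic RK4."""
    x = np.asarray(points, dtype=float).copy()
    if t1 == t0:
        return x
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

def sample_sequence(start_vertices, velocity, num_frames):
    """Return vertex positions at num_frames evenly spaced times in [0, 1]."""
    times = np.linspace(0.0, 1.0, num_frames)
    return [integrate(start_vertices, velocity, 0.0, t) for t in times]

# Usage: the same field can be sampled at 5 or 30 frames without any refitting.
start = np.array([[1.0, 0.0, 0.0]])
frames_5 = sample_sequence(start, toy_velocity, 5)
frames_30 = sample_sequence(start, toy_velocity, 30)
```

Because the field is continuous in time, the frame count is just a sampling choice, which is one way to read the claim that arbitrary frame rates are supported without retraining.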
