TextMesh4D: Text-to-4D Mesh Generation via Jacobian Deformation Field

Sisi Dai; Xinxin Su; Ruizhen Hu; Kai Xu

Abstract

Dynamic 3D (4D) content generation, particularly text-to-4D, remains a challenging and under-explored problem because of its spatiotemporal complexity. Existing text-to-4D methods typically avoid direct mesh generation because of the topological constraints of meshes, favoring alternative representations such as NeRFs or 3DGS. However, these non-mesh approaches suffer from insufficient geometric fidelity, temporal artifacts, and limited compatibility with modern computer graphics (CG) pipelines. Directly generating dynamic meshes, in turn, faces two key challenges: i) deformation inflexibility, as traditional vertex-based optimization is constrained by the mesh's explicitly encoded topology, and ii) semantic inconsistency, arising from stochastic noise in distilled priors. In this paper, we introduce TextMesh4D, a pioneering framework for text-to-4D mesh generation that directly addresses these challenges. TextMesh4D features two core innovations: 1) the Jacobian Deformation Field (JDF), which shifts the deformation unit from vertices to faces, using per-face Jacobians to model flexible transformations free from topological constraints; and 2) the Local-Global Semantic Regularizer (LGSR), which leverages the mesh's innate geometric properties to enforce semantic coherence both locally and globally across frames. Extensive experiments demonstrate that TextMesh4D achieves state-of-the-art performance in temporal consistency, structural fidelity, and visual realism, while requiring only a single 24GB GPU. Our work establishes a new benchmark for efficient and high-quality text-to-4D mesh generation. The code will be released to facilitate future research.
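
For readers unfamiliar with per-face Jacobians, the following is a minimal NumPy sketch of how a deformation can be described per triangle rather than per vertex. It is an illustration only and is not the authors' JDF implementation: the function names `face_frames` and `per_face_jacobians` are hypothetical, and completing each triangle's two edge vectors with its unit normal to obtain an invertible 3x3 frame is an assumed convention.

```python
import numpy as np

def face_frames(vertices, faces):
    """Return one 3x3 frame per triangle; columns are edge 1, edge 2, unit normal."""
    v0 = vertices[faces[:, 0]]
    e1 = vertices[faces[:, 1]] - v0
    e2 = vertices[faces[:, 2]] - v0
    n = np.cross(e1, e2)
    # Assumed convention: use the unit normal as the third frame direction.
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return np.stack([e1, e2, n], axis=2)  # shape (F, 3, 3)

def per_face_jacobians(rest_vertices, deformed_vertices, faces):
    """Per-face linear map J with J @ rest_frame == deformed_frame."""
    rest = face_frames(rest_vertices, faces)
    deformed = face_frames(deformed_vertices, faces)
    return deformed @ np.linalg.inv(rest)  # shape (F, 3, 3)

# Toy example: two triangles rigidly rotated by 90 degrees about the z-axis.
V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
F = np.array([[0, 1, 2], [0, 1, 3]])
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
J = per_face_jacobians(V, V @ R.T, F)
```

In this toy case every face's Jacobian equals the rotation `R`, since the whole mesh moves rigidly. A non-rigid deformation would instead give a different matrix for each face, which is the sense in which the face, rather than the vertex, becomes the unit of deformation.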