SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
Weiguang Zhao, Haoran Xu, Xingyu Miao, Qin Zhao, Rui Zhang, Kaizhu Huang, Ning Gao, Peizhou Cao, Mingze Sun, Mulin Yu, Tao Lu, Linning Xu, Junting Dong, Jiangmiao Pang
TL;DR
SynthVerse tackles the data bottleneck in general point tracking by introducing a large-scale synthetic dataset generated via a cross-platform Blender and Isaac Sim pipeline. It offers broad domain coverage, including articulated and deformable objects, humans and animals, and embodied and humanoid interactions, with dense 3D trajectories and visibility annotations, plus a multi-domain benchmark spanning Nav, Film, Embodied, and other domains. Empirical results show that fine-tuning state-of-the-art trackers (e.g., TAPIP3D) on SynthVerse improves both 3D and 2D tracking performance and generalization across synthetic and real-world datasets, while also exposing limitations under domain shifts. The work demonstrates the value of synthetic diversity for robust point tracking and lays the groundwork for broader sim-to-real transfer and future model benchmarking.
Abstract
Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general point tracking remains constrained by limited high-quality data, as existing datasets often provide insufficient diversity and imperfect trajectory annotations. To this end, we introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse includes several new domains and object types missing from existing synthetic datasets, such as animated-film-style content, embodied manipulation, scene navigation, and articulated objects. SynthVerse substantially expands dataset diversity by covering a broader range of object categories and providing high-quality dynamic motions and interactions, enabling more robust training and evaluation for general point tracking. In addition, we establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods under broader domain shifts. Extensive experiments and analyses demonstrate that training with SynthVerse yields consistent improvements in generalization and reveal limitations of existing trackers under diverse settings.
