What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching?
David Yan, Alexander Raistrick, Jia Deng
TL;DR
The paper asks what makes synthetic data effective for zero-shot stereo matching. Using a procedural data generator built on Infinigen, the authors run a parameter study and identify factors that influence zero-shot performance: floating-object density, background realism, material diversity, lighting, and baseline variation. They then construct WMGStereo-150k (163,666 pairs) from the best-performing settings. Training on it alone gives better zero-shot generalization than training on a mixture of widely used datasets, is competitive with training on the FoundationStereo dataset, and shows strong sample efficiency. With the generation code open-sourced and a detailed parameter analysis provided, the work offers a practical framework for designing future synthetic stereo datasets.
Abstract
Synthetic datasets are a crucial ingredient for training stereo matching networks, but the question of what makes a stereo dataset effective remains underexplored. We investigate the design space of synthetic datasets by varying the parameters of a procedural dataset generator, and report the effects on zero-shot stereo matching performance using standard benchmarks. We validate our findings by collecting the best settings and creating a large-scale dataset. Training only on this dataset achieves better performance than training on a mixture of widely used datasets, and is competitive with training on the FoundationStereo dataset, with the additional benefit of open-source generation code and an accompanying parameter analysis. We open-source our system at https://github.com/princeton-vl/InfinigenStereo to enable further research on procedural stereo datasets.
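The parameter study described in the abstract can be pictured as a sweep over generator settings, scoring each setting by the zero-shot error it produces. The sketch below is illustrative only and is not the authors' code: the parameter names, the grid values, the one-parameter-at-a-time strategy, and the `toy_error` stand-in (which replaces the real generate, train, and evaluate loop) are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]

# Hypothetical grid and defaults; the names and values are illustrative,
# not the settings studied in the paper.
GRID: Dict[str, List[float]] = {
    "floating_object_density": [0.0, 0.5, 1.0],
    "baseline_m": [0.05, 0.1, 0.2],
}
DEFAULTS: Config = {"floating_object_density": 0.0, "baseline_m": 0.1}


def one_at_a_time_study(
    grid: Dict[str, List[float]],
    defaults: Config,
    evaluate: Callable[[Config], float],
) -> Config:
    """Vary one parameter at a time around the defaults and keep,
    for each parameter, the value with the lowest error."""
    best = dict(defaults)
    for name, values in grid.items():
        scores = {}
        for value in values:
            config = dict(defaults)  # all other parameters stay at defaults
            config[name] = value
            scores[value] = evaluate(config)
        best[name] = min(scores, key=scores.get)
    return best


def toy_error(config: Config) -> float:
    # Stand-in for "generate data, train a network, measure zero-shot
    # error on a benchmark". The shape of this function is invented.
    return (abs(config["floating_object_density"] - 0.5)
            + abs(config["baseline_m"] - 0.2))


best_settings = one_at_a_time_study(GRID, DEFAULTS, toy_error)
print(best_settings)
```

In a real study, `evaluate` would be far more expensive than the sweep itself, which is one reason to vary parameters individually rather than over the full Cartesian grid. Whether the paper uses this exact sweep strategy is not stated in the abstract.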
