Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision
Aleksandra Franz, Barbara Solenthaler, Nils Thuerey
TL;DR
This work tackles monocular estimation of 3D volumetric fluid motion by learning a global 3D velocity field $\mathbf{u}$ and density $\rho$ without 3D ground-truth supervision. It introduces Neural Global Transport (NGT), which combines a 2D-to-3D density estimator, a multi-scale curl-based velocity generator, differentiable transport, differentiable rendering, and an adversarial prior to resolve depth ambiguity from a single view. The method produces stable long-term predictions and competitive realism on synthetic plumes and real ScalarFlow data, and it offers an end-to-end, single-pass alternative to costly optimization-based reconstructions. The results indicate potential for real-world monocular fluid reconstruction. Noted limitations concern isotropic scattering and transport with obstacles, both of which suggest directions for future extension.
Abstract
We address the challenging problem of jointly inferring the 3D flow and volumetric densities moving in a fluid from a monocular input video with a deep neural network. Despite the complexity of this task, we show that the corresponding networks can be trained without any 3D ground truth. Because no ground truth data is needed, we can train our model with observations from real-world capture setups instead of relying on synthetic reconstructions. We make this unsupervised training approach possible by first generating an initial prototype volume, which is then moved and transported over time without the need for volumetric supervision. Our approach relies purely on image-based losses, an adversarial discriminator network, and regularization. Our method can estimate long-term sequences in a stable manner while closely matching the targets for inputs such as rising smoke plumes.
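To illustrate why a curl-based velocity generator, as mentioned in the TL;DR, is a natural fit for fluid motion, the following minimal sketch computes a velocity field as the curl of a vector potential on a regular grid and checks numerically that the result is divergence-free. It is an assumption-laden illustration, not the NGT architecture: the random potential stands in for whatever a network would output, and the function names `curl` and `divergence`, the grid size, and the use of NumPy finite differences are all hypothetical choices made here.

```python
import numpy as np

def curl(potential, spacing=1.0):
    """Finite-difference curl of a vector potential of shape (3, X, Y, Z)."""
    a_x, a_y, a_z = potential
    # np.gradient returns one derivative array per grid axis (x, y, z).
    dax_dx, dax_dy, dax_dz = np.gradient(a_x, spacing)
    day_dx, day_dy, day_dz = np.gradient(a_y, spacing)
    daz_dx, daz_dy, daz_dz = np.gradient(a_z, spacing)
    u = daz_dy - day_dz
    v = dax_dz - daz_dx
    w = day_dx - dax_dy
    return np.stack([u, v, w])

def divergence(velocity, spacing=1.0):
    """Finite-difference divergence of a velocity field of shape (3, X, Y, Z)."""
    u, v, w = velocity
    return (np.gradient(u, spacing, axis=0)
            + np.gradient(v, spacing, axis=1)
            + np.gradient(w, spacing, axis=2))

# Hypothetical stand-in for a network output: a random vector potential.
rng = np.random.default_rng(0)
potential = rng.standard_normal((3, 16, 16, 16))

velocity = curl(potential)
max_div = np.max(np.abs(divergence(velocity)))
print(f"max |div u| = {max_div:.2e}")  # zero up to floating-point round-off
```

Because the difference operators along distinct grid axes commute, the discrete divergence of the discrete curl cancels term by term, so any potential yields an incompressible velocity field by construction. This is the general appeal of predicting a potential rather than the velocity directly; how the paper's multi-scale generator realizes it is not specified in this section.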
