The Sound of Water: Inferring Physical Properties from Pouring Liquids
Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek, Andrew Zisserman
TL;DR
This work investigates whether static and dynamic physical properties can be inferred from the sound of pouring liquids. It builds a physics-informed, two-stage system: the first stage detects the fundamental frequency (pitch) from audio, and the second recovers properties such as air-column length, container dimensions, flow rate, and time-to-fill. The model is pre-trained on synthetic data and uses visual co-supervision for real data. A new large pouring dataset, The Sound of Water 50, enables controlled study and tests of cross-domain generalization. The results show accurate pitch estimation and property recovery, as well as shape classification and liquid-weight estimation, without direct supervision from real pitch labels. The findings advance multisensory perception in robotics and demonstrate applicability to in-the-wild videos, while highlighting generalization limits and avenues for future physics-based audiovisual learning.
Abstract
We study the connection between audio-visual observations and the underlying physics of a mundane yet intriguing everyday activity: pouring liquids. Given only the sound of liquid pouring into a container, our objective is to automatically infer physical properties such as the liquid level, the shape and size of the container, the pouring rate and the time to fill. To this end, we: (i) show in theory that these properties can be determined from the fundamental frequency (pitch); (ii) train a pitch detection model with supervision from simulated data and visual data with a physics-inspired objective; (iii) introduce a new large dataset of real pouring videos for a systematic study; (iv) show that the trained model can indeed infer these physical properties for real data; and finally, (v) demonstrate strong generalization to various container shapes, other datasets, and in-the-wild YouTube videos. Our work presents a keen understanding of a narrow yet rich problem at the intersection of acoustics, physics, and learning. It opens up applications to enhance multisensory perception in robotic pouring.
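To make claim (i) concrete, the following is a minimal sketch of how pitch could map to physical properties. It is an illustration under stated assumptions, not the authors' formulation: it assumes an idealized cylindrical container whose air column behaves as a quarter-wavelength resonator (closed at the liquid surface, open at the top), a constant pouring rate, a speed of sound of 343 m/s, and an open-end correction of 0.62 times the radius. All function and constant names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 C
END_CORRECTION = 0.62   # assumed open-end correction, in units of the radius


def fundamental_frequency(length_m, radius_m):
    """Pitch (Hz) of an air column of the given length in a cylinder.

    Assumes a quarter-wavelength resonator: wavelength = 4 * (l + beta * R).
    """
    return SPEED_OF_SOUND / (4.0 * (length_m + END_CORRECTION * radius_m))


def air_column_length(f0_hz, radius_m):
    """Invert the relation above: air-column length (m) from pitch (Hz)."""
    wavelength = SPEED_OF_SOUND / f0_hz
    return wavelength / 4.0 - END_CORRECTION * radius_m


def flow_rate(f0_start_hz, f0_end_hz, dt_s, radius_m):
    """Average volumetric flow rate (m^3/s) between two pitch readings.

    The poured volume equals the cross-section area times the amount
    by which the air column has shortened.
    """
    shortened_by = (air_column_length(f0_start_hz, radius_m)
                    - air_column_length(f0_end_hz, radius_m))
    return math.pi * radius_m ** 2 * shortened_by / dt_s


def time_to_fill(f0_now_hz, radius_m, rate_m3_per_s):
    """Seconds until the container is full, assuming a constant flow rate."""
    remaining_volume = math.pi * radius_m ** 2 * air_column_length(f0_now_hz, radius_m)
    return remaining_volume / rate_m3_per_s


if __name__ == "__main__":
    radius = 0.03  # 3 cm radius, hypothetical container
    f_start = fundamental_frequency(0.10, radius)  # 10 cm of air left
    f_end = fundamental_frequency(0.05, radius)    # 5 cm of air left, 5 s later
    rate = flow_rate(f_start, f_end, 5.0, radius)
    print(f"pitch rises from {f_start:.0f} Hz to {f_end:.0f} Hz")
    print(f"flow rate: {rate * 1e6:.1f} ml/s")
    print(f"time to fill: {time_to_fill(f_end, radius, rate):.1f} s")
```

The sketch shows why pitch alone can be informative: as the liquid rises the air column shortens, the pitch increases, and the rate of that change encodes the flow rate. Non-cylindrical containers would need a different geometric model.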
