Understanding Particles From Video: Property Estimation of Granular Materials via Visuo-Haptic Learning
Zeqing Zhang, Guangze Zheng, Xuebo Ji, Guanqi Chen, Ruixing Jia, Wentao Chen, Guanhua Chen, Liangjun Zhang, Jia Pan
TL;DR
This work estimates relative granular material (GM) properties, specifically particle size and density, from video alone, using a visuo-haptic learning framework inspired by a probe-dragging contact model. An encoder-decoder network is trained with force signals as supervision and aided by a particle-tracking preprocessing step. It maps granule motion in videos to latent embeddings that implicitly encode size and density distributions, so only the visual modality is needed at inference time. The approach is validated on the GM15-VF dataset and extended to real-world beach sands and handheld video collection, with ablation studies demonstrating the importance of tracking and the interpretability of the latent space. Key findings include robust force inference on unseen GMs, a 4D latent space that captures relative GM properties, and practical generalization to non-lab data, though moisture effects can limit applicability.
Abstract
Granular materials (GMs) are ubiquitous in daily life, and understanding their properties is important, especially in agriculture and industry. However, existing methods require dedicated measurement equipment and considerable human effort to handle large numbers of particles. In this paper, we introduce a method for estimating the relative values of particle size and density from video of an interaction with GMs. The method is trained within a visuo-haptic learning framework inspired by a contact model, which reveals a strong correlation between GM properties and the visual-haptic data recorded while a probe is dragged through the GMs. After training, the network maps the visual modality to the haptic signal and implicitly characterizes the relative distribution of particle properties in its latent embeddings, as interpreted through the contact model. We can therefore analyze GM properties using the trained encoder alone: only visual information is needed, with no extra sensory modalities and no human labeling effort. The presented GM property estimator has been extensively validated through comparison and ablation experiments. Its generalization capability has also been evaluated, and a real-world application on a beach is demonstrated. Experiment videos are available at https://sites.google.com/view/gmwork/vhlearning.
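To make the training setup concrete, the following is a minimal illustrative sketch of force-supervised encoder-decoder learning, not the network used in the paper. It assumes linear layers, synthetic data, and hypothetical sizes (64 motion features per clip, a 3-axis force target); only the 4-dimensional latent width is taken from the summary above. All variable names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 tracked-motion features per clip, a 4-D latent,
# and a 3-axis force target. None of these come from the paper's code.
N_CLIPS, N_FEAT, N_LATENT, N_FORCE = 256, 64, 4, 3

# Synthetic stand-ins for particle-tracking features and probe forces.
X = rng.normal(size=(N_CLIPS, N_FEAT))
true_map = rng.normal(size=(N_FEAT, N_FORCE)) / np.sqrt(N_FEAT)
F = X @ true_map  # "measured" force, used only as the training target

# Linear encoder (visual features -> latent) and decoder (latent -> force).
W_enc = rng.normal(size=(N_FEAT, N_LATENT)) * 0.1
W_dec = rng.normal(size=(N_LATENT, N_FORCE)) * 0.1


def encode(x):
    """Map visual features to the latent embedding."""
    return x @ W_enc


def mse(pred, target):
    """Mean squared error between predicted and target forces."""
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(encode(X) @ W_dec, F)

lr = 0.05
for _ in range(500):
    Z = encode(X)
    err = Z @ W_dec - F  # force prediction error
    # Gradients of the mean squared error with respect to each weight matrix.
    grad_dec = Z.T @ err * (2 / err.size)
    grad_enc = X.T @ (err @ W_dec.T) * (2 / err.size)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = mse(encode(X) @ W_dec, F)

# At inference time only the encoder and the visual features are used;
# no force measurement is required.
latent = encode(X[:5])
print(initial_loss, final_loss, latent.shape)
```

The point of the sketch is the data flow: force appears only in the training loss, while the embedding used for property analysis is computed from visual features alone.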
