Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
Yongwei Nie, Changzhen Liu, Chengjiang Long, Qing Zhang, Guiqing Li, Hongmin Cai
TL;DR
The paper addresses camera–mesh entanglement in single-image Human Mesh Recovery (HMR): an incorrect mesh and an incorrect camera can together still produce a low 2D reprojection loss. It takes multiple RoIs of the same person as input, estimates one local camera per RoI, and requires all local cameras to agree on a single full-image camera through a camera consistency loss $L_{cam}$. A RoI-aware feature fusion network outputs one mesh shared by all RoIs together with the per-RoI local cameras. Because each local camera can be converted to the full-image camera, constraints can be applied across RoIs, and the multiple RoIs also allow a contrastive loss $L_{cont}$ to be applied in a projected latent space. The method reports state-of-the-art performance on benchmarks such as 3DPW and Human3.6M, and ablations show that the RoI fusion, camera consistency, and contrastive components each contribute. The approach improves both mesh accuracy and camera estimation in HMR and points to potential extensions to multi-view or video-based settings.
Abstract
Besides a 3D mesh, Human Mesh Recovery (HMR) methods usually need to estimate a camera for computing the 2D reprojection loss. Previous approaches may encounter the following problem: neither the mesh nor the camera is correct, yet their combination can still yield a low reprojection loss. To alleviate this problem, we define multiple regions of interest (RoIs) containing the same human and propose a multiple-RoI-based HMR method. Our key idea is that, with multiple RoIs as input, we can estimate multiple local cameras and thus have the opportunity to design and apply additional constraints between the cameras, improving the accuracy of the cameras and, in turn, of the corresponding 3D mesh. To implement this idea, we propose a RoI-aware feature fusion network with which we estimate a 3D mesh shared by all RoIs as well as the local cameras corresponding to the RoIs. We observe that local cameras can be converted to the camera of the full image, and based on this we construct a local camera consistency loss as the additional constraint imposed on the local cameras. Another benefit of introducing multiple RoIs is that we can encapsulate our network in a contrastive learning framework and apply a contrastive loss to regularize its training. Experiments demonstrate the effectiveness of our multi-RoI HMR method and its superiority over recent prior art. Our code is available at https://github.com/CptDiaos/Multi-RoI.
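
The idea of converting local cameras to a full-image camera and penalizing their disagreement can be illustrated with a minimal sketch. The abstract does not give the conversion formula or the exact form of the loss, so everything below is an assumption: the sketch uses a commonly used weak-perspective crop-to-full conversion with a known focal length and a square RoI, and a mean pairwise squared difference as the consistency term. The function names `crop_to_full_camera` and `camera_consistency_loss` and all parameters are hypothetical and are not taken from the released code.

```python
from itertools import combinations


def crop_to_full_camera(s, tx, ty, cx, cy, b, img_w, img_h, focal):
    """Convert a local weak-perspective camera (s, tx, ty) of a square RoI
    into a full-image camera translation (X, Y, Z).

    Assumed convention (not confirmed by the abstract):
      (cx, cy) : RoI center in full-image pixels
      b        : RoI side length in pixels
      focal    : focal length of the full image in pixels
    """
    tz = 2.0 * focal / (s * b)
    full_tx = tx + 2.0 * (cx - img_w / 2.0) / (s * b)
    full_ty = ty + 2.0 * (cy - img_h / 2.0) / (s * b)
    return (full_tx, full_ty, tz)


def camera_consistency_loss(full_cams):
    """Mean squared difference over all pairs of converted full-image cameras.

    This pairwise form is an illustrative stand-in for L_cam; it is zero
    exactly when every RoI yields the same full-image camera.
    """
    pairs = list(combinations(full_cams, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for cam_a, cam_b in pairs:
        total += sum((a - b) ** 2 for a, b in zip(cam_a, cam_b))
    return total / len(pairs)


# Two RoIs around the same person in a 640x480 image (illustrative numbers).
cam_1 = crop_to_full_camera(2.0, 0.1, -0.2, 320, 240, 200, 640, 480, 1000.0)
cam_2 = crop_to_full_camera(1.6, 0.0, -0.25, 340, 250, 250, 640, 480, 1000.0)
loss = camera_consistency_loss([cam_1, cam_2])
```

In this example the two local cameras were chosen so that both convert to approximately the same full-image camera, so the loss is close to zero. Perturbing either local camera makes the converted cameras disagree and the loss grow, which is the kind of signal a consistency constraint can provide in addition to the reprojection loss.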
