SonicMesh: Enhancing 3D Human Mesh Reconstruction in Vision-Impaired Environments With Acoustic Signals

Xiaoxuan Liang, Wuyang Zhang, Hong Zhou, Zhaolong Wei, Sicheng Zhu, Yansong Li, Rui Yin, Jiantao Yuan, Jeremy Gummeson

TL;DR

SonicMesh tackles the robustness gap in 3D human mesh reconstruction (HMR) under challenging visual conditions by fusing acoustic signals with RGB images. It introduces a Registration Module that aligns 2D feature embeddings with 3D SMPL-X representations and modifies the HRNet backbone to better extract features from low-resolution acoustic images, supplemented by global-local transformer fusion with modality masking. The main contributions are the first end-to-end acoustic-RGB HMR system that does not rely on predefined features, and the combination of a 2D-3D alignment strategy with an enhanced acoustic feature extractor and dynamic multi-modal fusion. Results show that SonicMesh maintains accuracy under occlusion, in non-line-of-sight scenes, and in poor lighting while using commodity devices, which supports privacy-preserving, widespread deployment.

Abstract

3D Human Mesh Reconstruction (HMR) from 2D RGB images faces challenges in environments with poor lighting, privacy concerns, or occlusions. These weaknesses of RGB imaging can be complemented by acoustic signals, which are widely available, easy to deploy, and capable of penetrating obstacles. However, no existing methods effectively combine acoustic signals with RGB data for robust 3D HMR. The primary challenges include the low-resolution images generated by acoustic signals and the lack of dedicated processing backbones. We introduce SonicMesh, a novel approach combining acoustic signals with RGB images to reconstruct 3D human mesh. To address the challenges of low resolution and the absence of dedicated processing backbones in images generated by acoustic signals, we modify an existing method, HRNet, for effective feature extraction. We also integrate a universal feature embedding technique to enhance the precision of cross-dimensional feature alignment, enabling SonicMesh to achieve high accuracy. Experimental results demonstrate that SonicMesh accurately reconstructs 3D human mesh in challenging environments such as occlusions, non-line-of-sight scenarios, and poor lighting.

Paper Structure

This paper contains 39 sections, 13 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: The architecture of SonicMesh.
  • Figure 2: Illustration of the Registration Module.
  • Figure 3: Qualitative results (compared with Deformer, ImmFusion, and WiMesh in common, NLOS, low visibility, and smoke scenes).
  • Figure 4: Influence of various factors.
  • Figure 5: The target's actual motion can be decomposed into three steps: (a) translation motion, (b) circular motion, and (c) rotation motion.
  • ...and 3 more figures