HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features

Arnab Dey, Cheng-You Lu, Andrew I. Comport, Srinath Sridhar, Chin-Teng Lin, Jean Martinet

TL;DR

A novel approach called HFGaussian is presented that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS, enabling efficient and generalizable reconstruction.

Abstract

Recent advancements in radiance field rendering show promising results in 3D scene representation, where Gaussian splatting-based techniques have emerged as state-of-the-art due to their quality and efficiency. Gaussian splatting is widely used for various applications, including 3D human representation. However, previous 3D Gaussian splatting methods either use parametric body models as additional information or fail to provide any underlying structure, such as human biomechanical features, which are essential for different applications. In this paper, we present a novel approach called HFGaussian that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS. The proposed method leverages a generalizable Gaussian splatting technique to represent the human subject and its associated features, enabling efficient and generalizable reconstruction. By combining a pose regression network and the feature splatting technique with Gaussian splatting, HFGaussian demonstrates improved capabilities over existing 3D human methods, showcasing the potential of 3D human representations with integrated biomechanics. We thoroughly evaluate our HFGaussian method against the latest state-of-the-art techniques in human Gaussian splatting and pose estimation, demonstrating its real-time, state-of-the-art performance.

Paper Structure

This paper contains 26 sections, 10 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The HFGaussian pipeline: Given a target view, the nearest source views $I_l$ and $I_r$ are selected and passed through an image encoder $\epsilon_{img}$ to generate feature maps $f_l^s$ and $f_r^s$, from which the depth maps $D_l$ and $D_r$ are estimated. The depth maps are then encoded by a depth encoder $\epsilon_{depth}$ and combined with the image features before passing through a U-Net-based decoder $\epsilon_{params}$ to predict the Gaussian feature maps $\mathcal{M}_r$, $\mathcal{M}_s$, $\mathcal{M}_\alpha$, and $\mathcal{M}_f$. Finally, the predicted Gaussians are splatted and rasterized to generate the novel view and human features, which are further processed by a smaller MLP $\epsilon_{feature}$ to obtain the final human features.
  • Figure 2: Pose Regression Network Overview: The network takes point clouds generated from depth maps as input and outputs 3D poses. We compare three point cloud classification backbones and propose a novel architecture that combines the PointNet and DGCNN architectures for robust feature extraction.
  • Figure 3: Qualitative comparison of novel view synthesis results on THuman2.0 test set.
  • Figure 4: Qualitative results of the proposed method on a real human dataset. The figure shows random frames from real human data and the corresponding novel view, dense pose, and 3D pose predicted by HFGaussian. The 3D pose images are generated by projecting the 3D pose onto a 2D plane from a fixed viewing angle.
  • Figure 5: The figure illustrates the performance and average inference time of different 3D pose estimation backbone architectures. The 3D pose estimation performance is measured in terms of MPJPE, which is plotted on the y-axis. The x-axis represents the average inference time of the respective backbone models.
  • ...and 6 more figures
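
The caption of Figure 5 reports 3D pose accuracy as MPJPE (mean per-joint position error), which is commonly defined as the average Euclidean distance between predicted and ground-truth joint positions. The following is a minimal sketch of that standard definition, not the authors' evaluation code: the function name `mpjpe`, the three-joint toy poses, and the absence of any root alignment or unit conversion are assumptions made for illustration only.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error between two 3D poses.

    Both poses are sequences of (x, y, z) joint positions in the same
    order and the same units. No root alignment or scaling is applied
    here; whether the paper uses either is not stated in this section.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Sum the Euclidean distance of every predicted joint to its ground-truth joint.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Hypothetical toy poses with three joints (illustrative values only).
pred_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0), (0.0, 2.0, 4.0)]

# Per-joint errors are 0, 3 and 4, so the mean is (0 + 3 + 4) / 3.
print(mpjpe(pred_pose, gt_pose))
```

A lower value means the predicted skeleton lies closer to the ground truth, which is why Figure 5 can plot MPJPE against inference time to show the accuracy/speed trade-off between backbones.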