Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs

Youyi Zhan, Tianjia Shao, Yin Yang, Kun Zhou

TL;DR

This work tackles real-time, high-fidelity modeling of pose-dependent human appearance. It introduces Gaussian Avatars with spatially distributed MLPs: anchor MLPs output coefficients that are interpolated to per-Gaussian properties via a learned offset basis, enabling high-frequency detail without the bottleneck of large CNNs. A surface-constrained, control-point scheme ensures Gaussians stay on a moving surface rather than drifting inside the body, improving generalization under novel poses. Evaluated on multi-view datasets, the method achieves state-of-the-art quality with significantly faster rendering (≈166 fps) than prior approaches, while maintaining robustness across views and poses. Limitations include clothing dynamics not being simulated and reliance on a multi-view pipeline for reconstruction, suggesting directions for future work in cloth modeling and monocular setups.

Abstract

Many works have succeeded in reconstructing Gaussian human avatars from multi-view videos. However, they either struggle to capture pose-dependent appearance details with a single MLP, or rely on a computationally intensive neural network that reconstructs high-fidelity appearance at the cost of non-real-time rendering. We propose a novel Gaussian human avatar representation that reconstructs detailed, high-fidelity pose-dependent appearance and can still be rendered in real time. Our Gaussian avatar is powered by spatially distributed MLPs that are explicitly placed at different positions on the human body. The parameters stored in each Gaussian are obtained by interpolating the outputs of its nearby MLPs based on their distances. Interpolation alone would force the Gaussian properties to vary smoothly across the body, so for each Gaussian we define a Gaussian offset basis, and a linear combination of the basis elements represents the Gaussian property offsets relative to the neutral properties. The MLPs then output a set of coefficients corresponding to the basis. In this way, although the Gaussian coefficients are derived from interpolation and change smoothly, the Gaussian offset basis is learned freely without constraints. The smoothly varying coefficients combined with the freely learned basis can still produce distinctly different Gaussian property offsets, which allows the model to learn high-frequency spatial signals. We further use control points to constrain the Gaussians to lie on a surface layer rather than letting them spread irregularly inside the body, which helps the avatar generalize better when animated under novel poses. Compared to the state-of-the-art method, our method achieves better appearance quality with finer details while rendering significantly faster under novel views and novel poses.
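The interpolation-plus-basis idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the inverse-distance weighting, the default of three nearest anchors (taken from the pipeline figure caption), the array shapes, and the function names `interpolation_weights` and `gaussian_offsets` are all hypothetical.

```python
import numpy as np

def interpolation_weights(gaussian_pos, anchor_pos, k=3, eps=1e-8):
    """Return (indices, weights) of the k nearest anchors for each Gaussian.

    Inverse-distance weighting is an assumption; the source only states
    that the interpolation is based on distance.
    """
    # Pairwise Euclidean distances between Gaussians and anchors, shape (G, A).
    d = np.linalg.norm(gaussian_pos[:, None, :] - anchor_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]          # (G, k) nearest anchor indices
    d_k = np.take_along_axis(d, idx, axis=1)    # (G, k) their distances
    w = 1.0 / (d_k + eps)
    w /= w.sum(axis=1, keepdims=True)           # each row sums to 1
    return idx, w

def gaussian_offsets(anchor_coeffs, idx, w, basis):
    """Blend anchor coefficients, then combine each Gaussian's own basis.

    anchor_coeffs: (A, K) coefficients output by the anchor MLPs for one pose
    idx, w:        (G, k) nearest anchors and their interpolation weights
    basis:         (G, K, D) per-Gaussian offset basis, learned freely
    returns:       (G, D) property offsets relative to the neutral properties
    """
    # Gaussian coefficients: smooth in space because they are interpolated.
    w_g = np.einsum("gn,gnc->gc", w, anchor_coeffs[idx])   # (G, K)
    # Offsets: can differ sharply between neighbours because the basis is per Gaussian.
    return np.einsum("gc,gcd->gd", w_g, basis)              # (G, D)
```

Because `basis` is indexed per Gaussian, two neighbouring Gaussians whose blended coefficients are almost identical can still receive very different offsets, which is the mechanism the abstract credits for recovering high-frequency detail.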

Paper Structure

This paper contains 15 sections, 8 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Qualitative comparison of different numbers of Gaussians.
  • Figure 2: Pipeline overview. (a) We define the spatially distributed MLPs on anchor points, which are uniformly sampled on the template mesh. Each MLP takes the pose $\bm{\theta}$ as input and outputs the anchor coefficients $\mathbf{w}_a$. (b) The Gaussian coefficients $\mathbf{w}_g$ are interpolated from the coefficients of the three nearest anchor points. (c) The Gaussian property offsets are obtained by linearly combining the Gaussian offset basis using the Gaussian coefficients. The Gaussian property offsets are then added to the neutral Gaussian properties to model the human appearance under pose $\bm{\theta}$. Finally, the Gaussians are transformed to the pose $\bm{\theta}$ and rasterized to produce high-fidelity images. Note that the Gaussian position offset $\delta \mathbf{x}$ is obtained through control point interpolation, which is illustrated in \ref{sec:anchorpoints}.
  • Figure 2: Ablation study on PCA components.
  • Figure 3: Illustration of the control point. The Gaussian position offset $\delta \mathbf{x}$ is interpolated from the position offsets of nearby control points $\delta \mathbf{x}_c$.
  • Figure 4: Qualitative comparison with the state-of-the-art methods on training pose reconstruction (top two subjects) and novel pose synthesis (bottom subject).
  • ...and 5 more figures
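
Step (a) of the pipeline caption, where every anchor point owns a small MLP that maps the pose $\bm{\theta}$ to anchor coefficients $\mathbf{w}_a$, might be sketched as follows. This is a hedged illustration only: the two-layer architecture, the ReLU activation, the random initialisation, the layer sizes, and the names `init_anchor_mlps` and `anchor_coefficients` are assumptions and are not taken from the paper.

```python
import numpy as np

def init_anchor_mlps(num_anchors, pose_dim, hidden_dim, num_coeffs, seed=0):
    """Create one independent two-layer MLP per anchor (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(num_anchors, pose_dim, hidden_dim)),
        "b1": np.zeros((num_anchors, hidden_dim)),
        "W2": rng.normal(scale=0.1, size=(num_anchors, hidden_dim, num_coeffs)),
        "b2": np.zeros((num_anchors, num_coeffs)),
    }

def anchor_coefficients(params, pose):
    """Evaluate every anchor's MLP on the same pose vector.

    pose:    (P,) pose vector shared by all anchors
    returns: (A, K) anchor coefficients, one row per anchor
    """
    # First layer for all anchors at once: (P,) x (A, P, H) -> (A, H).
    h = np.einsum("p,aph->ah", pose, params["W1"]) + params["b1"]
    h = np.maximum(h, 0.0)  # ReLU (assumed activation)
    # Second layer for all anchors at once: (A, H) x (A, H, K) -> (A, K).
    return np.einsum("ah,ahk->ak", h, params["W2"]) + params["b2"]
```

Stacking the per-anchor weights along a leading axis lets all the small MLPs run as two batched contractions instead of a Python loop over anchors, which is one plausible way to keep many tiny networks cheap to evaluate.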