Incorporating Test-Time Optimization into Training with Dual Networks for Human Mesh Recovery

Yongwei Nie; Mingxian Fan; Chengjiang Long; Qing Zhang; Jian Zhu; Xuemiao Xu

TL;DR

This work incorporates test-time optimization into training. For each sample in a training batch, it first performs one step of test-time optimization and only then carries out the training optimization over all samples in the batch. The result is a meta-model whose meta-parameters provide a good starting point for test-time optimization.

Abstract

Human Mesh Recovery (HMR) is the task of estimating a parameterized 3D human mesh from an image. One class of methods first trains a regression model for this problem and then further optimizes the pretrained model for each specific sample individually at test time. However, the pretrained model may not provide an ideal starting point for this test-time optimization. Inspired by meta-learning, we incorporate the test-time optimization into training: for each sample in a training batch, we first perform one step of test-time optimization, and only then carry out the training optimization over all samples in the batch. In this way, we obtain a meta-model whose meta-parameters are well suited to test-time optimization. At test time, running several test-time optimization steps from the meta-parameters yields much higher HMR accuracy than running the same optimization from the simply pretrained regression model. Furthermore, we find that the test-time HMR objectives differ from the training-time objectives, which reduces the effectiveness of meta-model learning. To solve this problem, we propose a dual-network architecture that unifies the training-time and test-time objectives. Extensive experiments show that our method, armed with meta-learning and the dual networks, outperforms state-of-the-art regression-based and optimization-based HMR approaches. The code is available at https://github.com/fmx789/Meta-HMR.

Paper Structure (24 sections, 9 equations, 41 figures, 11 tables, 1 algorithm)


Introduction
Related Work
Our Method
Test-time Optimization
Incorporating Test-time Optimization into Training
Unifying Training and Test-time Optimization Objectives with Dual Networks
Inference with Dual Networks
Implementation Details
Experiments
Datasets
Training, Testing and Metrics
Comparison with Previous Approaches
Ablation study
Conclusion
Appendix
...and 9 more sections

Figures (41)

Figure 1: Overview of the dual-network meta-learning HMR method, composed of a main HMR regression network $f_{\mathbf{w}}$ and an auxiliary network $f_{\mathbf{u}}$. Both networks share the same architecture but have different parameters. Given the $i$-th batch of images, test-time optimization is first executed individually for each training image $\mathbf{I}_{i,j}$ in the batch, updating $f_{\mathbf{w}}$ to $f_{\mathbf{w}'_{i,j}}$ with one gradient descent step on the test-time loss $\mathcal{L}_{test-u}$. Then, based on $\{f_{\mathbf{w}'_{i,j}}|j\in[1,M]\}$ ($M$ is the batch size), the training optimization updates the parameters of both the main and auxiliary networks using $\mathcal{L}_{train}$, each with its own arguments. $\mathbf{w}_{meta}$ and $\mathbf{u}_{meta}$ are the resulting meta-parameters. $f_{\mathbf{u}}$ generates "Pseudo SMPLs" that are used in the test-time loss to supervise the learning of the "Estimated SMPL Inner". GT SMPLs are used in the training loss to supervise the learning of the "Estimated SMPL Outer" and of the Pseudo SMPLs.
Figure 3: Influence of the number of optimization steps during inference. Our method outperforms EFT when both use the same regression model. As optimization proceeds, our results improve steadily, whereas those of EFT first improve and then deteriorate (see (a) and (b)). (c) shows that our method converges faster than EFT.
Figure 6: Per-joint error comparison between Ours$_{\rm CLIFF}$ and EFT$_{\rm CLIFF}$ on the 3DPW test set von2018recovering.
Figure 8: More qualitative comparisons with SOTA methods. We show results produced by HybrIK li2021hybrik, NIKI li2023niki, ProPose fang2023learning, ReFit wang2023refit, CLIFF li2022cliff, EFT$_{\rm CLIFF}$, and our method ($\dagger$: OpenPose, $\ast$: RSN).
Figure 9: More qualitative comparisons with SOTA methods. We show results produced by HybrIK li2021hybrik, NIKI li2023niki, ProPose fang2023learning, ReFit wang2023refit, CLIFF li2022cliff, EFT$_{\rm CLIFF}$, and our method ($\dagger$: OpenPose, $\ast$: RSN).
...and 36 more figures
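The dual-network training flow in the Figure 1 caption can be sketched with a hypothetical linear toy model. This is an illustration under stated assumptions, not the authors' implementation. `W` stands for the main network $f_{\mathbf{w}}$ and `U` for the auxiliary network $f_{\mathbf{u}}$, and both are linear maps; `dual_meta_step` and `pseudo` are illustrative names. Following the caption, the inner step uses only the pseudo target `U @ x`, while the outer update supervises both the adapted main model and `U` with the ground truth `y`. Two further assumptions are ours alone. First, the test-time loss contains only the pseudo-SMPL term. Second, `U` is updated only by its own supervised loss, with no gradient routed back through the inner step.

```python
import numpy as np


def dual_meta_step(W, U, batch, alpha, beta):
    """One hypothetical dual-network meta-training step on a linear toy model.

    W: main-model parameters (role of f_w); U: auxiliary parameters (role of f_u).
    batch: list of (x, y) pairs, where y plays the role of the GT SMPL.
    """
    grads_W, grads_U = [], []
    for x, y in batch:
        S = np.outer(x, x)
        pseudo = U @ x  # "Pseudo SMPL" predicted by the auxiliary model
        # Inner (test-time) step on 0.5 * ||W x - pseudo||^2; no GT is used here.
        W_inner = W - alpha * np.outer(W @ x - pseudo, x)
        # Outer training loss 0.5 * ||W_inner x - y||^2 on the adapted main model.
        G_inner = np.outer(W_inner @ x - y, x)
        # W_inner = W (I - alpha S) + alpha U S, so the chain rule w.r.t. W gives:
        grads_W.append(G_inner @ (np.eye(x.size) - alpha * S))
        # Simplifying assumption: U is trained only by its own GT-supervised loss.
        grads_U.append(np.outer(U @ x - y, x))
    W_new = W - beta * np.mean(grads_W, axis=0)
    U_new = U - beta * np.mean(grads_U, axis=0)
    return W_new, U_new
```

Calling `dual_meta_step` repeatedly over training batches loosely mirrors how the caption's $\mathbf{w}_{meta}$ and $\mathbf{u}_{meta}$ are produced. The key design point the sketch preserves is that the inner step never sees ground truth, so it matches what is available at test time, while the outer step does.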
