
ETCH-X: Robustify Expressive Body Fitting to Clothed Humans with Composable Datasets

Xiaoben Li, Jingyi Wu, Zeyu Cai, Yu Siyuan, Boqian Li, Yuliang Xiu

Abstract

Human body fitting, which aligns parametric body models such as SMPL to raw 3D point clouds of clothed humans, serves as a crucial first step for downstream tasks like animation and texturing. An effective fitting method should be both locally expressive, capturing fine details such as hands and facial features, and globally robust to real-world challenges, including clothing dynamics, pose variations, and noisy or partial inputs. Existing approaches typically excel in only one aspect, lacking an all-in-one solution. We upgrade ETCH to ETCH-X, which leverages a tightness-aware fitting paradigm to filter out clothing dynamics ("undress"), extends expressiveness with SMPL-X, and replaces explicit sparse markers (which are highly sensitive to partial data) with implicit dense correspondences ("dense fit") for more robust and fine-grained body fitting. Our disentangled "undress" and "dense fit" modular stages enable separate and scalable training on composable data sources, including diverse simulated garments (CLOTH3D), large-scale full-body motions (AMASS), and fine-grained hand gestures (InterHand2.6M), improving outfit generalization and pose robustness for both bodies and hands. Our approach achieves robust and expressive fitting across diverse clothing, poses, and levels of input completeness, delivering a substantial performance improvement over ETCH on both 1) seen data, such as 4D-Dress (MPJPE-All, 33.0%) and CAPE (V2V-Hands, 35.8%), and 2) unseen data, such as BEDLAM2.0 (MPJPE-All, 80.8%; V2V-All, 80.5%). Code and models will be released at https://xiaobenli00.github.io/ETCH-X/.


Paper Structure

This paper contains 10 sections, 9 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Strengths of ETCH-X. While NICP [marin24nicp], which uses implicit dense correspondence but lacks tightness-aware undressing, consistently produces overweight bodies from clothed scans (A), ETCH [li2025etch], with tightness-aware undressing but sparse markers, fails to capture detailed body parts such as hands and face (B), and struggles with partial inputs due to missing markers (C). In contrast, our ETCH-X combines the strengths of both approaches, achieving robust and expressive fitting across diverse clothing, poses, and levels of input completeness (D).
  • Figure 2: Two stages of ETCH-X: (A) Masked Undress, (B) Dense Fit. In the Masked Undress stage, we take a clothed scan as input and compute the undressed body ($\hat{\mathbf{y}}_{i} = \mathbf{x}_{i} + \hat{l}_{i}\hat{\mathbf{v}}_{i}$); a minimal sketch of this per-point step follows the figure list. In the Dense Fit stage, we implicitly learn the deformation field, which deforms the canonical SMPL-X into the posed one. Thanks to the decoupled design, robustness to dynamic clothing and pose variations can be improved with simulated garments, e.g., CLOTH3D [bertiche2020cloth3d], and pose libraries, e.g., AMASS [mahmood2019amass] for body poses and InterHand2.6M [Moon_2020_ECCV_InterHand2.6M] for hand poses, respectively.
  • Figure 3: Hand Refinement by Re-sampling. After obtaining the initial body fitting, we re-sample points around the hand and fit the hand model separately (see the re-sampling sketch after this list).
  • Figure 4: Failure Case of ETCH [li2025etch] on BEDLAM2.0. Two representative reasons for ETCH failures are incorrect part labeling (above) and inaccurate inner points (both). The failure is reflected in the large V2V (12.209 cm) and MPJPE (15.031 cm) errors of ETCH reported in Tab. [tab:comparison_bedlam].
  • Figure 5: Partial Augmentation. ETCH-X predicts better body poses with partial augmentation (a possible realization is sketched after this list).
  • ...and 3 more figures
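
The undressing step in the Figure 2 caption reduces to a per-point displacement along a predicted tightness vector. Below is a minimal NumPy sketch of that computation; the function and argument names are illustrative, not the authors' implementation, and the defensive re-normalization is an assumption (the paper's notation treats $\hat{\mathbf{v}}_{i}$ as a unit vector).

```python
import numpy as np

def undress_points(x, tightness, directions):
    """Per-point undressing: y_i = x_i + l_i * v_i (Fig. 2, Masked Undress).

    x          : (N, 3) clothed-scan points x_i
    tightness  : (N,)   predicted tightness magnitudes l_i
    directions : (N, 3) predicted inward directions v_i
    """
    # Re-normalize defensively in case the network output drifts from unit length.
    v = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    return x + tightness[:, None] * v
```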
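The hand refinement of Figure 3 crops and re-samples scan points around the hand before a separate hand fit. The following sketch shows one way to realize the re-sampling; the radius, sample count, and the hypothetical `resample_hand_region` helper are assumptions, not values from the paper.

```python
import numpy as np

def resample_hand_region(scan_points, hand_center, radius=0.12,
                         n_samples=2048, seed=None):
    """Crop scan points within `radius` (meters) of an estimated hand
    center, then draw a fixed-size point set for a separate hand fit."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(scan_points - hand_center, axis=-1)
    region = scan_points[dist < radius]
    if len(region) == 0:
        return region  # no points near the hand, e.g., a partial scan
    # Sample with replacement if the crop has fewer points than requested.
    idx = rng.choice(len(region), size=n_samples,
                     replace=len(region) < n_samples)
    return region[idx]
```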
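Figure 5's partial augmentation trains the model on simulated incomplete inputs. One plausible scheme, sketched below, removes a contiguous chunk of points around a random anchor to mimic occlusion or a partial scan; the exact augmentation used in the paper may differ, and `drop_ratio` is an illustrative parameter.

```python
import numpy as np

def partial_augment(points, drop_ratio=0.3, seed=None):
    """Drop a contiguous chunk of points around a random anchor point
    to mimic a partial scan (one possible reading of Fig. 5)."""
    rng = np.random.default_rng(seed)
    anchor = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - anchor, axis=-1)
    # Remove the drop_ratio fraction of points closest to the anchor.
    keep = np.argsort(dist)[int(drop_ratio * len(points)):]
    return points[keep]
```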