4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

Wenbo Wang; Hsuan-I Ho; Chen Guo; Boxiang Rong; Artur Grigorev; Jie Song; Juan Jose Zarate; Otmar Hilliges

TL;DR

4D-DRESS introduces the first real-world 4D clothed human dataset with semantic vertex-level garment labels, garment meshes, and SMPL(-X) fits across 64 outfits and 520 motion sequences (78k frames). It presents a semi-automatic, template-free 4D parsing pipeline that combines multi-view voting from a human image parser (PAR), optical flow transfer (OPT), and segmentation masks (SAM) with Graph Cut optimization and a manual rectification step to achieve high-quality vertex annotations. The dataset supports benchmarks for clothing simulation, clothed human reconstruction, parsing, and representation learning. These benchmarks show realistic garment dynamics and expose gaps in current methods when they transfer from synthetic data. With real-world semantic labels and high-fidelity garment meshes, 4D-DRESS serves as ground truth and a testbed for research on realistic avatar clothing, simulation, and reconstruction.
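
The multi-view voting step mentioned above can be illustrated with a minimal sketch. It assumes each rendered view has already been reduced to a mapping from visible vertex index to a predicted label; the function name `vote_vertex_labels`, the input layout, and the tie-breaking rule are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter


def vote_vertex_labels(view_labels, num_vertices, unknown=None):
    """Majority-vote a label for every vertex from per-view predictions.

    view_labels: list of dicts, one per rendered view, mapping a visible
                 vertex index to the label predicted in that view
                 (hypothetical input layout).
    Returns a list with one label per vertex; vertices seen in no view
    receive `unknown`.
    """
    votes = [Counter() for _ in range(num_vertices)]
    for labels in view_labels:
        for vertex, label in labels.items():
            votes[vertex][label] += 1

    result = []
    for counter in votes:
        if not counter:
            result.append(unknown)
        else:
            # Ties are broken deterministically: the smallest label wins.
            result.append(max(sorted(counter), key=counter.get))
    return result


# Three views; vertex 2 is occluded in all of them.
views = [
    {0: "skin", 1: "upper"},
    {0: "skin", 1: "lower"},
    {1: "upper"},
]
labels = vote_vertex_labels(views, num_vertices=3)
```

In this toy input, vertex 1 receives two votes for "upper" and one for "lower", so the vote resolves the disagreement, while the occluded vertex stays unlabeled and would need to be handled by a later step.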

Abstract

Studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. Addressing this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS captures 64 outfits in 520 human motion sequences, amounting to 78k textured scans. Creating a real-world clothing dataset is challenging, particularly in annotating and segmenting the extensive and complex 4D human scans. To address this, we develop a semi-automatic 4D human parsing pipeline. We efficiently combine a human-in-the-loop process with automation to accurately label 4D scans in diverse garments and body movements. Leveraging precise annotations and high-quality garment meshes, we establish several benchmarks for clothing simulation and reconstruction. 4D-DRESS offers realistic and challenging data that complements synthetic sources, paving the way for advancements in research on lifelike human clothing. Website: https://ait.ethz.ch/4d-dress.

Paper Structure (53 sections, 10 equations, 20 figures, 9 tables)

Introduction
Related Work
4D clothed human dataset.
Human parsing.
Methodology
Multi-view Parsing
Human image parser (PAR).
Optical flow transfer (OPT).
Segmentation masks (SAM).
Graph Cut Optimization for Vertex Parsing
Vertex-wise unary energy.
Edge-wise binary energy.
Manual Rectification of 3D Labels
Experiments
Dataset Description
...and 38 more sections

Figures (20)

Figure 1: 4D human parsing method. We first render the current and previous frame scans into multi-view images and labels. We then collect multi-view parsing results from the image parser, optical flows, and segmentation masks (see Multi-view Parsing). Finally, we project the multi-view labels to 3D vertices and optimize the vertex labels using the Graph Cut algorithm with a vertex-wise unary energy and an edge-wise binary energy (see Graph Cut Optimization for Vertex Parsing). Manual rectification labels can be easily introduced by checking the multi-view rendered labels (see Manual Rectification of 3D Labels).
Figure 2: Qualitative ablation study. We visualize the effectiveness of our 4D human parsing method on our 4D-DRESS dataset. From left to right, we show the improvements after adding the optical flow labels and mask scores to the multi-view image parser labels. The manual rectification efforts can be easily introduced from multi-view rendered labels, with which we achieve high-quality vertex annotations. The problem of isolated labels can be relieved by introducing the edge-wise binary energy term.
Figure 3: Qualitative examples for clothing simulation methods. On the left are the templates used for simulations. On the right are ground-truth geometries and original scans. The LBS baseline results in body penetrations and overly stretched areas. Compared to other methods, HOOD better models dresses and jackets and, with tuned material parameters, HOOD* achieves simulations closest to the ground truth.
Figure 4: Examples of clothed human reconstruction on 4D-DRESS. We evaluate state-of-the-art methods using both inner (Top) and outer (Bottom) outfits. We show that existing methods generally struggle with the challenging loose garments. Moreover, these approaches cannot faithfully recover realistic details such as clothing wrinkles.
Figure 5: Examples of clothing reconstruction on 4D-DRESS. We visualize the reconstructed garment meshes from different approaches. These methods, trained on synthetic datasets, fail to predict accurate clothing sizes and detailed wrinkles.
...and 15 more figures
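
The captions for Figures 1 and 2 describe an energy with a vertex-wise unary term and an edge-wise binary term, where the binary term reduces isolated labels. The sketch below illustrates that idea only. It assumes a Potts-style penalty (a constant cost on every edge whose endpoints disagree) and uses iterated conditional modes (ICM), a simple greedy minimizer, as a stand-in for Graph Cut. The paper's actual energy terms and solver are not reproduced here, and all names (`total_energy`, `icm`, `weight`) are hypothetical.

```python
def total_energy(labels, unary, edges, weight):
    """Unary cost of each vertex's label plus a Potts penalty per disagreeing edge."""
    energy = sum(unary[v][label] for v, label in enumerate(labels))
    energy += weight * sum(1 for u, v in edges if labels[u] != labels[v])
    return energy


def icm(unary, edges, weight, sweeps=10):
    """Greedy label refinement (ICM); a simple stand-in, NOT Graph Cut."""
    num_vertices = len(unary)
    num_labels = len(unary[0])
    neighbors = [[] for _ in range(num_vertices)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Start from the label with the lowest unary cost at each vertex.
    labels = [min(range(num_labels), key=lambda l: unary[v][l])
              for v in range(num_vertices)]

    for _ in range(sweeps):
        changed = False
        for v in range(num_vertices):
            def cost(label):
                disagreements = sum(1 for n in neighbors[v] if labels[n] != label)
                return unary[v][label] + weight * disagreements

            best = min(range(num_labels), key=cost)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# A chain of five vertices; the middle one weakly prefers label 1.
unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

unary_only = [0, 0, 1, 0, 0]            # per-vertex argmin, isolated label at vertex 2
smoothed = icm(unary, edges, weight=0.5)
```

With the binary term switched on, the weakly supported label at the middle vertex costs more than it saves, so the greedy pass relabels it to match its neighbors. ICM only finds a local minimum, which is one reason a Graph Cut solver is preferable for problems of this kind.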
