CloSe: A 3D Clothing Segmentation Dataset and Model

Dimitrije Antić, Garvita Tiwari, Batuhan Ozcomlekci, Riccardo Marin, Gerard Pons-Moll

TL;DR

CloSe addresses the lack of real-world, fine-grained 3D clothing segmentation by introducing CloSe-D, a large-scale dataset with 18 garment classes, and CloSe-Net, a point-cloud segmentation model that leverages body priors and a learnable clothing codebook with attention. The approach enables accurate per-point labeling from colored 3D scans without SMPL+D registration, and it includes CloSe-T, an interactive tool for rapid annotation and continual-learning-based refinement. The authors demonstrate superior segmentation performance versus prior 3D clothing methods, improved generalization to public real-world datasets, and practical gains in labeling efficiency. The work further shows the value of coupling data-driven clothing priors with body-aware representations, and it establishes CloSe-D++ as a broader resource for real-world clothing analysis and 4D segmentation scenarios.

Abstract

3D clothing modeling and datasets play a crucial role in the entertainment, animation, and digital fashion industries. Existing work often lacks detailed semantic understanding or relies on synthetic datasets that lack realism and personalization. To address this, we first introduce CloSe-D: a novel large-scale dataset containing 3D clothing segmentation of 3167 scans, covering a range of 18 distinct clothing classes. Additionally, we propose CloSe-Net, the first learning-based 3D clothing segmentation model for fine-grained segmentation from colored point clouds. CloSe-Net uses local point features, body-clothing correlation, and an attention module based on garment-class and point features, improving performance over baselines and prior work. The proposed attention module enables our model to learn an appearance- and geometry-dependent clothing prior from data. We further validate the efficacy of our approach by successfully segmenting publicly available datasets of people in clothing. We also introduce CloSe-T, a 3D interactive tool for refining segmentation labels. Combining the tool with CloSe-Net in a continual learning setup demonstrates improved generalization on real-world data. Dataset, model, and tool can be found at https://virtualhumans.mpi-inf.mpg.de/close3dv24/.

Paper Structure

This paper contains 45 sections, 4 equations, 16 figures, and 8 tables.

Figures (16)

  • Figure 2: CloSe-Net: Given a colored point cloud $\mathbf{P} = \{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$ with SMPL parameters ($\boldsymbol{\theta}, \boldsymbol{\beta}$) and the clothing classes ($\mathrm{g}$) detected in the scan, where $\mathbf{p}_i = \{\mathbf{x}_i | \mathbf{c}_i | \mathbf{n}_i\}$ represents the location, color, and normal of a point, CloSe-Net predicts fine-grained per-point segmentation labels. (a) The Point Encoder (Sec. \ref{sec:point}) takes $\mathbf{P}$ as input and predicts per-point features $F^{\mathrm{p}}$. (b) The Clothing Encoder (Sec. \ref{sec:attention}) consists of a learnable codebook $G$ and an attention module, which predicts $F^{\mathrm{c}}$ from the per-point feature $\mathbf{p'}^{2}_i$ and $G$. Here $\mathbf{p'}^{2}_i$ is an intermediate feature of the Point Encoder. (c) The Body Encoder (Sec. \ref{sec:canon}) finds the per-point canonical vertex in the SMPL template, given the SMPL parameters $\boldsymbol{\theta}$ and $\boldsymbol{\beta}$. (d) Finally, the Segmentation Decoder (Sec. \ref{sec:decoder}) takes $F^{\mathrm{p}}, F^{\mathrm{c}}, F^{\mathrm{b}}$ and predicts the segmentation label $y_i$ for the $i^{\mathrm{th}}$ point. Solid boxes in the model are learnable, while the others are fixed.
  • Figure 3: Comparison with SotA part segmentation models: DGCNN \cite{dgcnn} and DeltaConv \cite{Wiersma2022DeltaConv}. Our model predicts accurate clothing classes and finer boundaries in complex scans. This can be attributed to our model's use of local point features, body priors, and clothing class-based attention features.
  • Figure 4: Comparison with MGN-Seg \cite{bhatnagar2019mgn} and GIM3D \cite{gim3d}.
  • Figure 5: Clothing prior learned using the attention module: The attention module in the Clothing Encoder learns a robust clothing prior based on point features. Here we visualize the attention of point features over the different clothing classes.
  • Figure 6: Body Encoder (top): Unlike the alternatives, the proposed Body Encoder (Canonical) is simple, generalizes to difficult poses, and produces fine boundaries. Clothing Encoder (bottom): The attention-based encoder and codebook learn distinct garment features and a clothing prior, yielding accurate segmentation predictions.
  • ...and 11 more figures
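
The Clothing Encoder described in the Figure 2 caption pairs a learnable codebook $G$ with an attention module driven by per-point features. The sketch below is only an illustration of that idea and is not the authors' implementation. It assumes plain scaled dot-product attention with one codebook row per garment class, and it assumes the detected classes $\mathrm{g}$ act as a mask over the codebook. The feature width, the random codebook values, the class indices, and the function names are all hypothetical; the actual model may use learned projections or a different attention form.

```python
import math
import random

NUM_CLASSES = 18  # number of clothing classes in CloSe-D (from the abstract)
FEAT_DIM = 8      # hypothetical feature width, chosen only for this sketch


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clothing_attention(point_feat, codebook, present):
    """Attend from one point feature over the codebook rows of detected garments.

    Assumption: classes not detected in the scan are masked out entirely.
    Returns (attention weights over all classes, attended feature vector).
    """
    idx = [k for k, flag in enumerate(present) if flag]
    if not idx:
        raise ValueError("at least one clothing class must be marked present")
    scale = math.sqrt(len(point_feat))
    # Scaled dot product between the point feature and each allowed codebook row.
    scores = [
        sum(q * g for q, g in zip(point_feat, codebook[k])) / scale for k in idx
    ]
    weights = [0.0] * len(codebook)
    for k, w in zip(idx, softmax(scores)):
        weights[k] = w
    # Weighted sum of the allowed codebook rows.
    attended = [
        sum(weights[k] * codebook[k][d] for k in idx)
        for d in range(len(point_feat))
    ]
    return weights, attended


rng = random.Random(0)
# Stand-in for the learnable codebook G; real values would be trained.
codebook = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(NUM_CLASSES)]
present = [False] * NUM_CLASSES
for k in (0, 3, 7):  # hypothetical indices of garments detected in the scan
    present[k] = True
# Stand-in for an intermediate per-point feature from the Point Encoder.
point_feat = [rng.gauss(0, 1) for _ in range(FEAT_DIM)]

weights, f_c = clothing_attention(point_feat, codebook, present)
```

Under these assumptions, `weights` is a distribution over the detected garment classes for a single point, which is the kind of quantity the Figure 5 caption describes visualizing, and `f_c` plays the role of that point's entry in $F^{\mathrm{c}}$.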