PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion

Gwanghyun Kim, Suh Yoon Jeon, Seunggyu Lee, Se Young Chun

TL;DR

PersonaCraft generates realistic multi-person scenes with full-body personalization under heavy occlusion. It conditions a diffusion model on 3D-aware pose through SMPLx-ControlNet (SCNet), which uses SMPLx depth and normal maps, and refines occluded regions with the Occlusion Boundary Enhancer Network (OccNet) and occlusion-aware classifier-free guidance (OccCFG). Combined with Face Identity ControlNet, it controls both facial identity and body shape. Quantitative metrics and user studies show better body-shape preservation, pose accuracy, and occlusion robustness than 2D-pose baselines. Dual-pathway conditioning (SMPLx shape parameters plus text) allows user-defined body-shape control, and the method remains efficient for multi-identity generation, with applications in avatars, media production, and synthetic data generation.

Abstract

We present PersonaCraft, a framework for controllable and occlusion-robust full-body personalized image synthesis of multiple individuals in complex scenes. Current methods struggle with occlusion-heavy scenarios and complete body personalization: 2D pose conditioning lacks 3D geometry, which often leads to ambiguous occlusions and anatomical distortions, and many approaches focus solely on facial identity. In contrast, PersonaCraft integrates diffusion models with 3D human modeling, employing SMPLx-ControlNet to exploit 3D geometry such as depth and normal maps for robust 3D-aware pose conditioning and enhanced anatomical coherence. To handle fine-grained occlusions, we propose an Occlusion Boundary Enhancer Network that exploits depth edge signals with occlusion-focused training, and an Occlusion-Aware Classifier-Free Guidance strategy that selectively reinforces conditioning in occluded regions without affecting unoccluded areas. PersonaCraft can be seamlessly combined with Face Identity ControlNet, achieving full-body multi-human personalization and marking a significant advance over prior approaches that concentrate only on facial identity. Our dual-pathway body shape representation, which pairs SMPLx-based shape parameters with textual refinement, enables precise full-body personalization and flexible user-defined body shape adjustments. Extensive quantitative experiments and user studies demonstrate that PersonaCraft significantly outperforms existing methods in generating high-quality, multi-person images with accurate personalization and robust occlusion handling.
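The abstract describes Occlusion-Aware Classifier-Free Guidance only at a high level. As an illustration, one way to reinforce conditioning in occluded regions while leaving other regions on standard classifier-free guidance is a per-pixel guidance scale. The sketch below is a minimal NumPy version under that assumption; the function name, the `base_scale` and `occ_scale` parameters, their default values, and the linear blend are all hypothetical and are not taken from the paper.

```python
import numpy as np

def occlusion_aware_cfg(eps_uncond, eps_cond, occ_mask,
                        base_scale=5.0, occ_scale=8.0):
    """Classifier-free guidance with a spatially varying scale (assumed form).

    eps_uncond, eps_cond: noise predictions of shape (C, H, W).
    occ_mask: array of shape (H, W), 1 where occluded, 0 elsewhere.
    Where occ_mask == 0 this reduces to standard CFG with base_scale.
    """
    # Per-pixel scale: base_scale outside the mask, occ_scale inside it.
    scale = base_scale + (occ_scale - base_scale) * occ_mask
    # Broadcast the (H, W) scale over the channel axis.
    return eps_uncond + scale[None, :, :] * (eps_cond - eps_uncond)

# Toy inputs: a 1-channel 2x2 prediction with one occluded pixel.
eps_uncond = np.zeros((1, 2, 2))
eps_cond = np.ones((1, 2, 2))
occ_mask = np.array([[1.0, 0.0],
                     [0.0, 0.0]])

guided = occlusion_aware_cfg(eps_uncond, eps_cond, occ_mask)
print(guided[0])  # occluded pixel is scaled by 8.0, the rest by 5.0
```

Because the scale only differs from `base_scale` where the mask is nonzero, unoccluded pixels receive exactly the output of ordinary classifier-free guidance, which matches the stated goal of not affecting unoccluded areas.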

Paper Structure

This paper contains 29 sections, 8 equations, 25 figures, 9 tables.

Figures (25)

  • Figure 1: PersonaCraft generates realistic, personalized images of multiple individuals with complex occlusions, preserving facial identity and body shape using occlusion-aware 3D pose and shape conditioned diffusion. PersonaCraft outperforms baselines in body shape personalization (blue arrows indicate failures) and naturalness (yellow arrows highlight artifacts, with zoomed-in views in yellow boxes).
  • Figure 2: SMPLx-ControlNet (SCNet) with SMPLx depth and normal maps improves occlusion handling in pose generation compared to 2D OpenPose, though fine-grained occlusions remain challenging. Using our occlusion-focused methods, OccNet and OccCFG, we achieve superior pose consistency and anatomical coherence.
  • Figure 3: Overview of our occlusion-aware 3D pose and shape conditioning. We generate SMPLx renderings for SMPLx-ControlNet (SCNet) and derive occlusion masks for the Occlusion Boundary Enhancer Network (OccNet) by counting intersected faces and masking depth edges. Combining SCNet and OccNet residuals with occlusion masks enables the base U-Net to handle occlusion-aware 3D pose and shape conditioning. Occlusion-aware classifier-free guidance (OccCFG) further improves anatomical coherence in occluded regions.
  • Figure 4: Training of SMPLx-ControlNet (SCNet) and Occlusion Boundary Enhancer Network (OccNet). The networks are trained separately, with SMPLx depth, normal maps, and occlusion masks extracted from training images. The pretrained ControlNet [zhang2023adding] is fine-tuned with these 3D pose representations.
  • Figure 5: Full pipeline of PersonaCraft for multi-human full-body personalization. By integrating our method with face embeddings from InsightFace [insightface] and Face Identity ControlNet [wang2024instantid], we advance toward full-body multi-human personalization.
  • ...and 20 more figures
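
The Figure 3 caption mentions deriving occlusion masks by counting intersected faces and masking depth edges, then combining SCNet and OccNet residuals with those masks. The sketch below shows one possible reading of those steps in NumPy. Everything specific here is an assumption for illustration: the `face_hits` input (a per-pixel count of mesh faces crossed by the camera ray), the `min_hits` and `threshold` values, the gradient-based edge detector, and the masked-addition rule for the residuals are hypothetical and may differ from the paper's procedure.

```python
import numpy as np

def depth_edges(depth, threshold=0.05):
    """Mark pixels where the depth map changes sharply between neighbours."""
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy) > threshold

def occlusion_boundary(depth, face_hits, threshold=0.05, min_hits=3):
    """Keep depth edges only where a ray crosses at least `min_hits` faces.

    depth: (H, W) rendered depth map.
    face_hits: (H, W) integer count of intersected mesh faces (assumed input).
    """
    occluded = face_hits >= min_hits
    return depth_edges(depth, threshold) & occluded

def combine_residuals(scnet_res, occnet_res, mask):
    """Add the OccNet residual to the SCNet residual inside the mask only."""
    return scnet_res + mask[None, :, :].astype(scnet_res.dtype) * occnet_res

# Toy scene: a depth step between columns 1 and 2; only the top two rows
# have rays that cross more than one body.
depth = np.array([[1.0, 1.0, 2.0, 2.0]] * 4)
face_hits = np.array([[4] * 4, [4] * 4, [2] * 4, [2] * 4])

boundary = occlusion_boundary(depth, face_hits)
combined = combine_residuals(np.ones((1, 4, 4)), 2.0 * np.ones((1, 4, 4)), boundary)
print(boundary.astype(int))
```

In this toy case the depth step is detected in every row, but only the rows flagged as occluded keep it, so the OccNet residual is applied along the occlusion boundary and nowhere else.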