PairHuman: A High-Fidelity Photographic Dataset for Customized Dual-Person Generation

Ting Pan, Ye Wang, Peiguang Jing, Rui Ma, Zili Yi, Yu Liu

TL;DR

PairHuman provides the first large-scale, high-fidelity dual-person portrait dataset with rich annotations to advance customized dual-person generation. DHumanDiff introduces a diffusion-based framework that fuses text, two reference-face inputs, and identity-preserving conditioning through visual disparity and subject-augmented mechanisms, aided by adapters and cascaded inference. Experiments show improved facial fidelity, textual and image alignment, and overall perceptual quality on PairHuman versus FFHQ-based baselines and existing multi-subject methods, with ablations validating each component. The work highlights practical potential for personalized wedding, reminiscence, and social applications while outlining limitations in demographic diversity and lighting robustness, and it provides openly available data and tooling for further research.

Abstract

Personalized dual-person portrait customization has considerable potential applications, such as preserving emotional memories and facilitating wedding photography planning. However, the absence of a benchmark dataset hinders the pursuit of high-quality customization in dual-person portrait generation. In this paper, we present PairHuman, the first large-scale benchmark dataset specifically designed for generating dual-person portraits that meet high photographic standards. The PairHuman dataset contains more than 100K images that capture a variety of scenes, attire, and dual-person interactions, along with rich metadata, including detailed image descriptions, person localization, human keypoints, and attribute tags. We also introduce DHumanDiff, a baseline specifically crafted for dual-person portrait generation that improves facial consistency while balancing personalized person generation with semantic-driven scene creation. Finally, the experimental results demonstrate that our dataset and method produce highly customized portraits with superior visual quality that are tailored to human preferences. Our dataset is publicly available at https://github.com/annaoooo/PairHuman.

Paper Structure

This paper contains 57 sections, 22 equations, 15 figures, 11 tables, 1 algorithm.

Figures (15)

  • Figure 1: Examples from current multi-person image datasets that are unsuitable for high-fidelity dual-person portrait generation.
  • Figure 1: Generalization analysis of DHumanDiff across age and ethnic diversity.
  • Figure 2: Illustration of the data collection and annotation process for the PairHuman dataset.
  • Figure 2: Sensitivity analysis of the DHumanDiff model to illumination and quality of reference images.
  • Figure 3: Examples of PairHuman Dataset Annotations, including human bounding boxes, keypoints, masks, and image captions. Image captions are color-coded for clarity: orange for persons, green for actions, blue for attire, and purple for backgrounds.
  • ...and 10 more figures