
Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models

Farbod Abbasi, Zachary Patterson, Bilal Farooq

TL;DR

Population synthesis for agent-based models faces limitations from single data sources and from sampling zeros and structural zeros. The authors propose a joint population synthesis framework using a WGAN-GP with an inverse gradient penalty (IGP) to fuse census and travel-survey data, supported by two dataset-specific critics and a unified evaluation metric. The approach improves diversity and feasibility: recall rises by 7% and precision by 15% over a sequential baseline, the IGP adds roughly a further 10% recall and 1% precision, and the final similarity score is 88.1 compared with 84.6 for the baseline. The unified metric, comprising SRMSE, JSD, correlations, PMSE, and ML efficacy, enables cross-method comparison and shows that the method preserves joint dependencies while expanding feasible attribute combinations. This multi-source synthesis framework has practical implications for more accurate and reliable ABM populations and can be extended to additional heterogeneous data sources and novel regularization strategies.
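Two of the similarity components named above, JSD and SRMSE, can be illustrated on a single categorical attribute. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the SRMSE form (root mean squared error of relative frequencies divided by their mean) is one common definition in the population synthesis literature and is assumed here, and how the paper aggregates the five metrics into one score is not shown.

```python
import math
from collections import Counter


def relative_frequencies(values, categories):
    """Share of each category in `values`, in the order given by `categories`."""
    counts = Counter(values)
    n = len(values)
    return [counts[c] / n for c in categories]


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def srmse(p, q):
    """Standardized RMSE (assumed form): RMSE over bins divided by the mean of p."""
    n_bins = len(p)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / n_bins)
    return rmse / (sum(p) / n_bins)


# Toy data (made up for illustration): one attribute with two categories.
categories = ["car", "transit"]
real = ["car", "car", "transit", "transit"]
synthetic = ["car", "transit", "transit", "transit"]

p_real = relative_frequencies(real, categories)
p_syn = relative_frequencies(synthetic, categories)
jsd_value = jsd(p_real, p_syn)
srmse_value = srmse(p_real, p_syn)
```

Both quantities are zero when the two distributions coincide and grow as they diverge, so lower values indicate closer agreement between real and synthetic marginals.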

Abstract

Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, which means they fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by defining a regularization term (inverse gradient penalty) for the generator loss function. For the evaluation, we implement a unified evaluation metric for similarity, and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Additionally, the regularization term further improves diversity and feasibility, reflected in a 10% increase in recall and 1% in precision. We assess similarity distributions using a five-metric score. The joint approach performs better overall, and reaches a score of 88.1 compared to 84.6 for the sequential method. Since synthetic populations serve as a key input for ABM, this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
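The link between recall and diversity, and between precision and feasibility, can be made concrete with a small sketch. It assumes one common reading of these metrics for population synthesis: precision is the share of synthetic records whose attribute combination appears in a reference population (so generated structural zeros lower it), and recall is the share of reference records whose combination appears among the synthetic ones (so missed sampling zeros lower it). The function name, the record layout, and the toy data are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Hashable, Sequence, Tuple

Record = Tuple[Hashable, ...]


def precision_recall_f1(
    reference: Sequence[Record], synthetic: Sequence[Record]
) -> Tuple[float, float, float]:
    """Sample-based precision, recall, and F1 over attribute combinations."""
    reference_combos = set(reference)
    synthetic_combos = set(synthetic)

    # Precision: fraction of synthetic records that are feasible,
    # i.e. whose combination exists in the reference population.
    precision = sum(r in reference_combos for r in synthetic) / len(synthetic)

    # Recall: fraction of reference records whose combination
    # the generator managed to produce at least once.
    recall = sum(r in synthetic_combos for r in reference) / len(reference)

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (made up): records are (age group, driving licence) pairs.
reference = [
    ("child", "no licence"),
    ("adult", "licence"),
    ("adult", "no licence"),
    ("senior", "licence"),
]
synthetic = [
    ("adult", "licence"),
    ("adult", "licence"),
    ("child", "licence"),  # infeasible combination: counts against precision
    ("senior", "licence"),
]

precision, recall, f1 = precision_recall_f1(reference, synthetic)
# precision = 3/4, recall = 2/4
```

In practice the reference population has to be approximated, for example by a larger held-out sample, since the true set of feasible combinations is not fully observed.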

Paper Structure (15 sections, 14 equations, 8 figures, 5 tables)


Figures (8)

  • Figure 1: An overall diagram representing the framework
  • Figure 2: The training procedure of Joint GAN
  • Figure 3: Calibrated architecture and hyperparameters of the WGAN
  • Figure 4: Conceptual diagram of precision and recall based on sampling zeros and structural zeros
  • Figure 5: Correlation between the columns of real and synthetic datasets
  • ...and 3 more figures