Core Mondrian: Basic Mondrian beyond k-anonymity

Adam Bloomston, Elizabeth Burke, Megan Cacace, Anne Diaz, Wren Dougherty, Matthew Gonzalez, Remington Gregg, Yeliz Güngör, Bryce Hayes, Eeway Hsu, Oron Israeli, Heesoo Kim, Sara Kwasnick, Joanne Lacsina, Demma Rosa Rodriguez, Adam Schiller, Whitney Schumacher, Jessica Simon, Maggie Tang, Skyler Wharton, Marilyn Wilcken

TL;DR

Core Mondrian adapts Mondrian-style partitioning to production-scale privacy analytics by introducing an extensible Strategy Pattern architecture, a hybrid recursive-queue execution engine, and utility-preserving features such as NaN-pattern pre-partitioning and dynamic suppression budgeting. It achieves lower Discernibility Metric (DM) and higher Revised Information Loss Metric (RILM) scores than Original Mondrian across multiple QID configurations, and it scales to large datasets with multi-core parallelism (up to 4x speedup over sequential Core Mondrian). The approach supports pluggable privacy models and maintains deterministic outputs, which is essential for auditable analytics in equity-focused applications. In practice, this enables privacy-compliant, high-utility analytics at production scale, with avenues for future work in GPU acceleration and broader privacy constraints (e.g., group-based constraints).
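To make the "pluggable privacy models" idea concrete, here is a minimal Strategy Pattern sketch. It is not the paper's code: the names `PrivacyModel`, `KAnonymity`, `DistinctLDiversity`, and `cut_is_allowed` are hypothetical, and the distinct l-diversity check is only an example of a second model that could be plugged in alongside k-anonymity.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

Record = dict  # hypothetical record type: column name -> value


class PrivacyModel(Protocol):
    """Strategy interface: decides whether a partition is releasable."""

    def is_satisfied(self, partition: Sequence[Record]) -> bool: ...


@dataclass(frozen=True)
class KAnonymity:
    k: int

    def is_satisfied(self, partition: Sequence[Record]) -> bool:
        # A partition is k-anonymous if it holds at least k records.
        return len(partition) >= self.k


@dataclass(frozen=True)
class DistinctLDiversity:
    # Assumed extra model, used here only to show pluggability.
    l: int
    sensitive: str

    def is_satisfied(self, partition: Sequence[Record]) -> bool:
        # Require at least l distinct sensitive values in the partition.
        return len({r[self.sensitive] for r in partition}) >= self.l


def cut_is_allowed(left: Sequence[Record],
                   right: Sequence[Record],
                   models: Sequence[PrivacyModel]) -> bool:
    """A candidate cut is valid only if both halves satisfy every model."""
    return all(m.is_satisfied(left) and m.is_satisfied(right) for m in models)
```

Under this sketch, the partitioning loop only calls `cut_is_allowed`, so adding a new privacy model means adding a class with an `is_satisfied` method rather than changing the loop itself.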

Abstract

We present Core Mondrian, a scalable extension of the Original Mondrian partition-based anonymization algorithm. A modular strategy layer implements k-anonymity and allows new privacy models to be added easily. A hybrid recursive/queue execution engine exploits multi-core parallelism while maintaining deterministic output. Utility-preserving enhancements include NaN-pattern pre-partitioning, metric-driven cut scoring, and dynamic suppression budget management. In experiments on the 48k-record UCI ADULT dataset and synthetically scaled versions of up to 1M records, Core Mondrian achieves lower Discernibility Metric scores than Original Mondrian for numeric quasi-identifier sets, and parallel processing delivers up to 4x speedup over sequential Core Mondrian. Core Mondrian enables privacy-compliant equity analytics at production scale.
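For readers unfamiliar with the Discernibility Metric, the sketch below computes it in its commonly used form: each equivalence class of size at least k contributes the square of its size, and each record in a class smaller than k (treated as suppressed) is charged the full dataset size. The function name and parameters are hypothetical, and the paper's exact variant may differ in how suppressed records are penalized.

```python
from typing import Iterable, Optional


def discernibility_metric(class_sizes: Iterable[int],
                          k: int,
                          total_records: Optional[int] = None) -> int:
    """Discernibility Metric over equivalence-class sizes (lower is better).

    Assumption: classes smaller than k are treated as suppressed, and each
    of their records is penalized by the total number of records.
    """
    sizes = list(class_sizes)
    if total_records is None:
        total_records = sum(sizes)

    score = 0
    for size in sizes:
        if size >= k:
            # Every record is indistinguishable from `size` records.
            score += size * size
        else:
            # Suppressed records are indistinguishable from the whole dataset.
            score += size * total_records
    return score


# Ten records, k = 3: finer partitions score lower than coarser ones.
print(discernibility_metric([3, 3, 4], k=3))  # 34
print(discernibility_metric([5, 5], k=3))     # 50
print(discernibility_metric([4, 4, 2], k=3))  # 52 (the class of size 2 is suppressed)
```

The third call shows why suppression budgeting matters for utility: a single undersized class can outweigh the gain from otherwise tighter partitions.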

Paper Structure

This paper contains 35 sections and 10 figures.

Figures (10)

  • Figure 1: Core Mondrian consistently achieves lower DM scores and higher RILM scores than Original Mondrian
  • Figure 2: Information loss vs k for 4-QID sets
  • Figure 3: Privacy-utility tradeoff showing DM and RILM vs k for 4-QID sets
  • Figure 4: Impact of QID dimensionality on information loss and utility
  • Figure 5: Suppression rate decreases as recursive cutoff increases
  • ...and 5 more figures