
PP-STAT: An Efficient Privacy-Preserving Statistical Analysis Framework using Homomorphic Encryption

Hyunmin Choi

TL;DR

PP-STAT introduces a privacy-preserving statistical analysis framework based on the CKKS homomorphic encryption scheme. A key contribution is CryptoInvSqrt, a Chebyshev-based initialization for inverse $n$-th roots. It accelerates Newton iterations and substantially reduces the number of bootstrapping operations required, which enables efficient computation of statistics such as Z-score, skewness, kurtosis, CV, and PCC over encrypted data. A pre-normalization scaling technique further lowers multiplicative depth by folding scaling constants into the mean and variance computations, which allows higher-degree Chebyshev polynomials within the same depth. Empirical results on real datasets (Adult and Insurance) show mean relative errors in the $10^{-4}$ to $10^{-3}$ range and practical runtimes. The encrypted PCC between smoker and charges reaches $0.7873$ (MRE $2.86\times10^{-4}$). These advances demonstrate PP-STAT's practical utility for secure, precise statistical analysis in privacy-sensitive domains.
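The CryptoInvSqrt idea described above can be illustrated on plaintext floats: a polynomial initial guess followed by Newton refinement. The sketch below is a minimal stand-in and is not the paper's implementation. It assumes NumPy for the offline Chebyshev fit. The function names (`chebyshev_init`, `inv_root`), the domain [0.1, 1], the degree 7 and the three Newton steps are hypothetical, illustrative choices rather than the paper's parameters. Nothing is encrypted here. The point is that once the coefficients are fitted, evaluation needs only additions and multiplications, which are the operations CKKS supports.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def chebyshev_init(n: int, lo: float, hi: float, degree: int) -> Chebyshev:
    """Fit x**(-1/n) on [lo, hi]; in an HE setting this fit would be done offline on plaintext."""
    return Chebyshev.interpolate(lambda x: x ** (-1.0 / n), degree, domain=[lo, hi])


def inv_root(x, n: int = 2, lo: float = 0.1, hi: float = 1.0,
             degree: int = 7, newton_steps: int = 3):
    """Approximate x**(-1/n) for x in [lo, hi] (hypothetical helper, plaintext only).

    Uses a Chebyshev initial guess plus Newton steps, i.e. only polynomial
    evaluation, additions and multiplications.
    """
    y = chebyshev_init(n, lo, hi, degree)(x)  # initial guess y0 ~ x**(-1/n)
    for _ in range(newton_steps):
        # Newton step for f(y) = y**(-n) - x:  y <- y * ((n + 1) - x * y**n) / n
        y = y * ((n + 1) - x * y ** n) / n
    return y


xs = np.linspace(0.1, 1.0, 5)
# Maximum relative error of the inverse square root after three Newton steps
print(np.max(np.abs(inv_root(xs) * np.sqrt(xs) - 1.0)))
```

A more accurate initial guess lets fewer Newton steps reach a given precision. Because every step consumes multiplicative levels under CKKS, this is the trade-off the TL;DR attributes to CryptoInvSqrt. Inputs outside `[lo, hi]` would first have to be scaled into that interval.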

Abstract

With the widespread adoption of cloud computing, the need for outsourcing statistical analysis to third-party platforms is growing rapidly. However, handling sensitive data such as medical records and financial information in cloud environments raises serious privacy concerns. In this paper, we present PP-STAT, a novel and efficient Homomorphic Encryption (HE)-based framework for privacy-preserving statistical analysis. HE enables computations to be performed directly on encrypted data without revealing the underlying plaintext. PP-STAT supports advanced statistical measures, including Z-score normalization, skewness, kurtosis, coefficient of variation, and Pearson correlation coefficient, all computed securely over encrypted data. To improve efficiency, PP-STAT introduces two key optimizations: (1) a Chebyshev-based approximation strategy for initializing inverse square root operations, and (2) a pre-normalization scaling technique that reduces multiplicative depth by folding constant scaling factors into mean and variance computations. These techniques significantly lower computational overhead and minimize the number of expensive bootstrapping procedures. Our evaluation on real-world datasets demonstrates that PP-STAT achieves high numerical accuracy, with mean relative error (MRE) below $2.4\times10^{-4}$. Notably, the encrypted Pearson correlation coefficient between the smoker attribute and charges reaches 0.7873, with an MRE of $2.86\times10^{-4}$. These results confirm the practical utility of PP-STAT for secure and precise statistical analysis in privacy-sensitive domains.
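The pre-normalization idea in the abstract can be sketched in plaintext, together with statistics that need only additions, multiplications and an inverse square root. Pre-normalization folds a constant scaling factor into the mean and variance computations. This is a minimal illustration under stated assumptions and is not the paper's algorithm. The helper names (`inv_sqrt`, `centered_stats`, `plaintext_stats`), the approximation domain [0.1, 1], and the public variance bound `var_bound` used to pick the scaling constant are all hypothetical. Population (1/n) moments are assumed, and no encryption is performed.

```python
import numpy as np
from numpy.polynomial import Chebyshev

LO, HI = 0.1, 1.0  # hypothetical domain on which 1/sqrt is approximated
_INIT = Chebyshev.interpolate(lambda v: v ** -0.5, 7, domain=[LO, HI])


def inv_sqrt(v, steps: int = 3):
    """Chebyshev initial guess refined by Newton steps; valid for v in [LO, HI]."""
    y = _INIT(v)
    for _ in range(steps):
        y = 0.5 * y * (3.0 - v * y * y)
    return y


def centered_stats(x, var_bound: float):
    """Mean, centered values and 1/std, with the scaling constant folded into 1/n.

    var_bound is an assumed public bound chosen so that HI * Var(x) / var_bound
    falls inside [LO, HI], the domain of the inverse-sqrt approximation.
    """
    n = len(x)
    c = HI / var_bound                        # pre-normalization constant
    mean = np.sum(x) * (1.0 / n)              # one plaintext-constant multiplication
    d = x - mean
    scaled_var = np.sum(d * d) * (c / n)      # c folded into 1/n: still one constant
    # 1/sqrt(var) = sqrt(c) / sqrt(c * var); sqrt(c) could be folded into later constants
    inv_std = inv_sqrt(scaled_var) * np.sqrt(c)
    return mean, d, inv_std


def plaintext_stats(x, y, x_var_bound: float, y_var_bound: float):
    """Z-scores, skewness, kurtosis and Pearson correlation via +, * and inv_sqrt."""
    n = len(x)
    _, dx, inv_sx = centered_stats(x, x_var_bound)
    _, dy, inv_sy = centered_stats(y, y_var_bound)
    z = dx * inv_sx                             # Z-score of every record
    skewness = np.sum(z ** 3) * (1.0 / n)
    kurtosis = np.sum(z ** 4) * (1.0 / n)       # non-excess (Pearson) kurtosis
    pcc = np.sum(dx * dy) * (1.0 / n) * inv_sx * inv_sy
    return z, skewness, kurtosis, pcc


x = np.arange(1, 101) * 100.0
y = 2.0 * x + 500.0 * np.sin(np.arange(100))
z, skew, kurt, pcc = plaintext_stats(x, y, 1e7, 4e7)
print(skew, kurt, pcc)
```

Only the single constant `c / n` multiplies the variance sum. The alternative would be a `1 / n` multiplication followed by a separate rescaling by `c`. In CKKS, multiplying by a non-integer plaintext constant typically consumes a level, so this saving is the kind of depth reduction the pre-normalization step targets. The coefficient of variation would additionally need an encrypted approximation of the reciprocal of the mean, which this sketch does not model.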

Paper Structure

This paper contains 29 sections, 12 equations, 3 figures, 5 tables, 6 algorithms.

Figures (3)

  • Figure 1: System overview of PP-STAT.
  • Figure 2: Comparison of step functions: $G_3^{(7)}(x)$ (green) vs. minimax approximation (blue).
  • Figure 3: Kernel density estimation (KDE) of charges in the Insurance dataset.