Privacy-Preserving Logistic Regression Training on Large Datasets

John Chiang

TL;DR

This work advances privacy-preserving machine learning by enabling scalable logistic regression training on large encrypted datasets using Homomorphic Encryption. It introduces a mini-batch enhanced Nesterov Accelerated Gradient with a quadratic gradient, along with a full-batch variant, both built on database-encoded data and CKKS/HEAAN-like HE, to accelerate convergence while maintaining privacy. Empirical results on a large real financial dataset and on MNIST show competitive accuracy and practical runtimes, illustrating the feasibility of encrypted training at scale. The methods balance convergence speed, HE-depth constraints, and data-encoding overhead, offering a viable path for secure learning in data-sensitive domains.

Abstract

Privacy-preserving machine learning is a class of cryptographic methods for analyzing private and sensitive data without compromising privacy; homomorphic logistic regression training over large encrypted data is one example. In this paper, we propose an efficient algorithm for logistic regression training on large encrypted data using Homomorphic Encryption (HE). It is the mini-batch version of recent methods that use a faster gradient variant called $\texttt{quadratic gradient}$. The $\texttt{quadratic gradient}$ is claimed to integrate curvature information (the Hessian matrix) into the gradient and thereby effectively accelerate first-order gradient (descent) algorithms. We also implement the full-batch version of that method for the case where the encrypted dataset is so large that it has to be encrypted in a mini-batch manner. We compare our mini-batch algorithm with our full-batch implementation on real financial data consisting of 422,108 samples with 200 features. Given the inefficiency of HE, our results are encouraging and demonstrate that logistic regression training on large encrypted datasets is practically feasible, which we regard as a significant milestone.
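To make the idea of folding Hessian information into the gradient concrete, here is a minimal plaintext sketch of a mini-batch, Nesterov-style update with a curvature-scaled gradient. It is an illustration under stated assumptions, not the paper's encrypted implementation: the diagonal scaling $1 / (\epsilon + \sum_j |\bar{H}_{ij}|)$ with $\bar{H} = \tfrac{1}{4} X^\top X$ is an assumed form of the $\texttt{quadratic gradient}$ that this abstract does not spell out, the constant `momentum` and `lr` values are simplifications, and all function names (`diagonal_scaling`, `train_minibatch`, `make_toy_data`) are hypothetical. No encryption or polynomial approximation of the sigmoid is involved.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(X, y, w):
    # Sum over samples of t*z - log(1 + exp(z)), computed stably.
    total = 0.0
    for x, t in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        total += t * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def gradient(X, y, w):
    # Gradient of the log-likelihood for labels in {0, 1}.
    g = [0.0] * len(w)
    for x, t in zip(X, y):
        err = t - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            g[i] += err * xi
    return g


def diagonal_scaling(X, eps=1e-8):
    # ASSUMED form: scale_i = 1 / (eps + sum_j |H_ij|), with H = (1/4) X^T X
    # used as a fixed bound on the logistic-regression Hessian.
    d = len(X[0])
    scale = []
    for i in range(d):
        row_sum = sum(abs(0.25 * sum(x[i] * x[j] for x in X)) for j in range(d))
        scale.append(1.0 / (eps + row_sum))
    return scale


def train_minibatch(X, y, batch_size=20, epochs=20, lr=1.0, momentum=0.9, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d  # look-ahead point where the gradient is evaluated
    v = [0.0] * d  # previous gradient-step iterate
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            Xb = [X[k] for k in idx]
            yb = [y[k] for k in idx]
            scale = diagonal_scaling(Xb)  # depends only on the batch data
            g = gradient(Xb, yb, w)
            # Curvature-scaled ascent step, then Nesterov extrapolation.
            v_new = [wi + lr * si * gi for wi, si, gi in zip(w, scale, g)]
            w = [vn + momentum * (vn - vo) for vn, vo in zip(v_new, v)]
            v = v_new
    return v


def accuracy(X, y, w):
    hits = 0
    for x, t in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x))
        hits += int((1 if z > 0 else 0) == t)
    return hits / len(X)


def make_toy_data(n=200, seed=1):
    # Hypothetical synthetic data: label is 1 when the first feature exceeds the second.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append([1.0, a, b])  # leading 1.0 is the bias feature
        y.append(1 if a > b else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w = train_minibatch(X, y)
    print(accuracy(X, y, w), log_likelihood(X, y, w))
```

Because the scaling in this sketch depends only on the batch's feature matrix and not on the current weights, it could in principle be computed once per batch before training rather than at every step, which is the kind of property that matters when every operation is performed under encryption.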

Paper Structure (23 sections, 17 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
HE-based approaches
MPC-based approaches
Preliminaries
Homomorphic Encryption
Database Encoding
Logistic Regression
Technical Details
Quadratic Gradient
Mini-Batch Method
Performance Evaluation
Full-Batch Method
Secure Training
Polynomial Approximation
Usage Model
...and 8 more sections

Figures (3)

Figure 1: Partition and Encryption of Training Data
Figure 2: Training results in the unencrypted setting for the MNIST and Financial datasets
Figure 3: The entire process of logistic regression training via homomorphic encryption
