Privacy-Preserved Taxi Demand Prediction System Utilizing Distributed Data
Ren Ozeki, Haruki Yonekura, Hamada Rizk, Hirozumi Yamaguchi
TL;DR
CC-Net tackles privacy-sensitive taxi-demand prediction by combining contrastive feature learning with decentralized, neighbor-based collaboration, so raw data is never exposed. It introduces a hexagonal virtual grid, a Transformer-based feature encoder trained with self-supervised contrastive learning, and a similarity-driven collaboration strategy that handles non-IID data while personalizing predictions for each client. Experiments on data from five Japanese providers over 14 months show that CC-Net achieves at least 2.2% higher accuracy than non-federated baselines and is robust against membership inference attacks, because privacy is preserved at the architectural level. CC-Net thus offers a practical blueprint for privacy-preserving, scalable taxi-demand forecasting in distributed urban ecosystems.
Abstract
Accurate taxi-demand prediction is essential for optimizing taxi operations and enhancing urban transportation services. However, using customers' data in these systems raises significant privacy and security concerns. Traditional federated learning addresses some privacy issues by enabling model training without direct data exchange, but it often loses accuracy when data distributions vary across regions or service providers. In this paper, we propose CC-Net, a novel approach to taxi-demand prediction that uses collaborative learning enhanced with contrastive learning. Our method achieves high performance by enabling multiple parties to collaboratively train a demand-prediction model through hierarchical federated learning: similar parties are clustered together, and federated learning is applied within each cluster. Similarity is defined without any data exchange, which preserves privacy and security. We evaluated our approach using real-world data from five taxi service providers in Japan over fourteen months. The results demonstrate that CC-Net maintains the privacy of customers' data while improving prediction accuracy by at least 2.2% compared to existing techniques.
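
The cluster-then-federate idea described above can be illustrated with a minimal sketch. It is not the CC-Net implementation: the abstract does not specify how similarity is computed, so this sketch assumes cosine similarity between locally trained parameter vectors, a greedy threshold-based clustering, and sample-weighted averaging within each cluster. All function names, the threshold, and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two parameter vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_clients(weights, threshold=0.9):
    """Greedy clustering: a client joins the first cluster whose seed it resembles.

    Only model parameters are compared, never raw trip records.
    """
    clusters = []  # each cluster is a list of client ids; the first member is the seed
    for cid in sorted(weights):
        for members in clusters:
            if cosine(weights[cid], weights[members[0]]) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append([cid])
    return clusters

def fedavg(weights, sizes, members):
    """Sample-weighted average of the parameter vectors of one cluster."""
    total = sum(sizes[c] for c in members)
    dim = len(weights[members[0]])
    return [sum(weights[c][i] * sizes[c] for c in members) / total for i in range(dim)]

def clustered_round(weights, sizes, threshold=0.9):
    """One aggregation round: average within each cluster, return per-client models."""
    result = {}
    for members in cluster_clients(weights, threshold):
        averaged = fedavg(weights, sizes, members)
        for c in members:
            result[c] = averaged
    return result

# Toy example: providers A and B have similar models, C differs.
local_weights = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
sample_counts = {"A": 100, "B": 300, "C": 50}
print(cluster_clients(local_weights))
print(clustered_round(local_weights, sample_counts))
```

In this toy run, A and B fall into one cluster and share an averaged model, while C keeps its own parameters. This mirrors the motivation stated in the abstract: parties with dissimilar data distributions do not pull each other's models away from their local demand patterns.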
