Cold-Start Active Preference Learning in Socio-Economic Domains
Mojtaba Fayaz-Bakhsh, Danial Ataee, MohammadAmin Fazli
TL;DR
This work tackles the cold-start problem in active preference learning by introducing a PCA-driven warm-up that generates initial pseudo-labels without expert input, producing a warmed-up model before active querying begins. It integrates a simulated noisy Bradley–Terry oracle within a warm-start active-learning loop, using XGBoost for the underlying binary preference model. Across multiple socio-economic datasets, the PCA-based warm-up consistently improves early performance and reduces the labeled data required compared to standard cold-start baselines, with the largest practical benefit in low-data regimes. The approach is computationally efficient, scalable, and readily applicable to real-world socio-economic preference modeling. Future work will explore alternative unsupervised initializations, enhanced query strategies, and real human-in-the-loop validation.
Abstract
Active preference learning offers an efficient approach to modeling preferences, but it suffers from the cold-start problem: performance degrades markedly when no initial labeled data are available. While cold-start solutions have been proposed for domains such as vision and text, the problem remains largely unexplored in active preference learning, underscoring the need for practical, effective methods. Drawing inspiration from established practices in social and economic research, the proposed method begins with a self-supervised phase that uses Principal Component Analysis (PCA) to generate initial pseudo-labels. This phase produces a "warmed-up" model based solely on the data's intrinsic structure, without requiring expert input. The model is then refined through an active learning loop that strategically queries a simulated noisy oracle for labels. Experiments on several socio-economic datasets, covering financial credibility, career success rate, and socio-economic status, consistently show that the PCA-driven approach outperforms standard active learning strategies that start without prior information. This work thus provides a simple, computationally efficient solution to the cold-start problem.
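
To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that pseudo-labels come from comparing items by their score on the first principal component, that pairs are encoded as feature differences, and that the simulated oracle follows a Bradley–Terry model with a hypothetical `temperature` parameter. The synthetic data, the helper names, and `true_utility` are illustrative assumptions only.

```python
import numpy as np

def pc1_scores(X):
    """Project standardized features onto the first principal component."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def pseudo_label_pairs(scores, n_pairs, rng):
    """Sample item pairs; label 1 when the first item has the higher PC1 score."""
    n = len(scores)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j  # drop self-comparisons
    i, j = i[keep], j[keep]
    labels = (scores[i] > scores[j]).astype(int)
    return i, j, labels

def bradley_terry_oracle(utility, i, j, rng, temperature=1.0):
    """Simulated noisy oracle: P(i preferred over j) = sigmoid((u_i - u_j) / T)."""
    p = 1.0 / (1.0 + np.exp(-(utility[i] - utility[j]) / temperature))
    return (rng.random(len(p)) < p).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # synthetic stand-in for a socio-economic table
true_utility = X @ np.ones(5)      # hypothetical ground-truth utility (assumption)

scores = pc1_scores(X)
i, j, pseudo = pseudo_label_pairs(scores, 500, rng)
pair_features = X[i] - X[j]        # pairwise encoding assumed for this sketch
oracle_labels = bradley_terry_oracle(true_utility, i, j, rng)
```

Under these assumptions, `pair_features` and `pseudo` could be used to fit any binary classifier (the source uses XGBoost) before the active loop starts requesting `oracle_labels` for selected pairs. Note that the sign of a principal component is arbitrary, so in practice the PC1 direction would need to be oriented, for example using a feature known to correlate positively with the target, before its ordering is trusted as a pseudo-label.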
