Cold-Start Active Preference Learning in Socio-Economic Domains

Mojtaba Fayaz-Bakhsh, Danial Ataee, MohammadAmin Fazli

TL;DR

This work tackles the cold-start problem in active preference learning by introducing a PCA-driven warm-up that generates initial pseudo-labels without expert input, producing a warmed-up model before active querying. It integrates a simulated noisy Bradley–Terry oracle within a warm-start active-learning loop, using XGBoost for the underlying binary preference model. Across multiple socio-economic datasets, the PCA-based warm-up consistently improves early performance and reduces the labeled data required compared to standard cold-start baselines, demonstrating strong practical gains in low-data regimes. The approach is computationally efficient, scalable, and readily applicable to real-world socio-economic preference modeling, with future work exploring alternative unsupervised initializations, enhanced query strategies, and real human-in-the-loop validation.
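As an illustration of the simulated noisy Bradley–Terry oracle mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes each item has a scalar latent score and that the oracle prefers item i over item j with probability sigmoid((s_i - s_j) / temperature). The function names, the `temperature` parameter, and the example scores are hypothetical choices made for this sketch.

```python
import numpy as np

def bradley_terry_prob(score_i, score_j, temperature=1.0):
    """P(item i is preferred over item j) under a Bradley-Terry model.

    With strengths exp(s / temperature), the Bradley-Terry ratio reduces
    to a logistic function of the score difference.
    `temperature` is an assumed noise knob, not taken from the paper.
    """
    return 1.0 / (1.0 + np.exp(-(score_i - score_j) / temperature))

def noisy_oracle(score_i, score_j, rng, temperature=1.0):
    """Return 1 if the simulated oracle prefers item i, else 0."""
    return int(rng.random() < bradley_terry_prob(score_i, score_j, temperature))

# Hypothetical latent scores for three items.
rng = np.random.default_rng(0)
latent = np.array([0.2, 1.5, -0.7])

# Query the same pair many times: the answers are noisy, but their mean
# approaches the Bradley-Terry probability.
answers = [noisy_oracle(latent[1], latent[2], rng) for _ in range(1000)]
empirical = float(np.mean(answers))
expected = float(bradley_terry_prob(latent[1], latent[2]))
```

In this sketch a larger `temperature` makes the oracle noisier, and equal scores yield a coin flip.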

Abstract

Active preference learning offers an efficient approach to modeling preferences, but it is hindered by the cold-start problem, which leads to a marked decline in performance when no initial labeled data are available. While cold-start solutions have been proposed for domains such as vision and text, the cold-start problem in active preference learning remains largely unexplored, underscoring the need for practical, effective methods. Drawing inspiration from established practices in social and economic research, the proposed method initiates learning with a self-supervised phase that employs Principal Component Analysis (PCA) to generate initial pseudo-labels. This process produces a "warmed-up" model based solely on the data's intrinsic structure, without requiring expert input. The model is then refined through an active learning loop that strategically queries a simulated noisy oracle for labels. Experiments conducted on various socio-economic datasets, including those related to financial credibility, career success rate, and socio-economic status, consistently show that the PCA-driven approach outperforms standard active learning strategies that start without prior information. This work thus provides a computationally efficient and straightforward solution that effectively addresses the cold-start problem.
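The PCA warm-up described in the abstract could be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the paper's procedure: items are scored by their projection onto the first principal component, random pairs are drawn, and the pseudo-label marks which item of the pair has the higher score. The sign orientation of the component, the concatenated pair representation, and all function names are assumptions introduced here.

```python
import numpy as np

def pca_scores(X):
    """Project standardized rows of X onto the first principal component."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Vt[0]
    # PCA is sign-ambiguous. Orienting by the summed loadings is an
    # assumption of this sketch, not something stated in the source.
    if pc1.sum() < 0:
        pc1 = -pc1
    return Z @ pc1

def pseudo_label_pairs(X, n_pairs, rng):
    """Sample item pairs and label each by which item has the higher PCA score."""
    scores = pca_scores(X)
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    keep = i != j  # drop self-comparisons
    i, j = i[keep], j[keep]
    # Assumed pair representation: both feature vectors side by side.
    pair_features = np.hstack([X[i], X[j]])
    labels = (scores[i] > scores[j]).astype(int)
    return pair_features, labels

# Synthetic data: three noisy indicators driven by one hidden factor.
rng = np.random.default_rng(0)
hidden = rng.normal(size=200)
X = hidden[:, None] * np.array([1.0, 0.8, 0.6]) + 0.1 * rng.normal(size=(200, 3))

pair_features, labels = pseudo_label_pairs(X, n_pairs=500, rng=rng)
correlation = float(np.corrcoef(pca_scores(X), hidden)[0, 1])
```

The resulting `pair_features` and `labels` could then be used to fit a binary preference classifier (the TL;DR names XGBoost) before any oracle query is made.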

Paper Structure

This paper contains 20 sections, 11 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Conceptual overview of the cold-start active preference learning framework. All elements and operations are described in detail in the methodology section (sec:meth).
  • Figure 2: Comparative performance across different datasets (averaged over 40 runs).