Positive unlabeled learning for building recommender systems in a parliamentary setting
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Luis Redondo-Expósito
TL;DR
This paper tackles the automatic filtering of parliamentary documents to Members of Parliament (MPs) by learning MPs' interests from their debate interventions. It introduces a two-step Positive Unlabeled Learning (PUL) framework: a novel modified K-means step first extracts reliable negative examples, and an SVM classifier is then trained for each MP. An empirical evaluation on data from the Andalusian Parliament shows that the proposed method, pul-km, outperforms the baseline (bas), the existing PUL method pul-nb, and IR-based approaches, especially as more intervention data become available. The work demonstrates that PUL is viable for the targeted distribution of political documents, and it suggests balancing strategies and feature selection as directions for future work.
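The two-step scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pul-km: the paper's modified K-means is replaced here by plain Lloyd's k-means, and unlabeled documents that fall into clusters containing no positive examples are treated as reliable negatives (a hypothetical rule). The per-MP classifier is a linear SVM trained by Pegasos-style stochastic subgradient descent. Documents are assumed to be already vectorised, and the function names (`reliable_negatives`, `pul_two_step`, etc.) and parameter values (`k`, `lam`, `epochs`) are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's modified K-means)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def reliable_negatives(positives, unlabeled, k=2, max_pos_frac=0.0):
    """Step 1 (hypothetical rule): unlabeled points in clusters whose share of
    positive examples is at most max_pos_frac are kept as reliable negatives."""
    points = positives + unlabeled
    assign = kmeans(points, k)
    n_pos = len(positives)
    negatives = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        n_pos_in_cluster = sum(1 for i in idx if i < n_pos)
        if idx and n_pos_in_cluster / len(idx) <= max_pos_frac:
            negatives.extend(points[i] for i in idx if i >= n_pos)
    return negatives


def train_linear_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Step 2: linear SVM via Pegasos-style stochastic subgradient descent.
    A constant 1.0 feature is appended so the bias is learned as a weight."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink w: subgradient step for the L2 regulariser (lam / 2) * ||w||^2.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge loss active: move w toward classifying this example correctly.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score; > 0 means 'send this document to the MP'."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def pul_two_step(positives, unlabeled, k=2):
    negatives = reliable_negatives(positives, unlabeled, k)
    X = positives + negatives
    y = [1] * len(positives) + [-1] * len(negatives)
    return train_linear_svm(X, y), negatives


if __name__ == "__main__":
    # Toy 2-D "documents": the first two unlabeled items are hidden positives.
    pos = [[5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
    unl = [[5.1, 5.0], [4.8, 4.9], [0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [-0.1, 0.2]]
    w, neg = pul_two_step(pos, unl)
    print("reliable negatives:", neg)
    print("score for [4.8, 4.9]:", decision(w, [4.8, 4.9]))
```

In a per-MP setting, `positives` would be that MP's own interventions and `unlabeled` the interventions of everyone else. Step 1 keeps only the unlabeled items that sit far from any positive cluster as negatives, whereas the baseline labels all of them negative. In the toy run, the two hidden positives therefore never enter training as negatives.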
Abstract
Our goal is to learn the political interests and preferences of Members of Parliament by mining their parliamentary activity, in order to develop a recommendation/filtering system that, given a stream of documents to be distributed among them, decides which documents each Member of Parliament should receive. We propose to tackle this problem with positive unlabeled learning, because we only have information about relevant documents (each Member of Parliament's own interventions in the debates) and none about irrelevant ones, so standard binary classifiers trained with positive and negative examples cannot be used. We have also developed a new algorithm of this type, which compares favourably with: a) the baseline approach, which assumes that all the interventions of other Members of Parliament are irrelevant; b) another well-known positive unlabeled learning method; and c) an approach based on information retrieval methods that matches documents against legislators' representations. The experiments were carried out with data from the regional Andalusian Parliament in Spain.
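Approach c) can be illustrated with a minimal content-based matcher. The sketch below is an assumption, not the paper's IR model. Each MP is represented by the raw term frequencies of their interventions, and each incoming document is scored by cosine similarity against every profile. The document is then sent to each MP whose score reaches a hypothetical `threshold`. The MP names, the regex tokeniser and the threshold value are illustrative only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased term frequencies; a simple regex tokeniser (an assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profiles(interventions):
    """interventions: dict mapping an MP name to a list of intervention texts."""
    profiles = {}
    for mp, texts in interventions.items():
        profile = Counter()
        for text in texts:
            profile.update(bag_of_words(text))
        profiles[mp] = profile
    return profiles


def recipients(document, profiles, threshold=0.2):
    """MPs whose profile similarity reaches the (hypothetical) threshold, best first."""
    doc = bag_of_words(document)
    scores = {mp: cosine(doc, prof) for mp, prof in profiles.items()}
    chosen = [mp for mp, s in scores.items() if s >= threshold]
    return sorted(chosen, key=lambda mp: -scores[mp])


if __name__ == "__main__":
    profiles = build_profiles({
        "MP_A": ["water irrigation drought agriculture"],
        "MP_B": ["hospital health waiting lists"],
    })
    print(recipients("new drought plan for agriculture and irrigation", profiles))
```

Raw counts keep the example short. A practical matcher would likely add TF-IDF weighting and stop-word removal (for Spanish, in the case of the Andalusian data).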
