LDA-based Term Profiles for Expert Finding in a Political Setting

Luis M. de Campos; Juan M. Fernández-Luna; Juan F. Huete; Luis Redondo-Expósito

TL;DR

This work tackles political expert finding by learning MPs' areas of expertise from their parliamentary interventions and representing each candidate with multiple homogeneous term subprofiles derived via latent Dirichlet allocation (LDA). It introduces a method that splits documents into topic-aligned subdocuments by distributing term occurrences according to $p(x|t,d)$ and then merges these into per-candidate subprofiles. It also proposes 15 distance and similarity measures, which collapse into five strategies (Euclidean, Cosine, Dice, Sorensen, Overlap), to select the number of subdocuments. Experiments on Andalusian Parliament records show that, with an appropriate topic count $k$ and distribution strategy (notably Sorensen with $k=\sqrt{n/2}$), the LDA-based subprofiles improve retrieval quality (e.g., $NDCG@10$ and $Precision@10$) over strong baselines and deep learning approaches. The results offer practical guidance for building topic-based expert profiles from political texts and point to future extensions such as temporal LDA and paragraph-level term distribution. Overall, the paper demonstrates that generating homogeneous, topic-aligned term subprofiles via LDA can enhance expert finding in political domains, and it offers concrete methods and evaluation to support deployment.
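
The two retrieval metrics named above can be illustrated with a minimal Python sketch. It is not the paper's evaluation code: the function names are hypothetical, relevance is taken as a list of graded or binary gains in ranked order, and the ideal ranking for NDCG is assumed to be a re-sorting of that same list (a full evaluation would normally use every relevant candidate known for the query).

```python
import math


def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k ranked candidates that are relevant (gain > 0)."""
    top = ranked_relevance[:k]
    return sum(1 for gain in top if gain > 0) / k


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_relevance, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same gains.

    Assumption: the ideal ordering is derived from the supplied list only.
    """
    ideal = dcg_at_k(sorted(ranked_relevance, reverse=True), k)
    return dcg_at_k(ranked_relevance, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranking of MPs for one query: 1 = relevant expert, 0 = not.
    ranking = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    print(precision_at_k(ranking, k=10))
    print(ndcg_at_k(ranking, k=10))
```

Placing relevant candidates earlier in the ranking raises NDCG even when Precision@10 is unchanged, which is why the two metrics are usually reported together.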

Abstract

A common task in many political institutions (e.g., parliaments) is to find politicians who are experts in a particular field. The first step in tackling this problem is to obtain politician profiles that capture their interests, and these can be learned automatically from their speeches. As a politician may have several areas of expertise, one alternative is to use a set of subprofiles, each of which covers a different subject. In this study, we propose a novel approach to this task that uses latent Dirichlet allocation (LDA) to determine the main underlying topics of each political speech and to distribute the related terms among the different topic-based subprofiles. To this end, we propose fifteen distance and similarity measures for automatically determining the optimal number of topics discussed in a document, and we demonstrate that these measures converge into five strategies: Euclidean, Dice, Sorensen, Cosine and Overlap. Our experimental results show that the proposed strategies tend to score higher than the baselines on the different accuracy metrics for expert recommendation tasks, and that choosing an appropriate number of topics is relevant.
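
The idea of distributing a speech's terms among topic-based subprofiles can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the paper's Figure 1: the function names are hypothetical, the per-term topic weights are assumed to come from an already fitted LDA model, and integer occurrences are allocated with a largest-remainder rule chosen here for simplicity.

```python
from collections import Counter


def split_into_subdocuments(term_counts, topic_weights, n_topics):
    """Split one document into n_topics subdocuments.

    term_counts:   dict term -> number of occurrences in the document.
    topic_weights: dict term -> list of n_topics non-negative weights
                   (assumed to be derived from a fitted LDA model).
    Each term's occurrences are shared out in proportion to its weights.
    """
    subdocs = [Counter() for _ in range(n_topics)]
    for term, count in term_counts.items():
        weights = topic_weights[term]
        total = sum(weights)
        if total == 0:
            continue  # term carries no topic information; skip it
        shares = [count * w / total for w in weights]
        alloc = [int(s) for s in shares]  # whole occurrences first
        leftover = count - sum(alloc)
        # Largest-remainder rule (an assumption): the remaining occurrences
        # go to the topics with the biggest fractional parts.
        order = sorted(range(n_topics),
                       key=lambda t: shares[t] - alloc[t], reverse=True)
        for t in order[:leftover]:
            alloc[t] += 1
        for t, a in enumerate(alloc):
            if a:
                subdocs[t][term] += a
    return subdocs


def merge_subprofiles(per_document_subdocs, n_topics):
    """Merge a candidate's subdocuments topic by topic into subprofiles."""
    profiles = [Counter() for _ in range(n_topics)]
    for subdocs in per_document_subdocs:
        for t in range(n_topics):
            profiles[t].update(subdocs[t])
    return profiles


if __name__ == "__main__":
    # Hypothetical speech with two topics (0: water policy, 1: taxation).
    counts = {"water": 3, "tax": 2}
    weights = {"water": [2.0, 1.0], "tax": [0.0, 1.0]}
    subdocs = split_into_subdocuments(counts, weights, n_topics=2)
    print(subdocs)
    print(merge_subprofiles([subdocs, subdocs], n_topics=2))
```

In this sketch every occurrence of a term is assigned to exactly one subdocument, so the subdocuments together contain the same number of terms as the original speech. Selecting how many subdocuments to keep (the role of the five strategies) is left out.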

Paper Structure (25 sections, 43 equations, 4 figures, 4 tables)

Introduction
Related work
Basic models of expert finding
Expert finding using topic models
Other models of expert finding
Using LDA to obtain homogeneous subprofiles
Separating documents into homogeneous subdocuments
Merging subdocuments to obtain homogeneous subprofiles
Selecting the optimal number of subdocuments: distribution strategies
Building the optimal number of subdocuments
Experimental settings
Data sets and evaluation methodology
LDA implementation
Training set, profile generation and retrieval system
Query formulation and relevance assessments
...and 10 more sections

Figures (4)

Figure 1: Algorithm to generate subdocuments from the probabilities $p(x|t,d)$
Figure 2: Mean number of subdocuments generated from each of the documents associated to each MP, using the five methods ($k=70$)
Figure 3: Global process
Figure 4: Normalized entropy of the distribution of MPs versus the distribution of topics considering $k=\sqrt{n/2}$
