Leveraging Topic Specificity and Social Relationships for Expert Finding in Community Question Answering Platforms

Maddalena Amendola, Andrea Passarella, Raffaele Perego

TL;DR

This work tackles expert finding in community Q&A by integrating topic-specific content signals and social relationships through a multi-layer topic graph. The proposed TUEF framework identifies and explores candidate experts per topic layer, then ranks them with a learning-to-rank (LtR) model that combines static and query-dependent features. Across six Stack Exchange communities, TUEF consistently outperforms state-of-the-art baselines in both end-to-end Expert Ranking and Expert Subsample Ranking, with notable gains in P@1, NDCG@3, R@5, and MRR, and its performance scales to larger datasets. The approach also incorporates an interpretable LtR variant (IlMart), which provides explanations with modest or negligible accuracy trade-offs and improves the transparency and trustworthiness of expert-finding decisions.

Abstract

Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions to the most knowledgeable users within the community, a process known as Expert Finding (EF). EF accuracy is crucial for increasing user engagement and the reliability of provided answers. Despite recent advancements in EF methodologies, blending the diverse information sources available on CQA platforms for effective expert identification remains challenging. In this paper, we present TUEF, a Topic-oriented User-Interaction model for Expert Finding, which aims to fully and transparently leverage the heterogeneous information available within online question-answering communities. TUEF integrates content and social data by constructing a multi-layer graph that maps out user relationships based on their answering patterns on specific topics. By combining these sources of information, TUEF identifies the most relevant and knowledgeable users for any given question and ranks them using learning-to-rank techniques. Our findings indicate that TUEF's topic-oriented model significantly enhances performance, particularly in large communities discussing well-defined topics. Additionally, we show that the interpretable learning-to-rank algorithm integrated into TUEF offers transparency and explainability with minimal performance trade-offs. The exhaustive experiments conducted on six different CQA communities of Stack Exchange show that TUEF outperforms all competitors with a minimum performance boost of 42.42% in P@1, 32.73% in NDCG@3, 21.76% in R@5, and 29.81% in MRR, excelling in both the evaluation approaches present in the previous literature.
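
The four metrics quoted above (P@1, NDCG@3, R@5, and MRR) can be illustrated with a minimal sketch. It assumes binary relevance, where each question has a set of users considered correct experts; the paper's exact relevance definition and evaluation protocol may differ, and the function names and user ids below are hypothetical.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked users that are relevant experts."""
    return sum(1 for user in ranked[:k] if user in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant experts that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for user in ranked[:k] if user in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant expert (0 if none is ranked)."""
    for position, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-gain NDCG: discounted gain of the top-k over the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, user in enumerate(ranked[:k], start=1)
        if user in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one question whose only correct expert is "u1".
ranked_users = ["u3", "u1", "u7", "u2", "u9"]
correct_experts = {"u1"}
p_at_1 = precision_at_k(ranked_users, correct_experts, 1)
ndcg_at_3 = ndcg_at_k(ranked_users, correct_experts, 3)
r_at_5 = recall_at_k(ranked_users, correct_experts, 5)
rr = reciprocal_rank(ranked_users, correct_experts)
```

MRR is the mean of `reciprocal_rank` over all test questions, and the other three metrics are likewise averaged per question.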

Paper Structure (35 sections, 6 equations, 4 figures, 6 tables, 1 algorithm)

Figures (4)

  • Figure 1: Illustration of the TUEF approach highlighting the distinct components. At inference time, TUEF first determines the main topics to which the question $q$ belongs and the corresponding graph layers (Multi-Layer Graph). Next, for each layer, it selects the candidate experts from two perspectives: i) Network, by identifying central users that may have considerable influence within the community; ii) Content, by identifying users who previously answered questions similar to $q$. The Multi-Layer Graph is used to collect candidate experts through Random Walks (Expert Selection). Then, TUEF extracts features based on text, tags, and graph relationships for each selected expert (Feature Extraction). Finally, TUEF uses a learned, precision-oriented model to score the candidates and rank them by expected relevance (Experts Ranking).
  • Figure 2: LtR Algorithm's Feature Importances. Figure (a) on the left displays the eight most important features for the TUEF LtR algorithm. Figure (b) illustrates the feature importance values for all features considered by TUEFIlMart.
  • Figure 3: The three most important main and interaction effects of TUEFIlMart on the StackOverflow dataset. Figures (a), (b), and (c) show the main effects of FreqIndexTag, FreqIndexText, and VisitCountContent, respectively. The x-axis represents the values the feature can have, while the y-axis represents the corresponding contribution of the feature to the predicted final score. Figure (d) illustrates the most important interaction effect learned by TUEFIlMart, composed of the features FreqIndexTag (x-axis) and FreqIndexText (y-axis). The color bar indicates the interaction contribution.
  • Figure 4: TUEF performance on four datasets corresponding to 1, 2, 3, and 4 months of StackOverflow data. The x-axis gives the number of months considered, while the y-axis reports the performance metrics (P@1, NDCG@3, R@5, and MRR), all computed on the same test set of 1,342 queries. The rightmost plot shows TUEF's average per-query inference time in seconds.
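
The Multi-Layer Graph and Expert Selection steps described in the Figure 1 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes one graph layer per topic, an edge between two users whenever they answered the same question in that topic, and uniform random walks started from seed users. The function names, the edge rule, the walk parameters, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def build_layers(answers):
    """Build one undirected user graph per topic.

    `answers` is an iterable of (question_id, topic, user_id) triples.
    Assumption: users are linked if they answered the same question.
    """
    answerers = defaultdict(set)
    for question_id, topic, user in answers:
        answerers[(topic, question_id)].add(user)

    layers = defaultdict(lambda: defaultdict(set))
    for (topic, _question_id), users in answerers.items():
        for user in users:
            layers[topic][user]  # register the user even if it has no edges
        for a, b in combinations(sorted(users), 2):
            layers[topic][a].add(b)
            layers[topic][b].add(a)
    return layers


def random_walk_candidates(layer, seeds, walk_length=10, walks_per_seed=5, rng=None):
    """Count how often each user is visited by random walks started at `seeds`."""
    rng = rng or random.Random(0)
    visits = Counter()
    for seed in seeds:
        if seed not in layer:
            continue
        for _ in range(walks_per_seed):
            current = seed
            visits[current] += 1
            for _ in range(walk_length):
                neighbours = sorted(layer[current])
                if not neighbours:
                    break  # isolated user: the walk cannot continue
                current = rng.choice(neighbours)
                visits[current] += 1
    return visits


# Toy data: three questions in two topics.
toy_answers = [
    ("q1", "python", "alice"), ("q1", "python", "bob"),
    ("q2", "python", "bob"), ("q2", "python", "carol"),
    ("q3", "sql", "dave"),
]
toy_layers = build_layers(toy_answers)
toy_visits = random_walk_candidates(
    toy_layers["python"], ["alice"], walk_length=4, walks_per_seed=3,
    rng=random.Random(42),
)
```

In a sketch like this, the seeds would come from the Network perspective (central users in the layer) or the Content perspective (users who answered similar questions), and the visit counts could be passed on as one input to the Feature Extraction step.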