Pseudo-Label NCF for Sparse OHC Recommendation: Dual Representation Learning and the Separability-Accuracy Trade-off

Pronob Kumar Barman, Tera L. Reynolds, James Foulds

Abstract

Online Health Communities (OHCs) connect patients for peer support, but users face a discovery challenge when they have minimal prior interactions to guide personalization. We study recommendation under extreme interaction sparsity in a survey-driven setting where each user provides a 16-dimensional intake vector and each support group has a structured feature profile. We extend Neural Collaborative Filtering (NCF) architectures, including Matrix Factorization (MF), Multi-Layer Perceptron (MLP), and NeuMF, with an auxiliary pseudo-label objective derived from survey-group feature alignment using cosine similarity mapped to [0, 1]. The resulting Pseudo-Label NCF (PL-NCF) learns dual embedding spaces: main embeddings for ranking and pseudo-label embeddings for semantic alignment. We evaluate on a dataset of 165 users and 498 support groups using a leave-one-out protocol that reflects cold-start conditions. All pseudo-label variants improve ranking performance: MLP improves HR@5 from 2.65% to 5.30%, NeuMF from 4.46% to 5.18%, and MF from 4.58% to 5.42%. Pseudo-label embedding spaces also show higher cosine silhouette scores than baseline embeddings, with MF improving from 0.0394 to 0.0684 and NeuMF from 0.0263 to 0.0653. We further observe a negative correlation between embedding separability and ranking accuracy, indicating a trade-off between interpretability and performance. These results show that survey-derived pseudo-labels improve recommendation under extreme sparsity while producing interpretable, task-specific embedding spaces.
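As a rough illustration of the pseudo-label construction the abstract describes, the sketch below computes cosine similarity between user survey vectors and group feature profiles and maps it to [0, 1]. The affine map (cos + 1) / 2, the function name `pseudo_labels`, and the premise that user and group vectors share one feature dimension are assumptions made for illustration; the abstract does not state which mapping or feature alignment the authors use.

```python
import numpy as np


def pseudo_labels(user_survey, group_features, eps=1e-12):
    """Return a (num_users, num_groups) matrix of pseudo-labels in [0, 1].

    Assumption: both inputs are 2D arrays with the same number of columns,
    i.e. user surveys and group profiles live in a shared feature space.
    """
    u = np.asarray(user_survey, dtype=float)
    g = np.asarray(group_features, dtype=float)

    # L2-normalize rows; eps guards against division by zero for all-zero rows,
    # which then get cosine 0 and therefore a neutral pseudo-label of 0.5.
    u_unit = u / np.maximum(np.linalg.norm(u, axis=1, keepdims=True), eps)
    g_unit = g / np.maximum(np.linalg.norm(g, axis=1, keepdims=True), eps)

    # Pairwise cosine similarity in [-1, 1], clipped against rounding error.
    cos = np.clip(u_unit @ g_unit.T, -1.0, 1.0)

    # Assumed affine map from [-1, 1] to [0, 1].
    return (cos + 1.0) / 2.0
```

Under these assumptions, a user matrix of shape (165, 16) and a group matrix of shape (498, 16) would yield a (165, 498) target matrix for the auxiliary objective.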

Paper Structure

This paper contains 49 sections, 5 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of the PL-NCF dual-representation architecture. Each user $u$ and group $g$ maintain separate main embeddings ($\mathbf{p}_u$, $\mathbf{q}_g$) for the ranking pathway and PL-specific embeddings ($\mathbf{p}_u^{\text{PL}}$, $\mathbf{q}_g^{\text{PL}}$) for the alignment pathway. The ranking pathway processes main embeddings through NCF layers, while the alignment pathway computes cosine similarity $a_{ug}^{\text{PL}}$ between PL-specific embeddings. Both pathways are fused and supervised jointly by binary cross-entropy $\mathcal{L}_{\text{BCE}}$ and pseudo-label loss $\mathcal{L}_{\text{PL}}$.
  • Figure 2: t-SNE visualization of user embeddings under leave-one-out evaluation. Left: NeuMF baseline main embeddings. Right: NeuMF-PL PL-specific embeddings. Cluster labels are computed via spherical $k$-means in the original embedding space and overlaid on the 2D coordinates for visualization only.
  • Figure 3: Additional t-SNE visualizations under leave-one-out comparing baseline main embeddings (left) to PL-specific embeddings (right) for MF and MLP. Cluster labels are computed via spherical $k$-means in the original embedding space and overlaid on 2D projections for visualization only.
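The captions above describe cluster labels assigned in the original embedding space, and the abstract reports cosine silhouette scores for those spaces. A minimal sketch of how such a score could be computed from embeddings and precomputed cluster labels follows. It implements the standard silhouette definition with cosine distance (1 minus cosine similarity); the function name `cosine_silhouette` is hypothetical, and the clustering step itself (spherical $k$-means in the paper) is assumed to have been run already.

```python
import numpy as np


def cosine_silhouette(embeddings, labels, eps=1e-12):
    """Mean silhouette score using cosine distance.

    embeddings: (n, d) array; labels: length-n array of cluster ids.
    Points in singleton clusters receive a score of 0, following the
    usual convention.
    """
    X = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Unit-normalize rows so that dot products are cosine similarities.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)
    dist = 1.0 - np.clip(X @ X.T, -1.0, 1.0)
    np.fill_diagonal(dist, 0.0)  # remove rounding noise on self-distances

    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette requires at least two clusters")

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster keeps score 0

        # a: mean distance to the other members of the point's own cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])

        denom = max(a, b)
        scores[i] = 0.0 if denom == 0.0 else (b - a) / denom

    return float(scores.mean())
```

Scores near 1 indicate tight, well-separated clusters, while scores near 0 (such as the values the abstract reports) indicate heavily overlapping ones.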