
TEARS: Textual Representations for Scrutable Recommendations

Emiliano Penaloza, Olivier Gouvert, Haolun Wu, Laurent Charlin

TL;DR

This work tackles the opacity and lack of user control in traditional latent user representations by proposing TEARS, which encodes user preferences as natural-text summaries generated by a modern LLM. TEARS aligns these text-based representations with a standard VAE for collaborative filtering through an optimal-transport objective, and blends them with the black-box latent representation via a mixing coefficient $\alpha$ to trade off transparency against performance. The approach yields high-quality recommendations while letting users steer rankings directly by editing their summaries. Experiments on MovieLens-1M, Netflix, and Goodbooks show robust controllability under large-scope edits, fine-grained edits, and guided recommendations. The result is a practical, scrutable, and controllable framework for recommender systems that gives users more autonomy over, and more insight into, their personalized content.
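To make the role of the mixing coefficient concrete, here is a minimal Python sketch of a convex combination of a text-based and a black-box user representation, followed by a dot-product scoring step. The function names, the plain-list vectors, and the dot-product decoder are hypothetical illustrations rather than the paper's implementation. Which endpoint of $[0, 1]$ corresponds to the summary is also an assumption: here $\alpha = 1$ means the summary representation alone.

```python
def mix_representations(z_text, z_blackbox, alpha):
    """Convex combination of two user representations.

    Assumed (hypothetical) convention: alpha = 1 uses only the summary
    (text) representation, alpha = 0 only the black-box VAE latent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(z_text) != len(z_blackbox):
        raise ValueError("representations must share the same dimension")
    return [alpha * t + (1.0 - alpha) * b for t, b in zip(z_text, z_blackbox)]


def score_items(z_user, item_embeddings):
    """Hypothetical linear decoder: one dot-product score per item."""
    return [sum(u * e for u, e in zip(z_user, item)) for item in item_embeddings]


def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0]]  # two toy item embeddings
    z_text, z_bb = [1.0, 0.0], [0.0, 1.0]
    for alpha in (0.0, 0.25, 1.0):
        z = mix_representations(z_text, z_bb, alpha)
        print(alpha, z, top_k(score_items(z, items), 1))
```

Sliding $\alpha$ moves the user vector along the segment between the two representations, so the top-ranked item changes once the summary side dominates. This interpolation is what allows a user's edit to the summary to propagate into the ranking.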

Abstract

Traditional recommender systems rely on high-dimensional (latent) embeddings for modeling user-item interactions, often resulting in opaque representations that lack interpretability. Moreover, these systems offer users limited control over their recommendations. Inspired by recent work, we introduce TExtuAl Representations for Scrutable recommendations (TEARS) to address these challenges. Instead of representing a user's interests through a latent embedding, TEARS encodes them in natural text, providing transparency and allowing users to edit them. To do so, TEARS uses a modern LLM to generate user summaries based on user preferences. We find the summaries capture user preferences uniquely. Using these summaries, we take a hybrid approach that uses an optimal transport procedure to align the summaries' representation with the learned representation of a standard VAE for collaborative filtering. We find this approach can surpass the performance of three popular VAE models while providing user-controllable recommendations. We also analyze the controllability of TEARS through three simulated user tasks that evaluate how effectively users can steer recommendations by editing their summaries.
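The abstract does not spell out the optimal transport objective. As a generic illustration only, the sketch below computes an entropic OT (Sinkhorn) cost between a batch of summary embeddings and a batch of VAE latents with uniform weights, working in the log domain for numerical stability. Treating this cost as the alignment loss is an assumption, not necessarily the paper's exact formulation, and all names are hypothetical.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_cost(xs, ys, eps=0.05, n_iters=200):
    """Entropic OT between uniform point clouds xs and ys (log-domain Sinkhorn).

    Returns (transport cost <P, C>, plan P). This is a generic technique;
    using it as the TEARS alignment loss is an assumption.
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform marginals
    f, g = [0.0] * n, [0.0] * m                 # dual potentials
    for _ in range(n_iters):
        # Update f so rows of P sum to 1/n, then g so columns sum to 1/m.
        f = [eps * log_a - eps * logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [eps * log_b - eps * logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    plan = [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)]
            for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


if __name__ == "__main__":
    summary_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy "summary" embeddings
    vae_latents = [[0.5, 0.0], [1.5, 0.0], [0.5, 1.0]]  # toy "VAE" latents
    print(sinkhorn_cost(summary_emb, vae_latents)[0])
```

In a training loop, `xs` would hold the summary embeddings for a mini-batch of users and `ys` the corresponding VAE latents, and the cost would be minimized with respect to the summary encoder's parameters; that step needs an autodiff framework rather than plain lists. A smaller `eps` gives a plan closer to unregularized OT but typically needs more iterations to converge.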

Paper Structure

This paper contains 60 sections, 14 equations, 14 figures, 16 tables.

Figures (14)

  • Figure 1: General scrutable recommendations framework proposed by scrutableRecsys. Our work implements this framework and provides benchmarks and a specific instantiation to obtain controllable, high-quality recommendations.
  • Figure 2: Overview of the general TEARS framework. TEARS produces recommendations from a convex combination of aligned summary and black-box representations, allowing users to interpolate between transparent text-based recommendations and black-box methods. Components in blue have frozen weights, while components in red are trainable.
  • Figure 3: Controllability experiment visualization: large-scope changes (top/grey frame), fine-grained edits (middle/orange frame), and guided recommendations (bottom/purple frame). Red indicates edited summaries, green indicates base summaries, and blue indicates models. Summaries and examples are paraphrased. The appendix includes more summaries, along with additional large-scope and fine-grained examples.
  • Figure 4: Tradeoff between recommendation performance (y-axis) and large-scope controllability (x-axis), measured by $\Delta@k(\rho) = \text{NDCG}^{\text{Original}}_{\text{genre}}@k(\rho) - \text{NDCG}^{\text{Augmented}}_{\text{genre}}@k(\rho)$, for ML-1M (left) and Goodbooks (middle) using GPT-generated summaries (see App. F for LLaMA results). Netflix results are shown in a separate figure. The x-axis represents $|\Delta_{\text{up/down}}|$ as $\alpha$ increases, reflecting its impact on NDCG@20. Notably, most VAE models outperform BASE in recommendation performance at $\alpha=1$, and some also achieve superior controllability. The bar plots (right) show guided recommendation outcomes: all models successfully guide black-box embeddings in the intended direction. Results are averaged across five seeds; full details are in the appendix.
  • Figure 5: $\delta_{\text{rank}}$ after fine-grained changes. Target items can gain tens of positions from small edits in user summaries. Here error bars represent the standard error $\frac{\sigma}{\sqrt{n}}$.
  • ...and 9 more figures
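The evaluation quantities in Figures 4 and 5 can be sketched as follows. This is a minimal illustration that assumes binary relevance (an item is relevant if it belongs to the edited genre), the standard discount in which 1-based rank $r$ contributes $1/\log_2(r+1)$, and rankings given as lists of item ids, best first. The caption's $\rho$ argument is not modeled, the helper names are hypothetical, and `standard_error` takes $\sigma$ to be the sample standard deviation.

```python
import math
import statistics


def ndcg_at_k(ranking, relevant, k):
    """Binary-relevance NDCG@k; 1-based rank r contributes 1 / log2(r + 1).

    `ranking` lists item ids best-first; `relevant` is a set of item ids
    (assumed here to be the items of the edited genre).
    """
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranking[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def delta_at_k(ranking_original, ranking_augmented, genre_items, k):
    """Delta@k = NDCG_genre@k(original summary) - NDCG_genre@k(edited summary)."""
    return (ndcg_at_k(ranking_original, genre_items, k)
            - ndcg_at_k(ranking_augmented, genre_items, k))


def rank_gain(ranking_before, ranking_after, target):
    """Positions gained by `target` after an edit (positive = moved up)."""
    return ranking_before.index(target) - ranking_after.index(target)


def standard_error(values):
    """sigma / sqrt(n), with sigma taken as the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

For example, if a genre item drops from first to third place after an edit that removes the genre from the summary, `delta_at_k` is $1 - 1/\log_2 4 = 0.5$, a positive value indicating that the edit pushed the genre down the ranking.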