An Interpretable Alternative to Neural Representation Learning for Rating Prediction -- Transparent Latent Class Modeling of User Reviews
Giuseppe Serra, Peter Tino, Zhao Xu, Xin Yao
TL;DR
The paper addresses the need for interpretable rating prediction from textual reviews by introducing a Transparent Latent Class Model (TLCM) that organizes user and product latent classes on two 2D grids. Latent assignments are inferred with EM over a hierarchical latent structure that includes corrupted latent variables, and a topographic prior is enforced through a neighborhood function $P(\textbf{y}|\textbf{z})$. An out-of-sample extension uses Bayes' rule to infer latent classes for unseen users from their reviews, and the inferred latent features are fed to a CNN-based rating predictor trained with a mean squared error (MSE) loss. Empirical results on Amazon data show that TLCM achieves competitive predictive performance while offering interpretable, visualization-friendly latent representations, addressing concerns about the interpretability gap in neural approaches. Overall, the work argues that principled, transparent models can closely approach, and in some settings match, the predictive power of text-based neural methods while providing clearer, human-understandable explanations.
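To make the neighborhood function and the out-of-sample step more concrete, here is a minimal NumPy sketch. It is not the authors' implementation and omits the EM updates and the separate user and product grids. It assumes a Gaussian neighborhood on the grid, a bag-of-words (multinomial) emission from the corrupted node y, so that p(w | z) = sum_y P(y | z) p(w | y), and already-estimated parameters. The function names (`neighborhood`, `class_word_probs`, `posterior_over_classes`) and the toy numbers are hypothetical.

```python
import numpy as np

def grid_coords(rows, cols):
    """Integer coordinates of the nodes of a rows x cols grid, shape (rows*cols, 2)."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([r.ravel(), c.ravel()], axis=1).astype(float)

def neighborhood(rows, cols, sigma=1.0):
    """Assumed Gaussian neighborhood: entry [z, y] = P(y | z); each row sums to 1."""
    X = grid_coords(rows, cols)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared grid distances
    H = np.exp(-d2 / (2.0 * sigma ** 2))
    return H / H.sum(axis=1, keepdims=True)

def class_word_probs(H, node_word_probs):
    """p(w | z) = sum_y P(y | z) p(w | y): words are emitted by the corrupted node y."""
    return H @ node_word_probs

def posterior_over_classes(counts, word_probs, prior):
    """Bayes' rule for an unseen user's bag of words:
    p(z | review) is proportional to p(z) * prod_w p(w | z) ** counts[w]."""
    log_post = np.log(prior) + np.log(word_probs) @ counts
    log_post -= log_post.max()  # subtract the max before exp for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: a 3x3 grid of latent classes over a 4-word vocabulary (made-up numbers).
rng = np.random.default_rng(0)
H = neighborhood(3, 3, sigma=0.8)
node_word_probs = rng.dirichlet(np.ones(4), size=9)  # p(w | y) for each grid node y
W = class_word_probs(H, node_word_probs)
prior = np.full(9, 1.0 / 9)
counts = np.array([3, 0, 1, 2])  # word counts from an unseen user's reviews
post = posterior_over_classes(counts, W, prior)
print(post.reshape(3, 3).round(3))  # posterior laid out on the 2D grid
```

Because every latent class mixes the emission distributions of its grid neighbors, nearby classes end up with similar word profiles. This is what makes the reshaped posterior readable as a map over the grid.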
Abstract
Neural network (NN) and deep learning (DL) techniques are now widely adopted in many applications, including recommender systems. Given the sparse and stochastic nature of collaborative filtering (CF) data, recent works have critically examined how much neural-based approaches actually improve over simpler, often transparent, recommendation algorithms. These studies showed that NN and DL models can be outperformed by traditional algorithms in many tasks. Moreover, because neural-based methods are largely black boxes, they do not naturally yield interpretable results. Building on this debate, we first present a transparent probabilistic model that topologically organizes user and product latent classes based on review information. In contrast to popular neural techniques for representation learning, this model directly provides a statistical, visualization-friendly tool that can be easily inspected to understand user and product characteristics from a textual perspective. Then, given the limitations of common embedding techniques, we investigate using the estimated interpretable quantities as input to a rating prediction model. To contribute to this debate, we evaluate our results in terms of both interpretability and predictive performance against popular text-based neural approaches. The results demonstrate that the proposed latent class representations can achieve predictive performance competitive with that of popular but difficult-to-interpret approaches.
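The idea of using the estimated latent-class posteriors as model input can be illustrated with a deliberately simple stand-in. The paper trains a CNN with an MSE loss. The sketch below instead swaps in ordinary least squares, a linear model that also minimizes MSE, to show only the data flow: user posteriors (3x3 grid) and product posteriors (2x2 grid) are concatenated and regressed onto ratings. The data are synthetic, and `build_features`, `fit_least_squares`, and `mse` are hypothetical helpers.

```python
import numpy as np

def build_features(user_post, item_post):
    """Concatenate user and product latent-class posteriors into one feature row.
    Each posterior sums to 1, so the model already has an implicit intercept."""
    return np.hstack([user_post, item_post])

def fit_least_squares(X, ratings):
    """Weights minimizing the mean squared error ||X w - r||^2 / n.
    lstsq copes with the collinearity caused by the sum-to-one posteriors."""
    w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return w

def mse(pred, ratings):
    return float(np.mean((pred - ratings) ** 2))

# Synthetic toy data: 200 user-product pairs, 9 user classes, 4 product classes.
rng = np.random.default_rng(1)
user_post = rng.dirichlet(np.ones(9), size=200)
item_post = rng.dirichlet(np.ones(4), size=200)
noise = rng.normal(0.0, 0.3, 200)
ratings = np.clip(np.rint(1 + 4 * user_post[:, 0] + 2 * item_post[:, 1] + noise), 1, 5)

X = build_features(user_post, item_post)
w = fit_least_squares(X, ratings)
print("train MSE:", mse(X @ w, ratings))
print("mean-rating baseline MSE:", mse(np.full_like(ratings, ratings.mean()), ratings))
```

With a linear model, each weight is attached to a specific grid node, so the learned weights can be reshaped onto the user and product grids and inspected alongside the latent-class maps. A CNN, as used in the paper, trades some of that directness for more modeling capacity.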
