Solving cold start in news recommendations: a RippleNet-based system for a large-scale media outlet
Karol Radziszewski, Michał Szpunar, Piotr Ociepka, Mateusz Buczyński
TL;DR
This work targets the persistent cold-start problem in news recommendation by augmenting RippleNet with semantic embeddings from large language models to better represent newly published items. It contributes a production-oriented pipeline deployed on Amazon SageMaker with Airflow-driven data flows, together with a thoroughly documented golden dataset from the Polish news domain. Offline and online evaluations show that although the RippleNet+LLM hybrid captures semantic relationships, it does not yet outperform the production baseline in real-world deployment, and online results reveal negative effects on engagement. The study demonstrates the potential of knowledge-graph-driven approaches for rapidly changing content and outlines clear directions for improving generalization and production readiness.
Abstract
We present a scalable recommender system based on RippleNet, tailored to the media domain and deployed in production at Onet.pl, one of Poland's largest online media platforms. Our solution addresses the cold-start problem for newly published content by integrating content-based item embeddings into RippleNet's knowledge propagation mechanism, which makes it possible to score items the model has never seen during training. The system architecture uses Amazon SageMaker for distributed training and inference and Apache Airflow to orchestrate data pipelines and model retraining workflows. To ensure high-quality training data, we constructed a comprehensive golden dataset consisting of user and item features together with a separate interaction table; this layout allows the dataset to be extended flexibly and new signals to be integrated.
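The abstract does not specify exactly how the content-based embeddings enter RippleNet's propagation. The Python sketch below shows one plausible reading, assuming the standard RippleNet formulation (relevance $p_i \propto \exp(\mathbf{v}^\top \mathbf{R}_i \mathbf{h}_i)$, hop responses $\mathbf{o}^h = \sum_i p_i \mathbf{t}_i$ summed into the user vector $\mathbf{u}$, click probability $\sigma(\mathbf{u}^\top \mathbf{v})$) and assuming that a cold-start item vector $\mathbf{v}$ is a linear projection of a precomputed text embedding. The names `content_item_embedding`, `ripplenet_score`, and the projection matrix are hypothetical illustrations, not the deployed system's code.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def content_item_embedding(text_emb: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical cold-start item vector: a linear projection of a
    precomputed text embedding (e.g. from an LLM) into the RippleNet space.
    It needs no interaction history, so it exists for brand-new articles."""
    return proj @ text_emb


def ripplenet_score(item_vec: np.ndarray, ripple_sets, relation_mats: np.ndarray) -> float:
    """RippleNet-style click probability for one (user, item) pair.

    item_vec      : (d,) item embedding v
    ripple_sets   : one (heads, rel_ids, tails) tuple per hop; heads and
                    tails are (k, d) entity embeddings, rel_ids is (k,) ints
    relation_mats : (n_relations, d, d) relation matrices R
    """
    query = item_vec                      # hop 1 is queried with v itself
    user_vec = np.zeros_like(item_vec)
    for heads, rel_ids, tails in ripple_sets:
        R = relation_mats[rel_ids]                   # (k, d, d)
        Rh = np.einsum("kij,kj->ki", R, heads)       # R_i h_i for each triple
        p = softmax(Rh @ query)                      # p_i ∝ exp(query^T R_i h_i)
        o = p @ tails                                # hop response o^h, shape (d,)
        user_vec = user_vec + o                      # u = o^1 + ... + o^H
        query = o                                    # next hop is queried with o^h
    return float(1.0 / (1.0 + np.exp(-(user_vec @ item_vec))))  # sigma(u^T v)


# Usage with random placeholder data (hypothetical sizes and values).
rng = np.random.default_rng(0)
d, n_rel, k, text_dim = 8, 3, 5, 32
proj = rng.normal(size=(d, text_dim)) / np.sqrt(text_dim)
new_article = content_item_embedding(rng.normal(size=text_dim), proj)
rel_mats = rng.normal(size=(n_rel, d, d)) / np.sqrt(d)
hops = [
    (rng.normal(size=(k, d)), rng.integers(0, n_rel, size=k), rng.normal(size=(k, d)))
    for _ in range(2)
]
print(ripplenet_score(new_article, hops, rel_mats))  # a probability in (0, 1)
```

Because the item vector is computed from content rather than looked up by item ID, a newly published article can be scored against a user's ripple sets immediately; the user side of the model is unchanged.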
