Enhancing Prediction Models with Reinforcement Learning
Karol Radziszewski, Piotr Ociepka
TL;DR
This work presents Aureus, a production-scale news recommender for Onet.pl that blends reinforcement learning with deep learning and similarity-based models to improve online KPIs while addressing cold-start and content freshness. The approach uses multi-armed bandits (UCB and Thompson Sampling) within user segments to capture popularity signals, complemented by personalized prediction models: a deep user-interest model and a cosine-similarity-based predictor. Bandit and prediction-model scores are combined via a weighted-average mixer, which improves online performance while keeping latency acceptable. Offline evaluations show that the deep model outperforms the similarity baseline, and online A/B tests show that the ensemble outperforms its individual components, highlighting the synergy between popularity signals and personalization. The results support deploying a hybrid architecture in production to adapt to rapid content changes and evolving user preferences; future work will explore additional features and embedding models to further refine recommendations.
Abstract
We present a large-scale news recommendation system implemented at Ringier Axel Springer Polska, focusing on enhancing prediction models with reinforcement learning techniques. The system, named Aureus, integrates a variety of algorithms, including multi-armed bandit methods and deep learning models based on large language models (LLMs). We detail the architecture and implementation of Aureus, emphasizing the significant improvements in online metrics achieved by combining ranking prediction models with reinforcement learning. The paper further explores how mixing different models affects key business performance indicators. Our approach balances the need for personalized recommendations with the ability to adapt to rapidly changing news content, addressing common challenges such as the cold-start problem and content freshness. Online evaluation results demonstrate the effectiveness of the proposed system in a real-world production environment.
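
To make the combination of bandit and prediction-model scores concrete, the following is a minimal sketch, not the Aureus implementation. It assumes a Beta-Bernoulli Thompson Sampling bandit with a uniform Beta(1, 1) prior, click feedback keyed by (segment, article), prediction-model scores already scaled to [0, 1], and a single fixed mixing weight. All names (`SegmentedThompsonBandit`, `mix_scores`, `rank`, `bandit_weight`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class SegmentedThompsonBandit:
    """Beta-Bernoulli Thompson Sampling with separate click statistics per user segment.

    Assumption: a uniform Beta(1, 1) prior for every (segment, article) pair.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # (segment, article_id) -> [clicks, impressions without a click]
        self._stats = defaultdict(lambda: [0, 0])

    def update(self, segment, article_id, clicked):
        """Record one impression and whether it led to a click."""
        counts = self._stats[(segment, article_id)]
        counts[0 if clicked else 1] += 1

    def score(self, segment, article_id):
        """Sample a plausible click-through rate from the Beta posterior.

        Unseen articles fall back to the prior, so they can still be explored.
        """
        clicks, skips = self._stats.get((segment, article_id), (0, 0))
        return self._rng.betavariate(1 + clicks, 1 + skips)


def mix_scores(bandit_score, model_score, bandit_weight=0.5):
    """Weighted average of a bandit score and a prediction-model score.

    Assumption: both scores lie in [0, 1]; bandit_weight is a hypothetical
    tuning parameter, not a value taken from the paper.
    """
    if not 0.0 <= bandit_weight <= 1.0:
        raise ValueError("bandit_weight must be in [0, 1]")
    return bandit_weight * bandit_score + (1.0 - bandit_weight) * model_score


def rank(candidates, segment, bandit, model_scores, bandit_weight=0.5):
    """Order candidate articles by their mixed score, highest first."""
    mixed = {
        article: mix_scores(
            bandit.score(segment, article), model_scores[article], bandit_weight
        )
        for article in candidates
    }
    return sorted(candidates, key=mixed.get, reverse=True)
```

In this sketch a brand-new article has no click history, so its bandit score is drawn from the prior and can land anywhere in [0, 1]. This is one simple way a bandit component can give fresh content a chance to be shown while the prediction model contributes the personalized part of the score.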
