Enhancing Prediction Models with Reinforcement Learning

Karol Radziszewski, Piotr Ociepka

TL;DR

This work presents Aureus, a production-scale news recommender for Onet.pl that blends reinforcement learning with deep learning and similarity-based models to improve online KPIs while addressing cold-start and content freshness. The approach uses multi-armed bandits (with UCB and Thompson Sampling) within user segments to capture popularity signals, complemented by a deep user-interest model and a cosine-similarity-based predictor; the popularity and personalization signals are combined via a weighted-average mixer to yield superior online performance at acceptable latency. Offline evaluations demonstrate the deep model’s superiority over a similarity baseline, and online A/B tests show that the ensemble outperforms its individual components, highlighting the synergy between popularity signals and personalization. The results support deploying a hybrid architecture in production to adapt to rapid content changes and evolving user preferences, with future work exploring more features and embedding models to further refine recommendations.
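As a rough illustration of the per-segment bandit idea mentioned above, the following minimal sketch implements Beta-Bernoulli Thompson Sampling with separate click statistics for each user segment. It is not the Aureus implementation: the class name `SegmentedThompsonBandit`, the click/no-click reward, the uniform Beta(1, 1) prior, and the segment labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


class SegmentedThompsonBandit:
    """Beta-Bernoulli Thompson Sampling with per-segment statistics.

    Hypothetical sketch: names, reward definition, and prior are assumptions.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # (segment, item) -> [clicks, non_clicks]
        self._stats = defaultdict(lambda: [0, 0])

    def update(self, segment, item, clicked):
        """Record one impression of `item` shown to a user in `segment`."""
        self._stats[(segment, item)][0 if clicked else 1] += 1

    def score(self, segment, item):
        """Sample a plausible click rate from the Beta posterior."""
        clicks, non_clicks = self._stats[(segment, item)]
        # Beta(1, 1) prior: unseen items get a uniform sample, so fresh
        # content still has a chance to be explored (cold start).
        return self._rng.betavariate(clicks + 1, non_clicks + 1)

    def rank(self, segment, items):
        """Order items by one posterior sample each, highest first."""
        sampled = {item: self.score(segment, item) for item in items}
        return sorted(items, key=sampled.get, reverse=True)


bandit = SegmentedThompsonBandit(seed=0)
for _ in range(200):
    bandit.update("sports", "a", True)   # item "a" is clicked often
    bandit.update("sports", "b", False)  # item "b" is ignored
ranking = bandit.rank("sports", ["b", "a"])
```

Because statistics are keyed by segment, feedback collected in one segment does not influence rankings in another. A UCB variant would replace the posterior sample in `score` with a mean plus an exploration bonus.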

Abstract

We present a large-scale news recommendation system implemented at Ringier Axel Springer Polska, focusing on enhancing prediction models with reinforcement learning techniques. The system, named Aureus, integrates a variety of algorithms, including multi-armed bandit methods and deep learning models based on large language models (LLMs). We detail the architecture and implementation of Aureus, emphasizing the significant improvements in online metrics achieved by combining ranking prediction models with reinforcement learning. The paper further explores the impact of different model-mixing strategies on key business performance indicators. Our approach effectively balances the need for personalized recommendations with the ability to adapt to rapidly changing news content, addressing common challenges such as the cold start problem and content freshness. The results of online evaluation demonstrate the effectiveness of the proposed system in a real-world production environment.

Paper Structure

This paper contains 18 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The diagram of the segments calculation process.
  • Figure 2: The diagram of the deep model architecture. Input embeddings are calculated with pretrained models.
  • Figure 3: The diagram of the Aureus recommendation system. Inputs consist of the user ID, a set of content items, and online business KPI metrics. The system integrates two submodels, a deep learning-based user interest model and a multi-armed bandit content popularity model, which are combined using a specified combination strategy.
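
The combination step described for Figure 3 can be sketched as a weighted average of the two submodel scores. This is an illustrative sketch only: the function names `mix_scores` and `top_k`, the assumption that both submodels emit scores on a comparable scale, and the example weights are hypothetical and not taken from the paper.

```python
def mix_scores(interest, popularity, weight=0.5):
    """Weighted average of two per-item score dicts.

    `weight` is applied to the user-interest score and `1 - weight` to the
    popularity score. Only items scored by both submodels are kept.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    shared = interest.keys() & popularity.keys()
    return {
        item: weight * interest[item] + (1.0 - weight) * popularity[item]
        for item in shared
    }


def top_k(scores, k):
    """Return the k items with the highest mixed score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores for two articles from each submodel.
interest = {"a": 0.9, "b": 0.2}    # deep user-interest model
popularity = {"a": 0.1, "b": 0.8}  # bandit popularity model

personal_first = top_k(mix_scores(interest, popularity, weight=0.75), 1)
popular_first = top_k(mix_scores(interest, popularity, weight=0.25), 1)
```

Shifting `weight` toward 1 favours personalization and toward 0 favours popularity. In a setup like the one described, such a weight would presumably be chosen through online A/B testing against the business KPIs.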