Online Learning for Recommendations at Grubhub

Alex Egg

TL;DR

This work tackles the challenge of keeping Grubhub's large-scale recommender systems fresh and cost-efficient in production by enabling online incremental learning through transfer learning. It combines stateful online updates, pre-training on offline data, and hash-based embedding schemes to handle non-stationary item categories, balancing responsiveness to drift against computational cost. The authors report a +20% PTR increase and a 45x reduction in cloud costs in A/B tests. Overall, the paper argues that an offline-to-online transition, when carefully implemented with pre-training and hashing strategies, can deliver rapid adaptation and substantial cost savings in production recommender systems.

Abstract

We propose a method to easily modify existing offline Recommender Systems to run online using Transfer Learning. Online Learning for Recommender Systems has two main advantages: quality and scale. Like many Machine Learning algorithms in production, a recommender that is not regularly retrained will suffer from Concept Drift. A policy that is updated frequently online can adapt to drift faster than a batch system. This is especially true for user-interaction systems like recommenders, where the underlying distribution can shift drastically to follow user behaviour. As a platform like Grubhub grows rapidly, the cost of running batch training jobs becomes material. A shift from stateless batch learning offline to stateful incremental learning online can yield, for example at Grubhub, up to 45x cost savings and a +20% increase in metrics. There are a few challenges to overcome in the transition to online stateful learning, namely convergence, non-stationary embeddings and off-policy evaluation, which we explore based on our experience running this system in production.
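The difference between stateless batch learning and stateful incremental learning can be illustrated with a minimal sketch. Everything here is hypothetical and is not the paper's implementation: `LinearModel` is a toy stand-in for a recommender, and `stateless_update` and `stateful_update` are illustrative names. The sketch only shows that a stateless job re-reads a whole window of days on every run, while a stateful job continues from the previous weights and reads only the newest day.

```python
from dataclasses import dataclass, field


@dataclass
class LinearModel:
    """Toy stand-in for a recommender (assumption): one weight per feature."""
    weights: dict = field(default_factory=dict)
    examples_seen: int = 0

    def update(self, batch, lr=0.1):
        # One SGD pass of squared-error regression over (features, label) pairs.
        for features, label in batch:
            pred = sum(self.weights.get(f, 0.0) for f in features)
            err = pred - label
            for f in features:
                self.weights[f] = self.weights.get(f, 0.0) - lr * err
            self.examples_seen += 1


def stateless_update(daily_data, today, window=4):
    """Retrain from scratch on the last `window` days; no state is carried over."""
    model = LinearModel()
    for day in range(max(0, today - window + 1), today + 1):
        model.update(daily_data[day])
    return model


def stateful_update(model, daily_data, today):
    """Continue training the existing (e.g. pre-trained) model on today's data only."""
    model.update(daily_data[today])
    return model


if __name__ == "__main__":
    # Ten days of toy interactions, five examples per day.
    daily_data = [[(("user_1", "restaurant_1"), 1.0)] * 5 for _ in range(10)]
    batch_model = stateless_update(daily_data, today=9, window=4)
    online_model = LinearModel()
    for day in range(10):
        online_model = stateful_update(online_model, daily_data, day)
    print("examples read by one stateless run:", batch_model.examples_seen)
    print("examples read per stateful run:", len(daily_data[9]))
```

In this toy setup each stateless run reads `window` days of data while each stateful run reads one day. The gap grows with the window size, which is one plausible source of the cost difference the abstract describes.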

Paper Structure

This paper contains 7 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: (a) Example of daily stateless updates with a 4-day sliding window and a next-day hold-out for cross-validation. (b) Example of daily incremental updates with state and bootstrapping. Note that the model updates only on new data, not on the sliding window in (a). The cadence can be 1 day, as in this example, or as short as 10 minutes, as in 10.1145/3383313.3412214.
  • Figure 2: (a) Batch cross-validation results on a training window that grows incrementally from 1 to 80 days. (b) Incremental updates for 80 days, which converge faster than the batch updates in (a).
  • Figure 3: Collision Rate vs Buckets.
  • Figure 4: Experiment showing relative increase in PTR for different update schemes.
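The relationship that Figure 3 plots, collision rate against number of hash buckets, can be approximated with a short sketch. The definition of collision rate used here (the fraction of IDs that share a bucket with at least one other ID) is an assumption, as are the MD5-based `bucket_of` helper and the synthetic `restaurant_<n>` IDs. The paper may use a different hash function and a different definition.

```python
import hashlib
from collections import Counter


def bucket_of(item_id: str, num_buckets: int) -> int:
    """Map an ID to a bucket with a stable hash (MD5 is an illustrative choice)."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def collision_rate(item_ids, num_buckets: int) -> float:
    """Fraction of IDs that land in a bucket shared with at least one other ID."""
    item_ids = list(item_ids)
    if not item_ids:
        return 0.0
    counts = Counter(bucket_of(i, num_buckets) for i in item_ids)
    collided = sum(c for c in counts.values() if c > 1)
    return collided / len(item_ids)


if __name__ == "__main__":
    # Synthetic IDs; a real catalogue would come from production data.
    ids = [f"restaurant_{n}" for n in range(1000)]
    for buckets in (100, 1_000, 10_000, 100_000):
        print(buckets, round(collision_rate(ids, buckets), 3))
```

More buckets mean fewer IDs sharing an embedding row, at the cost of a larger embedding table. A fixed bucket count also lets IDs that first appear after deployment map to an existing row without resizing the table.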