Data Efficiency for Large Recommendation Models

Kshitij Jain, Jingru Xie, Kevin Regan, Cheng Chen, Jie Han, Steve Li, Zhuoshu Li, Todd Phillips, Myles Sussman, Matt Troup, Angel Yu, Jia Zhuo

TL;DR

This paper outlines the concept of data convergence, describes methods to accelerate that convergence, and details how to optimally balance training data volume with model size.

Abstract

Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry, processing massive datasets of hundreds of billions of examples before transitioning to continuous online training to adapt to rapidly changing user behavior. The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated (R&D velocity). This paper presents actionable principles and high-level frameworks to guide practitioners in optimizing training data requirements. These strategies have been successfully deployed in Google's largest Ads CTR prediction models and are broadly applicable beyond LRMs. We outline the concept of data convergence, describe methods to accelerate this convergence, and finally, detail how to optimally balance training data volume with model size.

Paper Structure

This paper contains 7 sections, 4 figures.

Figures (4)

  • Figure 1: Model accuracy as the dataset grows, highlighting the model's convergence point with respect to data.
  • Figure 2: Moving the model's convergence point using continuous sampling.
  • Figure 3: Continuous distillation accelerates model convergence.
  • Figure 4: IsoCompute curve. For a fixed training budget of 4K TPU hours, we varied the sampling rate and model size to achieve optimal quality.