ERCache: An Efficient and Reliable Caching Framework for Large-Scale User Representations in Meta's Ads System

Fang Zhou; Yaning Huang; Dong Liang; Dai Li; Zhongke Zhang; Kai Wang; Xiao Xin; Abdallah Aboelela; Zheliang Jiang; Yang Wang; Jeff Song; Wei Zhang; Chen Liang; Huayu Li; ChongLin Sun; Hang Yang; Lei Qu; Zhan Shu; Mindi Yuan; Emanuele Maccherani; Taha Hayat; John Guo; Varna Puvvada; Uladzimir Pashkevich

TL;DR

This work addresses the inefficiency of performing user-model inferences for every ad request in large-scale social networks. It introduces ERCache, a two-layer caching framework with a direct cache for serving and a failover cache for recovery, featuring TTL-based eviction, update grouping, asynchronous writes, and regional consistency to balance embedding freshness, model complexity, and service SLAs. Deployed at Meta for over six months across more than 30 ranking models, ERCache achieves substantial computational resource savings with minimal or no degradation in end-to-end latency and a large reduction in fallback events. The findings highlight how exploiting short-term user access patterns through caching can maintain performance while significantly reducing compute, with practical guidance for deploying caching in large-scale ad systems.

Abstract

The increasing complexity of deep learning models used for calculating user representations presents significant challenges, particularly with limited computational resources and strict service-level agreements (SLAs). Previous research efforts have focused on optimizing model inference but have overlooked a critical question: is it necessary to perform user model inference for every ad request in large-scale social networks? To address this question and these challenges, we first analyze user access patterns at Meta and find that most user model inferences occur within a short timeframe. This observation reveals a triangular relationship among model complexity, embedding freshness, and service SLAs. Building on this insight, we designed, implemented, and evaluated ERCache, an efficient and robust caching framework for large-scale user representations in ads recommendation systems on social networks. ERCache categorizes cache into direct and failover types and applies customized settings and eviction policies for each model, effectively balancing model complexity, embedding freshness, and service SLAs, even considering the staleness introduced by caching. ERCache has been deployed at Meta for over six months, supporting more than 30 ranking models while efficiently conserving computational resources and complying with service SLA requirements.
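
The direct/failover split described above can be illustrated with a minimal sketch. This is not ERCache's implementation: the names `TTLCache` and `get_user_embedding`, the in-memory dictionary store, the synchronous writes, and the choice of a longer TTL for the failover cache are all assumptions made for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

Embedding = Sequence[float]


@dataclass
class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic

    def __post_init__(self) -> None:
        self._store: Dict[Hashable, Tuple[Embedding, float]] = {}

    def get(self, key: Hashable) -> Optional[Embedding]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if self.clock() - written_at > self.ttl_seconds:
            del self._store[key]  # TTL-based eviction on read
            return None
        return value

    def put(self, key: Hashable, value: Embedding) -> None:
        self._store[key] = (value, self.clock())


def get_user_embedding(
    user_id: Hashable,
    infer: Callable[[Hashable], Embedding],
    direct: TTLCache,
    failover: TTLCache,
) -> Tuple[Optional[Embedding], str]:
    """Return (embedding, source) for one request.

    Assumed flow: serve from the direct cache when possible, otherwise
    run inference; if inference fails, try the failover cache before
    giving up and reporting a fallback.
    """
    cached = direct.get(user_id)
    if cached is not None:
        return cached, "direct"
    try:
        fresh = infer(user_id)
    except Exception:
        stale = failover.get(user_id)
        if stale is not None:
            return stale, "failover"
        return None, "fallback"
    # Written synchronously here for simplicity; the source describes
    # asynchronous writes, which this sketch does not model.
    direct.put(user_id, fresh)
    failover.put(user_id, fresh)
    return fresh, "inference"
```

With a short TTL on `direct` and a longer one on `failover` (an assumed configuration), repeated requests from the same user within the short window skip inference entirely, while a failed inference can still be answered from an older embedding instead of triggering a fallback.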
