PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems

Prabhat Agarwal, Anirudhan Badrinath, Laksh Bhasin, Jaewon Yang, Edoardo Botta, Jiajing Xu, Charles Rosenberg

TL;DR

PinRec addresses the scalability and multi-objective demands of industry-scale generative retrieval by introducing outcome-conditioned generation and temporal multi-token inference within a Transformer-based framework. Outcome conditioning lets modelers target specific business metrics (e.g., saves, clicks), multi-token generation produces diverse candidate sets, and serving is kept efficient with CUDA graphs, KV caches, and ANN retrieval. Offline results show substantial gains over baselines, and online A/B tests demonstrate controllable improvements across Homefeed, Search, and Related Pins with modest latency costs. The authors present this as, to their knowledge, the first rigorous study of generative retrieval deployed at Pinterest scale, with evidence of real-world impact and avenues for future work in ranking integration and input-sequence optimization.

Abstract

Generative retrieval methods use generative sequential modeling techniques, such as transformers, to generate candidate items for recommender systems. These methods have demonstrated promising results in academic benchmarks, surpassing traditional retrieval models like two-tower architectures. However, current generative retrieval methods lack the scalability required for industrial recommender systems, and they are insufficiently flexible to satisfy the multiple metric requirements of modern systems. This paper introduces PinRec, a novel generative retrieval model developed for applications at Pinterest. PinRec uses outcome-conditioned generation, enabling modelers to specify how to balance various outcome metrics, such as the number of saves and clicks, to effectively align with business goals and user exploration. Additionally, PinRec incorporates multi-token generation to enhance output diversity while optimizing generation. Our experiments demonstrate that PinRec can successfully balance performance, diversity, and efficiency, and that generative models can deliver a significant positive impact to users. This paper marks a significant milestone in generative retrieval, as it presents, to our knowledge, the first rigorous study on implementing generative retrieval at the scale of Pinterest.
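
To make the idea of balancing outcome metrics concrete, the following is a minimal sketch and not the PinRec implementation. It assumes, purely for illustration, that a modeler supplies one weight per outcome, that the candidate budget is split in proportion to those weights, that conditioning is a simple vector addition, and that a brute-force dot-product search stands in for ANN retrieval. All function and variable names (`allocate_budget`, `retrieve`, `outcome_conditioned_candidates`, `user_state`, `outcome_embs`) are hypothetical.

```python
import math
import random


def allocate_budget(weights, total):
    """Split `total` candidate slots across outcomes in proportion to
    `weights`, using largest-remainder rounding so the counts sum to `total`."""
    norm = sum(weights.values())
    raw = {o: total * w / norm for o, w in weights.items()}
    budget = {o: math.floor(r) for o, r in raw.items()}
    leftover = total - sum(budget.values())
    # Hand the remaining slots to the outcomes with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda o: raw[o] - budget[o], reverse=True)
    for o in by_remainder[:leftover]:
        budget[o] += 1
    return budget


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, item_embs, k, exclude):
    """Brute-force top-k by dot product (assumed stand-in for an ANN index)."""
    pool = [i for i in item_embs if i not in exclude]
    pool.sort(key=lambda i: dot(query, item_embs[i]), reverse=True)
    return pool[:k]


def outcome_conditioned_candidates(user_state, outcome_embs, item_embs, weights, total):
    """Return (item, outcome) pairs, retrieving each outcome's share of the budget."""
    budget = allocate_budget(weights, total)
    seen, result = set(), []
    for outcome, k in budget.items():
        # Toy conditioning (an assumption): shift the user vector by the outcome vector.
        query = [u + o for u, o in zip(user_state, outcome_embs[outcome])]
        for item in retrieve(query, item_embs, k, seen):
            seen.add(item)
            result.append((item, outcome))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    items = {f"pin_{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(50)}
    outcomes = {name: [rng.gauss(0, 1) for _ in range(dim)] for name in ("save", "click")}
    user = [rng.gauss(0, 1) for _ in range(dim)]
    for item, outcome in outcome_conditioned_candidates(
        user, outcomes, items, {"save": 2, "click": 1}, total=10
    ):
        print(item, outcome)
```

Changing the weights shifts how many candidates are retrieved for each outcome without retraining anything, which is the kind of control the abstract attributes to outcome conditioning.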

Paper Structure

This paper contains 41 sections, 8 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Illustration of PinRec, a generative item retrieval technique for heterogeneous user journeys on Pinterest. Sequences of user searches, engagements, and outcome-conditioning (bottom) are used to recommend Pins (top).
  • Figure 2: Modeling diagram of PinRec
  • Figure 3: Serving flow for the PinRec system. Green boxes represent services, blue rectangles represent transmitted data, purple cylinders represent indexed data, and beige boxes are steps within the PinRec NVIDIA Triton ensemble.
  • Figure 4: Item recommendations from PinRec-OC for a user history with cat-related search queries and pin interactions (history shown in reverse chronological order). Additional examples are provided in the appendix.
  • Figure 5: Percentage lift in unordered recall for PinRec-OC over PinRec-UC when conditioning on the desired action, stratified by the actual action taken by the user.
  • ...and 5 more figures