
Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations

Yue Li Du, Ben Alexander, Mikhail Antonenka, Rohan Mahadev, Hao-yu Wu, Dmitry Kislyuk

TL;DR

This work bridges visual products and composite images to enable end-to-end style recommendations. It introduces the Visual Product Graph (VPG), which uses Forward-STL and Reverse-STL to connect product-level queries with context-rich ensembles and to retrieve complementary items. The system is backed by scalable feature storage, an enhanced object detector, and a unified visual embedding. Through large-scale pretraining, hard triplets, and float-valued embeddings, the approach achieves substantial gains in extremely similar retrieval and in engagement across offline, human, and online evaluations. Deployed in Pinterest's "Ways to Style It" module, VPG demonstrates practical impact in fashion and home decor by delivering cohesive outfits with social proof and context-aware recommendations.
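The hard-triplet idea mentioned above (and depicted in Figure 5) can be illustrated with a minimal Python sketch. It is not the authors' training code: it assumes Euclidean distance between float-valued embeddings, a standard triplet margin loss with a hypothetical margin of 0.2, and hypothetical human labels supplied as (query, positive, negative) image IDs. A triplet counts as hard when the model places the extremely similar positive farther from the query than the less relevant negative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[str, str, str]  # (query_id, positive_id, negative_id)


def mine_hard_triplets(embeddings: Dict[str, Vector], labeled: List[Triplet]) -> List[Triplet]:
    """Keep only the labeled triplets that the current model orders incorrectly.

    `labeled` is a hypothetical set of human judgments in which `positive` was
    rated extremely similar to `query` and `negative` was rated less relevant.
    """
    hard = []
    for q, p, n in labeled:
        d_pos = math.dist(embeddings[q], embeddings[p])
        d_neg = math.dist(embeddings[q], embeddings[n])
        if d_pos > d_neg:  # less relevant result is closer to the query: hard triplet
            hard.append((q, p, n))
    return hard


def triplet_loss(q: Vector, p: Vector, n: Vector, margin: float = 0.2) -> float:
    """Standard triplet margin loss: max(0, d(q, p) - d(q, n) + margin)."""
    return max(0.0, math.dist(q, p) - math.dist(q, n) + margin)


# Toy 2-D embeddings (hypothetical values).
emb = {"q": (0.0, 0.0), "p": (1.0, 0.0), "n": (0.5, 0.0), "p2": (0.1, 0.0), "n2": (2.0, 0.0)}
hard = mine_hard_triplets(emb, [("q", "p", "n"), ("q", "p2", "n2")])
print(hard)  # [('q', 'p', 'n')]
losses = [triplet_loss(emb[q], emb[p], emb[n]) for q, p, n in hard]
```

Filtering to hard triplets concentrates training on the cases where the embedding's ordering disagrees with human judgment, which is the discrepancy Figure 5 describes.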

Abstract

Retrieving semantically similar but visually distinct content is a critical capability in visual search systems. In this work, we tackle this problem with the Visual Product Graph (VPG), which leverages high-performance infrastructure for storage and state-of-the-art computer vision models for image understanding. VPG is an online, real-time retrieval system that enables navigation from individual products to composite scenes containing those products, along with complementary recommendations. Our system not only offers contextual insights by showcasing how products can be styled in context, but also recommends complementary products drawn from these inspirations. We discuss the essential components for building the Visual Product Graph, along with the core computer vision model improvements across object detection, foundational visual embeddings, and other visual signals. Our system achieves an extremely similar@1 of 78.8% in end-to-end human relevance evaluations and a module engagement rate of 6%. The "Ways to Style It" module, powered by the Visual Product Graph technology, is deployed in production at Pinterest.
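The navigation flow described in the abstract (product, then composite scenes containing it, then complementary products) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the production system: brute-force Euclidean search stands in for a real nearest-neighbor index, candidates are restricted to the same detected category, and `DetectedObject`, `build_object_index`, `reverse_stl`, `forward_stl`, `max_dist`, and the toy data are all hypothetical. It also assumes Reverse-STL is the product-to-scene step (the object index of Figure 3 is built for it) and Forward-STL is the scene-to-product step.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    category: str                  # e.g. "cardigan", as assigned by an object detector
    embedding: Tuple[float, ...]   # visual embedding of the object crop


ObjectStore = Dict[str, List[DetectedObject]]        # scene image ID -> its objects
ObjectIndex = List[Tuple[str, int, DetectedObject]]  # (scene ID, object position, object)


def build_object_index(store: ObjectStore) -> ObjectIndex:
    """Offline: re-key image-keyed storage so that each entry is one object."""
    return [(scene, i, obj) for scene, objs in store.items() for i, obj in enumerate(objs)]


def reverse_stl(product: DetectedObject, index: ObjectIndex, max_dist: float) -> List[Tuple[str, int]]:
    """Product -> scenes: scenes holding a same-category object close to the product."""
    hits = sorted(
        (math.dist(product.embedding, obj.embedding), scene, i)
        for scene, i, obj in index
        if obj.category == product.category
    )
    return [(scene, i) for d, scene, i in hits if d <= max_dist]


def forward_stl(scene: str, matched: int, store: ObjectStore,
                catalog: Dict[str, DetectedObject]) -> List[str]:
    """Scene -> products: nearest shoppable item for every other object in the scene."""
    recs = []
    for i, obj in enumerate(store[scene]):
        if i == matched:  # skip the object that matched the query product
            continue
        candidates = [(math.dist(obj.embedding, item.embedding), sku)
                      for sku, item in catalog.items() if item.category == obj.category]
        if candidates:
            recs.append(min(candidates)[1])
    return recs


store = {
    "outfit_1": [DetectedObject("cardigan", (0.0, 0.0)), DetectedObject("jeans", (1.0, 1.0))],
    "outfit_2": [DetectedObject("cardigan", (5.0, 5.0)), DetectedObject("boots", (2.0, 2.0))],
}
catalog = {"jeans_sku": DetectedObject("jeans", (1.1, 1.0)),
           "boots_sku": DetectedObject("boots", (2.0, 2.1))}
index = build_object_index(store)
scenes = reverse_stl(DetectedObject("cardigan", (0.1, 0.0)), index, max_dist=1.0)
print(scenes)                                   # [('outfit_1', 0)]
print(forward_stl(*scenes[0], store, catalog))  # ['jeans_sku']
```

In a real deployment the flat `index` list would be replaced by an approximate nearest-neighbor index over object embeddings; the list form only keeps the sketch self-contained.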

Paper Structure

This paper contains 31 sections, 9 figures, 4 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Pinterest's "Ways to Style It" module on Web. (1) The user is viewing the product page of the beige cardigan shown in the top left corner. (2) The module finds multiple full-outfit images containing this cardigan to inspire the user about how to complete their outfit. (3) It displays shoppable versions of the other complementary items so the user can purchase them.
  • Figure 2: Leveraging an online KV-store to store objects and object embeddings. (1) A one-time offline workflow backfills objects and their embeddings into the KV-store in batches. (2) When new images are created on Pinterest, we use the stream-processing service Flink to update the KV-store in near real time. (3) As a fallback, when a user interacts with an image whose key is missing from the KV-store, we perform feature extraction online in real time and write the results to the KV-store.
  • Figure 3: Building an object index for Reverse-STL offline. We extract the entries from the object storage and reformat them to be indexed based on objects.
  • Figure 4: Overview of the Visual Product Graph system during serving.
  • Figure 5: Illustration of a hard triplet scenario: the unified visual embedding model incorrectly places an extremely similar result (positive) farther from the query than a less relevant result (negative). A hard triplet is constructed from these examples so that the embedding model learns to correct this error.
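The three write paths in Figure 2 amount to a read-through cache in front of feature extraction. The Python sketch below is a hypothetical in-memory stand-in, not the system's implementation: a dict replaces the online KV-store, a plain method call replaces the Flink stream job, and `ObjectFeatureStore`, `on_image_created`, and the `extractor` callable are illustrative names rather than an actual API.

```python
from typing import Callable, Dict, List, Tuple

# Per-image features: one (category, embedding) pair per detected object.
Features = List[Tuple[str, Tuple[float, ...]]]


class ObjectFeatureStore:
    """Hypothetical in-memory stand-in for the online KV-store of Figure 2."""

    def __init__(self, extractor: Callable[[str], Features]):
        self._kv: Dict[str, Features] = {}
        self._extract = extractor      # runs object detection + embedding for one image
        self.online_extractions = 0    # how often fallback path (3) was needed

    def backfill(self, batch: Dict[str, Features]) -> None:
        """Path (1): offline batch backfill of precomputed features."""
        self._kv.update(batch)

    def on_image_created(self, image_id: str) -> None:
        """Path (2): near-real-time update, triggered by a new-image event."""
        self._kv[image_id] = self._extract(image_id)

    def get(self, image_id: str) -> Features:
        """Read path; fallback path (3) extracts online and writes back on a key miss."""
        features = self._kv.get(image_id)
        if features is None:
            features = self._extract(image_id)
            self._kv[image_id] = features
            self.online_extractions += 1
        return features


def fake_extractor(image_id: str) -> Features:
    # Hypothetical extractor that returns a single fixed detection.
    return [("cardigan", (0.0, 1.0))]


kv = ObjectFeatureStore(fake_extractor)
kv.backfill({"img_old": [("jeans", (1.0, 0.0))]})
kv.on_image_created("img_new")
kv.get("img_old")       # served from the backfill
kv.get("img_missing")   # key miss: extracted online, then written back
kv.get("img_missing")   # now a hit
print(kv.online_extractions)  # 1
```

Checking for `None` rather than falsiness keeps an image with zero detected objects (an empty list) cached instead of re-extracting it on every request.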