Semantic In-Domain Product Identification for Search Queries

Sanat Sharma, Jayant Kumar, Twisha Naik, Zhaoyu Lu, Arvind Srikantan, Tracy Holloway King

TL;DR

The paper tackles explicit and implicit product identification for Adobe's extensive product catalog by training a domain-specific, language-model–based classifier on user behavioral data. It pretrains a DeBERTa-based backbone on internal HelpX documents and then fine-tunes a two-layer multi-label classifier with weighted losses, so that the model learns Adobe product vocabulary and can disambiguate between products. Key results include a test $F_1$ of $0.949$ and an offline perplexity of $7.47$ for the backbone, plus A/B-test gains of a $>25\%$ relative improvement in CTR, a $>50\%$ reduction in null rate, and a 2x increase in app-card surfacing, which improves product visibility. The work demonstrates the practical impact of combining domain-specific LM pretraining with a targeted classifier to drive more accurate and comprehensive app-card recommendations in search and autocomplete. Planned follow-ups are multilingual expansion and better handling of long prompts for retrieval-augmented systems.

Abstract

Accurate explicit and implicit product identification in search queries is critical for enhancing user experiences, especially at a company like Adobe, which has over 50 products and covers queries across hundreds of tools. In this work, we present a novel approach to training a product classifier from user behavioral data. Our semantic model led to a >25% relative improvement in CTR (click-through rate) across the deployed surfaces, a >50% decrease in null rate, and a 2x increase in the app cards surfaced, which helps drive product visibility.

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Product App Card Experiences. Top: app cards at the top of search results for the query "ai generative fill". Bottom: autocomplete for the prefix "ai genera", with textual query suggestions shown below the app cards. The product intent is implicit.
  • Figure 2: DeBERTa Pretraining. We break HelpX documents into blocks of 128 tokens and pretrain on them. This allows the LM to better understand Adobe product vocabulary and features.
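
The blocking step described in Figure 2 can be illustrated with a minimal sketch. It assumes a document has already been tokenized into a flat list of token ids; the function name `chunk_tokens`, the `drop_last` option, and the dummy ids are hypothetical and are not taken from the paper, which does not say how a short final block is handled.

```python
from typing import List, Sequence


def chunk_tokens(token_ids: Sequence[int], block_size: int = 128,
                 drop_last: bool = False) -> List[List[int]]:
    """Split a token-id sequence into consecutive blocks of at most block_size.

    drop_last is an assumption of this sketch: when True, a final block
    shorter than block_size is discarded instead of kept.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [list(token_ids[i:i + block_size])
              for i in range(0, len(token_ids), block_size)]
    if drop_last and blocks and len(blocks[-1]) < block_size:
        blocks.pop()
    return blocks


# Stand-in for a tokenized HelpX document: 300 dummy token ids.
doc_ids = list(range(300))
blocks = chunk_tokens(doc_ids)
print([len(b) for b in blocks])  # [128, 128, 44]
```

In an actual pretraining pipeline the ids would come from the backbone's own tokenizer, and each block would be fed to the language-modeling objective as one training example.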