Semantic In-Domain Product Identification for Search Queries

Sanat Sharma; Jayant Kumar; Twisha Naik; Zhaoyu Lu; Arvind Srikantan; Tracy Holloway King

TL;DR

The paper addresses explicit and implicit product identification in search queries over Adobe's product catalog by training a domain-specific, language-model-based classifier on user behavioral data. A DeBERTa-based backbone is first pretrained on internal HelpX documents; a two-layer multi-label classifier is then fine-tuned with weighted losses, so that the model learns Adobe product vocabulary and can disambiguate between products. Reported results include a test $F_1$ of $0.949$ and an offline perplexity of $7.47$ for the backbone, plus AB-test gains of $>\!25\%$ relative CTR, a $>\!50\%$ reduction in null rate, and a 2x increase in app cards surfaced, which improves product visibility. The work shows the practical impact of combining domain-specific LM pretraining with a targeted classifier to drive more accurate and comprehensive app-card recommendations in search and autocomplete. Planned future work includes multilingual expansion and better handling of long prompts for retrieval-augmented systems.
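To make the "multi-label classifier with weighted losses" idea concrete, here is a minimal sketch of a weighted binary cross-entropy over per-product logits, plus thresholded prediction. It does not reproduce the paper's network or its weighting scheme: the per-label positive weights, the 0.5 threshold, the product names, and the function names `weighted_bce` and `predict_labels` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_bce(
    logits: Sequence[float],
    targets: Sequence[int],
    pos_weights: Sequence[float],
) -> float:
    """Mean binary cross-entropy over labels for one query.

    Each label is an independent yes/no decision (multi-label), and the
    positive term of label i is scaled by pos_weights[i]. Using per-label
    positive weights is an assumption; the source only says "weighted losses".
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y, w in zip(logits, targets, pos_weights):
        p = sigmoid(z)
        total += -(w * y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


def predict_labels(
    logits: Sequence[float],
    label_names: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[str]:
    """Return every label whose predicted probability reaches the threshold."""
    return [name for z, name in zip(logits, label_names) if sigmoid(z) >= threshold]


# Hypothetical label set and logits for a single query.
LABELS = ["Photoshop", "Illustrator", "Express"]
example_logits = [2.0, -1.0, -3.0]
predicted = predict_labels(example_logits, LABELS)
```

Raising the weight of a label increases the penalty for missing that label when it is a true positive, which is one common way to compensate for products that are rare in behavioral data.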

Abstract

Accurate explicit and implicit product identification in search queries is critical for enhancing user experiences, especially at a company like Adobe, which has over 50 products and covers queries across hundreds of tools. In this work, we present a novel approach to training a product classifier from user behavioral data. Our semantic model led to a >25% relative improvement in CTR (click-through rate) across the deployed surfaces, a >50% decrease in null rate, and a 2x increase in the app cards surfaced, which helps drive product visibility.
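As a reading aid for the AB-test figures, the sketch below shows how a relative change in CTR and in null rate can be computed from raw counts. All counts are hypothetical and chosen only to illustrate the arithmetic; they are not the paper's data. It also assumes that CTR is clicks divided by impressions and that null rate is the share of queries for which nothing is surfaced, neither of which is defined in the text above.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def relative_change(baseline: float, treatment: float) -> float:
    """Relative change of treatment versus baseline, e.g. 0.25 means +25%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline


# Hypothetical AB-test counts (illustrative only, not from the paper).
control = {"impressions": 10_000, "clicks": 400, "queries": 10_000, "null_queries": 2_000}
treatment = {"impressions": 10_000, "clicks": 510, "queries": 10_000, "null_queries": 900}

ctr_lift = relative_change(
    rate(control["clicks"], control["impressions"]),
    rate(treatment["clicks"], treatment["impressions"]),
)
null_rate_change = relative_change(
    rate(control["null_queries"], control["queries"]),
    rate(treatment["null_queries"], treatment["queries"]),
)

print(f"relative CTR change: {ctr_lift:+.1%}")
print(f"relative null-rate change: {null_rate_change:+.1%}")
```

A "relative" improvement compares the two rates to each other, so a move from a 4% to a 5% CTR is a 25% relative gain even though the absolute gain is one percentage point.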

Paper Structure (11 sections, 1 equation, 2 figures, 5 tables)

Introduction
Prior Art
Datasets
Model
Language Model Pretraining
Classifier Training
Offline Evaluation and AB Testing
Quantitative Evaluation on Behavioral Queries
Qualitative Manual Annotation of Implicit Intent
AB Testing
Conclusion and Future Work

Figures (2)

Figure 1: Product App Card Experiences. Top: App cards at the top of search results for "ai generative fill". Bottom: Autocomplete for "ai genera", with textual query suggestions shown below the app cards. The product intent is implicit.
Figure 2: DeBERTa Pretraining: We break HelpX documents into blocks of 128 tokens and pretrain. This allows the LM to understand Adobe product vocabulary and features better.
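The block-splitting step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes each document has already been converted to a list of token IDs by some tokenizer; the function name `chunk_tokens` and the choice to keep a shorter final block (rather than drop or pad it) are assumptions, since the caption only states the block size of 128.

```python
from typing import List, Sequence


def chunk_tokens(token_ids: Sequence[int], block_size: int = 128) -> List[List[int]]:
    """Split one tokenized document into consecutive blocks of block_size tokens.

    The last block may be shorter than block_size; whether the original
    pipeline keeps, drops, or pads it is not stated, so it is kept here.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        list(token_ids[start:start + block_size])
        for start in range(0, len(token_ids), block_size)
    ]


# Stand-in for a tokenized HelpX document: 300 dummy token IDs.
document = list(range(300))
blocks = chunk_tokens(document)
block_lengths = [len(block) for block in blocks]
```

Fixed-size blocks give the pretraining loop uniformly shaped inputs, at the cost of occasionally cutting a sentence across two blocks.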
