Improving Ad matching via Cluster-Adaptive Keyword Expansion and Relevance tuning
Dipanwita Saha, Anis Zaman, Hua Zou, Ning Chen, Xinxin Shu, Nadia Vase, Abraham Bagherjeiran
TL;DR
The paper tackles the mismatch between token-based ad matching and semantic query variation in e-commerce. It proposes a document-side semantic keyword expansion pipeline using a pre-trained siamese encoder to produce dense keyword embeddings, followed by FAISS-based nearest-neighbor retrieval and cluster-specific thresholds to balance precision and recall. To maintain relevance for expanded keywords, it introduces an incremental relevance refinement using a lightweight stacked ensemble of decision trees on top of a baseline gradient-boosted model, with market-specific tuning. Online and offline evaluations show improved coverage, CTR, and revenue, validating a scalable, low-latency approach that adapts to evolving query behavior and catalog inventory.
Abstract
In search advertising, keyword matching connects user queries with relevant ads. While token-based matching increases ad coverage, it can reduce relevance due to overly permissive semantic expansion. This work extends keyword reach through document-side semantic keyword expansion, using a language model to broaden token-level matching without altering queries. We propose a solution using a pre-trained siamese model to generate dense vector representations of ad keywords and identify semantically related variants through nearest neighbor search. To maintain precision, we introduce a cluster-based thresholding mechanism that adjusts similarity cutoffs based on local semantic density. Each expanded keyword maps to a group of seller-listed items, which may only partially align with the original intent. To ensure relevance, we enhance the downstream relevance model by adapting it to the expanded keyword space using an incremental learning strategy with a lightweight decision tree ensemble. This system improves both relevance and click-through rate (CTR), offering a scalable, low-latency solution adaptable to evolving query behavior and advertising inventory.
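The cluster-based thresholding idea can be illustrated with a small sketch. The abstract does not give the exact rule, so everything concrete here is an assumption made for illustration: the toy 2-D unit vectors stand in for siamese-encoder embeddings, a brute-force dot product stands in for the FAISS nearest-neighbor search, the cluster labels are supplied by hand, and the linear cutoff formula (with its hypothetical `base`, `slope`, `lo`, and `hi` parameters) is only one plausible way to tie a similarity cutoff to local semantic density.

```python
import numpy as np

def unit_vectors(angles_deg):
    """Toy 2-D unit vectors; a stand-in for real keyword embeddings."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.stack([np.cos(a), np.sin(a)], axis=1)

def cluster_thresholds(emb, labels, base=0.80, slope=1.0, k=2, lo=0.60, hi=0.95):
    """Assumed rule: clusters denser than average get a stricter cosine cutoff."""
    sims = emb @ emb.T                      # cosine similarity (rows are unit norm)
    np.fill_diagonal(sims, -np.inf)         # ignore self-matches
    # Density of a keyword = mean similarity to its k nearest neighbours.
    density = np.sort(sims, axis=1)[:, -k:].mean(axis=1)
    global_density = density.mean()
    cutoffs = {}
    for c in np.unique(labels):
        local = density[labels == c].mean()
        cutoffs[int(c)] = float(np.clip(base + slope * (local - global_density), lo, hi))
    return cutoffs

def expand(seed_idx, emb, labels, cutoffs, top_n=5):
    """Return indices of neighbours that pass the seed cluster's cutoff."""
    sims = emb @ emb[seed_idx]
    sims[seed_idx] = -np.inf                # never return the seed itself
    nearest = np.argsort(-sims)[:top_n]     # brute-force stand-in for an ANN index
    cut = cutoffs[int(labels[seed_idx])]
    return [int(i) for i in nearest if sims[i] >= cut]

# Cluster 0 is tight (0, 2, 4 degrees); cluster 1 is loose (90, 110, 130 degrees).
emb = unit_vectors([0, 2, 4, 90, 110, 130])
labels = np.array([0, 0, 0, 1, 1, 1])

cutoffs = cluster_thresholds(emb, labels)
tight_expansion = expand(0, emb, labels, cutoffs)   # seed in the dense cluster
loose_expansion = expand(3, emb, labels, cutoffs)   # seed in the sparse cluster

# For comparison: a single global cutoff of 0.80 applied to every cluster.
global_cutoffs = {0: 0.80, 1: 0.80}
loose_expansion_global = expand(3, emb, labels, global_cutoffs)
```

In this toy setup the sparse cluster receives a lower cutoff than the dense one, so the seed at 90 degrees keeps a neighbor that a single global cutoff of 0.80 would reject, while the dense cluster is held to a stricter bar. Whether loosening sparse regions is the right direction for a given catalog is a design choice that this sketch does not settle.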
