QueryNER: Segmentation of E-commerce Queries
Chester Palen-Michel, Lizzie Liang, Zhe Wu, Constantine Lignos
TL;DR
QueryNER tackles the problem of segmenting short e-commerce queries into meaningful spans, rather than extracting narrowly defined attribute-value pairs. It introduces a compact, broadly applicable ontology of 17 entity types and a publicly released, manually annotated dataset, which together enable systematic study of query segmentation beyond traditional NER. Baseline experiments with BERT and XLM-R show that the task is challenging on short, noisy queries. Experiments comparing token dropping with entity dropping illustrate how span-level segmentation can help recover queries that return no results (null recall) or too few (low recall). Data augmentation experiments show that simple automatic transformations improve robustness to noise, albeit with some trade-offs on clean data, which has practical implications for e-commerce search relevance. The work provides a reusable resource and practical insights for robust query understanding in real-world retrieval systems.
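Segmenting a query into typed spans is usually framed as token-level sequence labeling, with spans recovered from per-token tags. The sketch below assumes the common BIO tagging scheme and uses illustrative labels (`brand`, `color`, `product_type`, `size`). These labels are hypothetical and not necessarily the names used in the QueryNER ontology. The sketch shows how a tagger's output becomes the "meaningful chunks" described above.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (entity_type, span_text)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Assumes tags look like "B-<type>", "I-<type>", or "O". An I- tag that does
    not continue the currently open span is treated as the start of a new span,
    a common lenient repair for inconsistent model output.
    """
    spans: List[Span] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []

    def close() -> None:
        # Emit the open span, if any.
        if cur_type is not None:
            spans.append((cur_type, " ".join(cur_tokens)))

    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            close()
            cur_type, cur_tokens = tag[2:], [tok]
        elif tag.startswith("I-"):
            # Continuation of the open span of the same type.
            cur_tokens.append(tok)
        else:
            # "O" tag: close any open span.
            close()
            cur_type, cur_tokens = None, []
    close()
    return spans


tokens = ["nike", "red", "running", "shoes", "size", "10"]
tags = ["B-brand", "B-color", "B-product_type", "I-product_type", "B-size", "I-size"]
print(bio_to_spans(tokens, tags))
# [('brand', 'nike'), ('color', 'red'), ('product_type', 'running shoes'), ('size', 'size 10')]
```

The inner `close()` reads `cur_type` and `cur_tokens` from the enclosing scope at call time. It therefore always emits whichever span is currently open.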
Abstract
We present QueryNER, a manually annotated dataset and accompanying model for e-commerce query segmentation. Prior work on sequence labeling for e-commerce has largely addressed aspect-value extraction, which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token dropping and entity dropping for null- and low-recall query recovery. We create challenging test sets using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
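The two relaxation strategies compared above differ in the unit they remove. Token dropping deletes individual words, while entity dropping deletes whole segmented spans. The minimal sketch below makes that difference concrete. It assumes whitespace tokenization, assumes spans come from a tagger such as the BIO decoding sketched earlier, and generates candidates by removing exactly one unit at a time. The function names are hypothetical, and the abstract does not say how the candidate issued to the search engine is chosen, so the sketch only enumerates candidates.

```python
from typing import List, Tuple

Span = Tuple[str, str]  # (entity_type, span_text), e.g. from a segmentation model


def token_drop_candidates(query: str) -> List[str]:
    """Relaxed queries formed by removing exactly one whitespace token."""
    toks = query.split()
    return [" ".join(toks[:i] + toks[i + 1:]) for i in range(len(toks))]


def entity_drop_candidates(spans: List[Span]) -> List[str]:
    """Relaxed queries formed by removing exactly one entity span."""
    return [
        " ".join(text for j, (_, text) in enumerate(spans) if j != i)
        for i in range(len(spans))
    ]


spans = [("brand", "nike"), ("color", "red"),
         ("product_type", "running shoes"), ("size", "size 10")]
query = " ".join(text for _, text in spans)

print(token_drop_candidates(query))
# ['red running shoes size 10', 'nike running shoes size 10',
#  'nike red shoes size 10', 'nike red running size 10',
#  'nike red running shoes 10', 'nike red running shoes size']
print(entity_drop_candidates(spans))
# ['red running shoes size 10', 'nike running shoes size 10',
#  'nike red size 10', 'nike red running shoes']
```

Token dropping can split a multiword unit, as in "nike red shoes size 10", where "running shoes" is broken. It can also leave a fragment such as a dangling "size". Entity dropping removes only complete segments, so each candidate remains a coherent, shorter query.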
