Contextual Multilingual Spellchecker for User Queries

Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora

TL;DR

This paper addresses the need for fast, accurate spell correction of user queries across multiple languages, where query context is sparse and enterprises have their own vocabularies. It combines contextual signals from search results and user behavior with a language-agnostic Symmetric Delete Suggester and a compact 5-layer MLP ranker, augmented by a multiword expression module and a behavioral data pipeline that keeps features updated. Training relies on a bootstrapped multilingual dataset that includes artificially generated queries and public misspelling corpora. The resulting speller improves significantly over Aspell and NeuSpell on short queries and is deployed in production for Adobe autocomplete. The approach demonstrates practical impact for scalable, low-latency spell correction in enterprise search and lays groundwork for expanding to dozens of languages across products like Adobe Express and Creative Cloud.
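
To make the suggester step concrete, the following is a minimal sketch of the general symmetric delete technique, not the paper's implementation. The function names, the example vocabulary, and the default maximum delete distance of 1 are assumptions chosen for illustration. The idea is to index every vocabulary term under all strings reachable by deleting a few characters, then apply the same deletions to the query and look for overlaps, which is language-agnostic because it needs no alphabet.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance=1):
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for kept in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in kept))
    return variants


def build_index(vocabulary, max_distance=1):
    """Map each delete variant to the vocabulary terms that produce it."""
    index = defaultdict(set)
    for term in vocabulary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def suggest(query, index, max_distance=1):
    """Collect vocabulary terms that share a delete variant with the query."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    return candidates


# Hypothetical product vocabulary, for illustration only.
index = build_index(["photoshop", "illustrator", "lightroom"])
print(suggest("photoshp", index))   # missing letter  -> {'photoshop'}
print(suggest("photoshap", index))  # wrong letter    -> {'photoshop'}
```

The candidates returned this way are unranked and can include terms slightly beyond the intended edit distance, so a sketch like this would still need a ranking step, which is the role the summary assigns to the MLP ranker.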

Abstract

Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is expected by users. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English, are not trained in a multilingual fashion, and target spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. Furthermore, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.

Paper Structure (15 sections, 2 figures, 4 tables)

This paper contains 15 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Model architecture of the speller
  • Figure 2: Spellcheck Service Architecture. The MWE module handles task-specific multi-word expressions before the suggester and ranker are called. Behavioral pipelines keep features updated. The postprocessor enables task-specific confidence boosting.