
Retrieval Augmented Spelling Correction for E-Commerce Applications

Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts

TL;DR

Using the RAG framework improves spelling correction over a stand-alone LLM, and additional fine-tuning of the LLM to incorporate retrieved context is shown to be valuable.

Abstract

The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spellings. We address this challenge with Retrieval Augmented Generation (RAG). In this approach, product names are retrieved from a catalog and incorporated into the context of a large language model (LLM) that has been fine-tuned for contextual spelling correction. Through quantitative evaluation and qualitative error analysis, we find that the RAG framework improves spelling correction beyond what a stand-alone LLM achieves. We also demonstrate the value of additional fine-tuning of the LLM to incorporate the retrieved context.
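The abstract does not specify the retriever or the prompt format, so the following is only a minimal sketch of the retrieve-then-prompt step under stated assumptions: a small in-memory catalog, character-trigram cosine similarity as a stand-in retriever, and a hypothetical prompt template. The names `CATALOG`, `retrieve`, and `build_prompt` are illustrative and are not the authors' implementation.

```python
from collections import Counter
import math

# Hypothetical toy catalog; a real system would query a product index.
CATALOG = [
    "Krispy Kreme Original Glazed Doughnuts",
    "Froot Loops Cereal",
    "Fruit Snacks Variety Pack",
    "Cheez-It Crackers",
]


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of the lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, catalog: list[str], k: int = 3) -> list[str]:
    """Return the k catalog names most similar to the query (assumed retriever)."""
    q = char_ngrams(query)
    ranked = sorted(catalog, key=lambda name: cosine(q, char_ngrams(name)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, products: list[str]) -> str:
    """Assemble a hypothetical prompt that places retrieved names in the LLM context."""
    context = "\n".join(f"- {p}" for p in products)
    return (
        "Correct the spelling of the search query. The product names below may "
        "use unconventional spellings and should not be changed.\n"
        f"Retrieved products:\n{context}\n"
        f"Query: {query}\n"
        "Corrected query:"
    )


if __name__ == "__main__":
    hits = retrieve("froot loops", CATALOG, k=2)
    print(build_prompt("froot loops", hits))
```

The resulting prompt would then be sent to the fine-tuned LLM. Character n-grams are used here only because a misspelled query usually stays close to the intended product name at the character level; the retriever in the paper may differ.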


Paper Structure

This paper contains 19 sections and 5 tables.