Positional Description for Numerical Normalization

Deepanshu Gupta, Javier Latorre

TL;DR

This paper addresses numerical text normalization (TN) in NLP pipelines, where subword tokenization causes place-value errors and fatal misnormalizations. It introduces the Positional Description Scheme (PDS), a lightweight preprocessing step that encodes each digit with face-value and place-value cues. This turns TN into a simple input-output mapping that requires far less data and no architectural changes. Across English, Polish, and Russian, PDS-based models trained on datasets of varying size show substantial reductions in fatal errors while maintaining production-friendly latency. PDS also yields relative accuracy improvements of 23% to 51% on complex arithmetic tasks. The work demonstrates the effectiveness of PDS for TN in TTS and ASR, supports multilingual deployment, and highlights significant gains in data efficiency and reliability for numerical normalization without rule-based FSTs.

Abstract

We present a Positional Description Scheme (PDS) tailored for digit sequences, which integrates place-value information for each digit. Because of the structural limitations of subword tokenization algorithms, language models face critical Text Normalization (TN) challenges when handling numbers. Our scheme addresses this challenge through straightforward pre-processing: it leaves the model architecture unchanged while making number normalization significantly simpler and tractable. This simplification enables more compact, production-ready models that can learn from smaller datasets. Furthermore, our investigations reveal that PDS enhances the arithmetic capabilities of language models, yielding a relative accuracy improvement of 23% to 51% on complex arithmetic tasks. We demonstrate that PDS effectively mitigates fatal numerical normalization errors in neural models, requiring only a modest amount of training data and no rule-based Finite State Transducers (FSTs). Finally, we show that PDS is essential for text processing in both Text-To-Speech and Speech Recognition, enabling effective TN under production constraints.
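To make the idea of per-digit face-value and place-value cues concrete, here is a minimal Python sketch. The excerpt does not specify the actual PDS token format, so the `<digit>_<exponent>` markers and the function names `positional_description` and `encode_numbers` are hypothetical. The point is only that every digit carries an explicit place value. A subword tokenizer, by contrast, may split "12345" into chunks such as "123" and "45", whose place values depend on the length of the whole number.

```python
import re

# Hypothetical PDS-style preprocessing. The token format "<digit>_<exponent>"
# is an illustrative assumption, not the format used in the paper.

_DIGIT_RUN = re.compile(r"[0-9]+")  # ASCII digits only (\d would also match other Unicode digits)


def positional_description(digits: str) -> list[str]:
    """Pair each digit (face value) with its power-of-ten exponent (place value)."""
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"expected a non-empty ASCII digit string, got {digits!r}")
    n = len(digits)
    # The leftmost digit has exponent n - 1; the rightmost (units) digit has exponent 0.
    return [f"{d}_{n - 1 - i}" for i, d in enumerate(digits)]


def encode_numbers(text: str) -> str:
    """Replace every maximal run of digits in `text` with space-separated PDS-style tokens."""
    return _DIGIT_RUN.sub(lambda m: " ".join(positional_description(m.group())), text)


if __name__ == "__main__":
    print(positional_description("2024"))  # ['2_3', '0_2', '2_1', '4_0']
    print(encode_numbers("Pay 305 now"))   # Pay 3_2 0_1 5_0 now
```

The sketch encodes each digit run independently. For example, the integer and fractional parts of "3.14" are treated as two separate runs. A production scheme would presumably need its own conventions for decimals, digit-group separators, and leading zeros, which this sketch does not attempt.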

Paper Structure

This paper contains 15 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Avg. accuracy for English for different data sizes.
  • Figure 2: Avg. accuracy for Russian for different data sizes.
  • Figure 3: Avg. accuracy for Polish for different data sizes.