MathReader : Text-to-Speech for Mathematical Documents

Sieun Hyeon, Kyudan Jung, Nam-Joon Kim, Hyun Gon Ryu, Jaeyoung Do

TL;DR

MathReader tackles the gap where existing TTS readers misread or skip mathematical formulas in LaTeX-based documents. It integrates Nougat-small OCR, a fine-tuned T5-small translator that converts LaTeX formulas into spoken English, and a VITS-based TTS model to produce accurate speech. In evaluations, MathReader achieves substantially lower Word Error Rate (WER) and Character Error Rate (CER) than Microsoft Edge and Adobe Acrobat, with WER as low as 0.281 and CER as low as 0.148 on test data, demonstrating improved accessibility for visually impaired users. The approach enables near real-time, formula-aware document reading, and its code is publicly released for broader adoption.
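WER and CER are both edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the reference length (counted in words for WER, in characters for CER). The minimal Python sketch below implements these standard definitions. The paper's exact text normalization before scoring (casing, punctuation, number formatting) is not described here, so treat this as a generic illustration, not the authors' evaluation script; the function names are our own.

```python
# Generic sketch of WER/CER; assumes whitespace tokenization and a non-empty reference.


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance divided by the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For instance, if a reader renders the reference "x squared plus y squared" as "x plus y squared", one word is deleted out of five, so `wer` returns 0.2.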

Abstract

TTS (Text-to-Speech) document readers from Microsoft, Adobe, Apple, and OpenAI are in service worldwide. They provide relatively good TTS results for general plain text, but sometimes skip content or produce unsatisfactory results for mathematical expressions. This is because most modern academic papers are written in LaTeX, and when LaTeX formulas are compiled, they are rendered as distinctive text forms within the document. However, traditional TTS document readers output only the text as it is recognized, without considering the mathematical meaning of the formulas. To address this issue, we propose MathReader, which effectively integrates OCR, a fine-tuned T5 model, and TTS. MathReader achieved a lower Word Error Rate (WER) than existing TTS document readers, such as Microsoft Edge and Adobe Acrobat, when processing documents containing mathematical formulas: it reduced the WER from 0.510 to 0.281 compared to Microsoft Edge, and from 0.617 to 0.281 compared to Adobe Acrobat. This can significantly alleviate the inconvenience faced by users who want to listen to documents, especially those who are visually impaired. The code is available at https://github.com/hyeonsieun/MathReader.
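Expressed as relative reductions (our own arithmetic on the abstract's WER figures, not a number reported by the authors), the improvements are:

```latex
\frac{0.510 - 0.281}{0.510} = \frac{0.229}{0.510} \approx 0.449,
\qquad
\frac{0.617 - 0.281}{0.617} = \frac{0.336}{0.617} \approx 0.545
```

That is, MathReader's WER is roughly 44.9% lower than Microsoft Edge's and roughly 54.5% lower than Adobe Acrobat's on the same test documents.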

Paper Structure (15 sections, 2 figures, 4 tables, 1 algorithm)

Figures (2)

  • Figure 1: Our pipeline for reading the document correctly.
  • Figure 2: This figure shows an example of a TTS document reader skipping a formula in a document. (1) is part of the document we used for testing. Because the document is old and of low quality, Microsoft Edge skips the formula when reading aloud (see (2)), whereas MathReader reads it correctly (see (3)).
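The pipeline in Figure 1 chains three stages: OCR turns the page into text with LaTeX formulas, a translator turns each formula into spoken English, and TTS voices the resulting script. The Python sketch below illustrates only how those stages connect, under stated assumptions; it is not the authors' implementation. We assume the OCR text marks inline math with \( ... \) and display math with \[ ... \] (this delimiter convention is an assumption). `toy_latex_to_speech` is a hypothetical rule-based placeholder for the fine-tuned T5-small model, and `synthesize` is a hypothetical callable standing in for the VITS-based TTS.

```python
import re
from typing import Callable, TypeVar

Audio = TypeVar("Audio")  # whatever the (hypothetical) TTS backend returns

# Assumption: inline math is \( ... \), display math is \[ ... \]; everything else is plain text.
MATH_SPAN = re.compile(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", re.DOTALL)


def toy_latex_to_speech(latex: str) -> str:
    """Hypothetical stand-in for the fine-tuned T5 translator: a few hand-written rules."""
    rules = [
        (r"\^\{?2\}?", " squared"),  # x^2 or x^{2}
        (r"\+", " plus "),
        (r"=", " equals "),
    ]
    spoken = latex
    for pattern, words in rules:
        spoken = re.sub(pattern, words, spoken)
    return " ".join(spoken.split())  # collapse repeated whitespace


def document_to_script(ocr_text: str, translate: Callable[[str], str]) -> str:
    """Replace every math span in the OCR text with spoken English; keep plain text as-is."""

    def speak(match: re.Match) -> str:
        # Exactly one of the two groups matched: inline (1) or display (2).
        latex = match.group(1) if match.group(1) is not None else match.group(2)
        return translate(latex.strip())

    return MATH_SPAN.sub(speak, ocr_text)


def read_aloud(
    ocr_text: str,
    translate: Callable[[str], str],
    synthesize: Callable[[str], Audio],
) -> Audio:
    """Hypothetical end-to-end step: formula-aware script first, then hand it to TTS."""
    return synthesize(document_to_script(ocr_text, translate))
```

For example, `document_to_script(r"By Pythagoras, \(x^2 + y^2 = z^2\) holds.", toy_latex_to_speech)` yields "By Pythagoras, x squared plus y squared equals z squared holds." Keeping translation and synthesis as injected callables lets a learned model replace the toy rules without changing the surrounding plumbing.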