
InkFM: A Foundational Model for Full-Page Online Handwritten Note Understanding

Anastasiia Fadeeva, Vincent Coriou, Diego Antognini, Claudiu Musat, Andrii Maksai

TL;DR

This work introduces InkFM, a foundational model for analyzing full pages of handwritten content. Trained on a diverse mixture of tasks, it achieves state-of-the-art text recognition and sketch classification and provides a powerful starting point for developing applications with handwritten input.

Abstract

Tablets and styluses are increasingly popular for taking notes. To optimize this experience and ensure a smooth and efficient workflow, it is important to develop methods for accurately interpreting and understanding the content of handwritten digital notes. We introduce InkFM, a foundational model for analyzing full pages of handwritten content. Trained on a diverse mixture of tasks, this model offers a unique combination of capabilities: recognizing text in 28 different scripts, recognizing mathematical expressions, and segmenting pages into distinct elements such as text and drawings. Our results demonstrate that these tasks can be effectively unified within a single model, which achieves state-of-the-art out-of-the-box text line segmentation quality, surpassing public baselines such as docTR. Fine-tuning or LoRA-tuning the base model on public datasets further improves page segmentation quality and achieves state-of-the-art text recognition (DeepWriting, CASIA, SCUT, and MathWriting datasets) and sketch classification (QuickDraw). This adaptability makes InkFM a powerful starting point for developing applications with handwritten input.
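The abstract contrasts full fine-tuning with LoRA-tuning but does not say which InkFM layers are adapted or which rank is used. As a rough illustration only, the sketch below shows the generic LoRA idea on a single linear layer: the pretrained weight W stays frozen, and a trainable low-rank update (alpha / rank) * B @ A is added, with B initialized to zero so the adapted layer starts out identical to the base model. The names `LoRALinear`, `rank`, and `alpha` are hypothetical, and plain Python lists stand in for tensors so the example runs without dependencies; this is not the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (rank x d_in)
    and B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # frozen, never updated
        self.scale = alpha / rank       # scaling applied to the low-rank update
        # A gets a small random init; B starts at zero, so B @ A == 0 initially.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        low_rank = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * z for y, z in zip(base, low_rank)]

    def merged_weight(self):
        """Fold the update into a new dense matrix W + scale * B @ A for inference."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    W = [[float(i == j) for j in range(8)] for i in range(8)]  # toy 8x8 "pretrained" weight
    layer = LoRALinear(W, rank=2, alpha=4.0)
    x = [float(i) for i in range(8)]
    print(layer.forward(x))            # equals W x while B is still zero
    print(layer.trainable_params())    # 2*8 + 8*2 trainable values vs. 64 in W
```

With rank 2 on an 8x8 layer, only 32 values are trainable instead of 64; for the much larger layers of a real model the savings grow accordingly, which is why LoRA-tuning is a cheaper way to adapt a base model to each downstream dataset.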

Paper Structure

This paper contains 23 sections, 3 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: Examples of notes in Japanese (left) and English (right).
  • Figure 2: Three levels of segmentation in a full-page handwritten note.
  • Figure 3: Examples of handwriting in Japanese, Arabic, and English.
  • Figure 4: Example of Bengali writing with time and distance rendering.
  • Figure 5: Left: Original distribution of different languages. Right: Adjusted distribution.
  • ...and 3 more figures