
MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition

Chang Liu, Simon Corbillé, Elisa H Barney Smith

TL;DR

This work tackles open-set scene text recognition across multiple writing directions by introducing the MOOSTR benchmark and the MOoSE framework. MOOSTR extends open-set recognition to vertical text and varied orientations. MOoSE uses a Dispatcher to route samples by aspect ratio, and two expert groups (Sample and Side-information) to share knowledge and generate prototypes for novel characters. An open-set classifier operates in a feature-prototype space. It is trained by jointly optimizing recognition and length-prediction losses, and it supports incremental prototype augmentation. Ablations and results on the MOOSTR splits show that preserving orientation information and selectively sharing backbone components yield robust recognition and rejection across seen and unseen characters. Together, these results establish a strong baseline for future research on multi-orientation open-set text recognition.
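The prototype-based open-set classification summarized above can be illustrated with a minimal sketch. It rests on three illustrative assumptions, not on the MOoSE implementation:

- Each character class is represented by a single prototype vector.
- Similarity is measured with cosine similarity.
- A sample is rejected when its best score falls below a fixed threshold.

The class `PrototypeClassifier`, the `cosine` helper, and the `threshold` value are hypothetical. The toy 3-D vectors stand in for the features and prototypes that the real framework would produce with its experts.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PrototypeClassifier:
    """Hypothetical open-set classifier holding one prototype per character."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumed fixed rejection threshold
        self.prototypes: Dict[str, List[float]] = {}

    def add_prototype(self, char: str, vector: List[float]) -> None:
        # Registering a prototype makes the character recognizable at once;
        # nothing else in the classifier has to be retrained.
        self.prototypes[char] = vector

    def classify(self, feature: List[float]) -> Optional[str]:
        """Return the best-matching character, or None to signal rejection."""
        if not self.prototypes:
            return None
        best_char, best_score = max(
            ((char, cosine(feature, proto)) for char, proto in self.prototypes.items()),
            key=lambda item: item[1],
        )
        return best_char if best_score >= self.threshold else None


clf = PrototypeClassifier(threshold=0.8)
clf.add_prototype("a", [1.0, 0.0, 0.0])
clf.add_prototype("b", [0.0, 1.0, 0.0])
print(clf.classify([0.9, 0.1, 0.0]))  # a
print(clf.classify([0.0, 0.0, 1.0]))  # None (rejected as unknown)
clf.add_prototype("c", [0.0, 0.0, 1.0])  # novel character added at test time
print(clf.classify([0.0, 0.0, 1.0]))  # c
```

In this sketch, classes exist only as entries in the prototype dictionary. Supporting a novel character therefore means adding a prototype rather than retraining, which is the kind of property that incremental prototype augmentation depends on.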

Abstract

Open-set text recognition, which aims to handle both novel characters and previously seen ones, is a rising subtopic in the text recognition field. However, current open-set text recognition solutions focus only on horizontal text and therefore fail to model the challenges posed by the variety of writing directions in real-world scene text. Multi-orientation text recognition, in general, faces three challenges: diverse image aspect ratios, a significant imbalance in data amount between orientations, and domain gaps between orientations. In this work, we first propose a Multi-Oriented Open-Set Text Recognition task (MOOSTR) to model the challenges of both novel characters and writing-direction variety. We then propose a Multi-Orientation Sharing Experts (MOoSE) framework as a strong baseline solution. MOoSE uses a mixture-of-experts scheme to alleviate the domain gaps between orientations. It also exploits common structural knowledge among experts to alleviate the data scarcity that some experts face. The proposed MOoSE framework is validated by ablation experiments and is also tested for feasibility on the existing open-set benchmark. Code, models, and documents are available at: https://github.com/lancercat/Moose/
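The idea of experts that share common structural knowledge can be pictured as parameter sharing. In the sketch below, some backbone stages are single objects referenced by every orientation expert, while other stages are private to each expert. The stage names (`stem`, `head`), the placement of shared stages before private ones, and the toy `Stage`/`Expert` classes are hypothetical. Which backbone components MOoSE actually shares is determined by the paper's ablations and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stage:
    """Toy stand-in for one backbone block (hypothetical): a name plus weights."""
    name: str
    weights: List[float] = field(default_factory=lambda: [0.0, 0.0])


@dataclass
class Expert:
    """An orientation-specific expert assembled from an ordered list of stages."""
    orientation: str
    stages: List[Stage]


def build_experts(orientations: List[str], shared_names: List[str],
                  private_names: List[str]) -> Dict[str, Expert]:
    # Shared stages are instantiated once. Every expert holds a reference to
    # the same Stage objects, so an update driven by one orientation's data
    # is visible to all experts.
    shared = [Stage(name) for name in shared_names]
    experts: Dict[str, Expert] = {}
    for orientation in orientations:
        # Private stages are created separately for each orientation.
        private = [Stage(f"{name}_{orientation}") for name in private_names]
        experts[orientation] = Expert(orientation, shared + private)
    return experts


experts = build_experts(["horizontal", "vertical"], ["stem"], ["head"])
# Pretend an update from the abundant horizontal data touches the shared stem.
experts["horizontal"].stages[0].weights[0] += 0.1
print(experts["vertical"].stages[0].weights)  # [0.1, 0.0]
print([s.name for s in experts["vertical"].stages])  # ['stem', 'head_vertical']
```

Here, an expert with little data (for example, vertical text) still benefits from updates to the shared stages, while its private stages keep room for orientation-specific features.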

Paper Structure

This paper contains 20 sections, 13 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Samples of multi-oriented text instances in the wild from various sources, showcasing the multi-orientation recognition capability of the proposed Multi-Orientation Sharing Experts framework. The model is trained only on English and Chinese data. The Japanese sample comes from the annotated MLT dataset, so its recognition results are shown in color: green and red characters indicate correctly and wrongly predicted characters, respectively. The remaining images come from the internet and have no annotations, so their results are shown in white, and a blue block indicates rejection.
  • Figure 2: Examples of characters that are ambiguous when orientation information is not available ('c', 'n', and 'u' in this case).
  • Figure 3: The workflow, label conversions, and split conversions of the MOOSTR task.
  • Figure 4: The proposed Multi-Orientation Sharing Experts (MOoSE) framework. Images are dispatched to the appropriate expert based on their orientation, which is determined from the aspect ratio. In addition, a character expert is used to generate class centers for the open-set classifier.
  • Figure 5: The proposed sample dispatcher (the yellow block in Figure 4). The module routes each input image to one of the experts for feature extraction according to the image's aspect ratio.
  • ...and 5 more figures
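The aspect-ratio dispatching described for Figures 4 and 5 can be sketched as a simple routing rule. This is a minimal sketch under stated assumptions:

- There are exactly two experts, `horizontal` and `vertical`.
- A hypothetical cutoff `vertical_ratio = 1.0` is applied to height/width.
- Square images default to the horizontal expert.

The function names `dispatch` and `split_batch` are illustrative. The paper's actual cutoff and its handling of near-square images may differ.

```python
from typing import Dict, List, Tuple


def dispatch(width: int, height: int, vertical_ratio: float = 1.0) -> str:
    """Pick an expert from the image aspect ratio (hypothetical rule).

    Images whose height/width exceeds `vertical_ratio` go to the vertical
    expert; all others go to the horizontal expert. The image is not
    rotated, so its orientation information is preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return "vertical" if height / width > vertical_ratio else "horizontal"


def split_batch(sizes: List[Tuple[int, int]],
                vertical_ratio: float = 1.0) -> Dict[str, List[int]]:
    """Group sample indices by target expert, given (width, height) pairs."""
    groups: Dict[str, List[int]] = {"horizontal": [], "vertical": []}
    for index, (width, height) in enumerate(sizes):
        groups[dispatch(width, height, vertical_ratio)].append(index)
    return groups


print(split_batch([(128, 32), (32, 160), (40, 40)]))
# {'horizontal': [0, 2], 'vertical': [1]}
```

Grouping indices per expert lets each expert process its own sub-batch. Near-square images are the hard case for any threshold rule, which relates to the orientation ambiguity illustrated in Figure 2.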