
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR

Jing Shu, Bing-Jiun Miu, Eugene Chang, Jerry Gao, Jun Liu

TL;DR

The paper tackles the challenge of validating AI-based OCR quality under diverse imaging conditions and document layouts. It proposes a 3D classification framework that decomposes image-based text extraction into context, input, and output dimensions, operationalized through a 3D decision table and explicit test-coverage criteria. It also defines four OCR-focused metrics: Character Accuracy (CA), String Segment Accuracy (SSA), Ordered String Segment Accuracy (OSSA), and Text-Line Accuracy (TLA). These are demonstrated in a mobile receipt OCR case study that illustrates how reading order and layout influence results. Results on two popular OCR apps reveal both strengths and limitations of the framework and point to future work on automation and on extending it to other input types.
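
A minimal sketch of how a 3D decision table could be enumerated. It assumes that each dimension is reduced to the leaf classes of its classification tree and that each table cell is one (context, input, output) leaf triple. The tree contents, the names `leaves`, `decision_table`, and `coverage`, and the "every cell" coverage measure are hypothetical simplifications for illustration. They are not the paper's actual trees or coverage criteria.

```python
from itertools import product

# Hypothetical classification trees (category -> leaf classes). The labels are
# illustrative stand-ins, not the trees shown in the paper's Figures 1, 3, and 4.
CONTEXT_TREE = {"lighting": ["bright", "dim"], "angle": ["frontal", "tilted"]}
INPUT_TREE = {"receipt_section": ["store_info", "item_list"]}
OUTPUT_TREE = {"text_type": ["printed", "decorated_font"]}

Leaf = tuple[str, str]


def leaves(tree: dict[str, list[str]]) -> list[Leaf]:
    """Flatten a two-level classification tree into (category, class) leaves."""
    return [(category, cls) for category, classes in tree.items() for cls in classes]


def decision_table(context, inputs, outputs) -> list[tuple[Leaf, Leaf, Leaf]]:
    """Simplified 3D decision table: one cell per (context, input, output) leaf triple."""
    return list(product(leaves(context), leaves(inputs), leaves(outputs)))


def coverage(executed, table) -> float:
    """Share of decision-table cells exercised by at least one executed test case."""
    cells = set(table)
    return len(cells & set(executed)) / len(cells)


table = decision_table(CONTEXT_TREE, INPUT_TREE, OUTPUT_TREE)
print(len(table))                  # 4 context x 2 input x 2 output leaves -> 16
print(coverage(table[:4], table))  # 4 of 16 cells exercised -> 0.25
```

The cell count is the product of the three leaf counts, so adding leaves to any tree grows the table multiplicatively. This is the kind of test complexity that coverage criteria have to manage.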

Abstract

AI-based systems have distinctive characteristics that also make their quality difficult to evaluate. Ensuring and validating AI software quality is therefore of critical importance. In this paper, we present an AI software functional testing model to address this challenge. Specifically, we first present a comprehensive literature review of previous work covering the key facets of AI software testing processes. We then introduce a 3D classification model to systematically evaluate the image-based text extraction AI function, together with test coverage criteria and test complexity. To evaluate the proposed AI software quality test, we define four evaluation metrics, each covering a different aspect of OCR output. Finally, based on the proposed framework and metrics, we present a mobile Optical Character Recognition (OCR) case study that demonstrates the framework's effectiveness and capability in assessing AI function quality.
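
This summary does not reproduce the metric formulas. The following sketch therefore uses assumed definitions for three of the four metrics, chosen only to show how reading order can separate them. SSA is taken as the share of ground-truth string segments found anywhere in the output. OSSA is taken as the longest common subsequence of segments divided by the ground-truth segment count. TLA is taken as the share of ground-truth lines reproduced exactly. The functions `ssa`, `ossa`, and `tla` and the receipt strings are hypothetical.

```python
from collections import Counter


def ssa(truth: list[str], ocr: list[str]) -> float:
    """Assumed SSA: share of ground-truth segments present in the OCR output, order ignored.
    `truth` must be non-empty."""
    matched = sum((Counter(truth) & Counter(ocr)).values())  # multiset intersection
    return matched / len(truth)


def ossa(truth: list[str], ocr: list[str]) -> float:
    """Assumed OSSA: longest common subsequence of segments over the ground-truth count,
    so segments emitted out of reading order stop counting. `truth` must be non-empty."""
    lcs = [[0] * (len(ocr) + 1) for _ in range(len(truth) + 1)]
    for i, t in enumerate(truth):
        for j, o in enumerate(ocr):
            lcs[i + 1][j + 1] = lcs[i][j] + 1 if t == o else max(lcs[i][j + 1], lcs[i + 1][j])
    return lcs[-1][-1] / len(truth)


def tla(truth_lines: list[str], ocr_lines: list[str]) -> float:
    """Assumed TLA: share of ground-truth lines reproduced exactly at the same position."""
    hits = sum(t == o for t, o in zip(truth_lines, ocr_lines))
    return hits / len(truth_lines)


# A two-column item list that the OCR reads column by column (names first, then prices).
truth_segments = ["MILK", "2.99", "BREAD", "1.49"]
ocr_segments = ["MILK", "BREAD", "2.99", "1.49"]
print(ssa(truth_segments, ocr_segments))   # 1.0: every segment was recognized
print(ossa(truth_segments, ocr_segments))  # 0.75: one segment is out of order
print(tla(["MILK 2.99", "BREAD 1.49"], ["MILK BREAD", "2.99 1.49"]))  # 0.0
```

Under these assumed definitions, reading a two-column item list column by column leaves SSA at 1.0, while OSSA drops to 0.75 and TLA drops to 0.0. This mirrors the reading-order and layout effects that the case study examines.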

Paper Structure (13 sections, 12 figures, 2 tables)

This paper contains 13 sections, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: A context classification tree that depicts the environmental conditions under which the image is captured.
  • Figure 2: Left: four sections of a shopping receipt. Right: store information in a receipt. (1) No logo; (2) Text logo – normal font; (3) Text logo – decorated font; (4) Logo without text; (5) Logo with normal-font text; (6) Logo with decorated text.
  • Figure 3: An input classification tree.
  • Figure 4: An output classification tree.
  • Figure 5: An example of an item list on a receipt. (1) Top part: the ground truth for this item list, with each string segment numbered sequentially in red; (2) Bottom part: the OCR output of the item list, with the adjusted string sequence and the allocation of items per line. Errors are marked in red: deletions (D) are crossed out, and substitutions and insertions are labeled (S) and (I), respectively.
  • ...and 7 more figures
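
Figure 5 marks deletions (D), substitutions (S), and insertions (I) against the ground truth. One standard way to obtain such counts is a Levenshtein alignment with a backtrace. The sketch below performs that alignment and then computes character accuracy with the commonly used formulation CA = (N - (S + D + I)) / N, where N is the number of ground-truth characters. Whether the paper defines CA exactly this way is an assumption. The function names and sample strings are hypothetical.

```python
def edit_ops(truth: str, ocr: str) -> dict[str, int]:
    """Count substitutions (S), deletions (D), and insertions (I) on one minimum-cost
    Levenshtein alignment of the OCR output against the ground truth."""
    m, n = len(truth), len(ocr)
    # dist[i][j] is the edit distance between truth[:i] and ocr[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = int(truth[i - 1] != ocr[j - 1])
            dist[i][j] = min(dist[i - 1][j] + 1,        # delete a ground-truth char
                             dist[i][j - 1] + 1,        # insert a spurious OCR char
                             dist[i - 1][j - 1] + sub)  # match or substitute
    # Walk back from the bottom-right cell, attributing each unit of cost to an operation.
    ops = {"S": 0, "D": 0, "I": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(truth[i - 1] != ocr[j - 1]):
            ops["S"] += int(truth[i - 1] != ocr[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops["D"] += 1
            i -= 1
        else:
            ops["I"] += 1
            j -= 1
    return ops


def character_accuracy(truth: str, ocr: str) -> float:
    """CA = (N - (S + D + I)) / N with N ground-truth characters (assumed formulation).
    `truth` must be non-empty."""
    ops = edit_ops(truth, ocr)
    return (len(truth) - sum(ops.values())) / len(truth)


print(edit_ops("TOTAL 12.50", "T0TAL 12.5"))            # {'S': 1, 'D': 1, 'I': 0}
print(character_accuracy("TOTAL 12.50", "T0TAL 12.5"))  # 9 / 11, about 0.818
```

This formulation also counts insertions as errors. As a result, CA can fall below zero when the OCR output contains many spurious characters.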