Handwritten Code Recognition for Pen-and-Paper CS Education

Md Sazzad Islam, Moussa Koulako Bala Doumbouya, Christopher D. Manning, Chris Piech

TL;DR

This paper addresses the problem of teaching CS with handwritten code by developing and evaluating OCR-based pipelines that must preserve Python indentation and must not make corrections that change a program's meaning. It introduces two approaches: (i) a modular OCR system augmented with absolute or relative indentation clustering and a language-model post-correction step, and (ii) a multimodal, end-to-end handwritten-code recognizer built on large language models such as GPT-4V. The authors release two public benchmarks, the Correct Student Dataset and the Logical Error Dataset, and show that relative indentation recognition combined with simple prompting achieves an OCR error of 5.3%, well below the roughly 30% baseline, while GPT-4V reaches 6.0% OCR error with minimal logical hallucinations. These results suggest practical routes to scalable, keyboard-free CS education in resource-limited settings and to automated grading of handwritten work, and the datasets support further research on handwriting-based programming curricula.

Abstract

Teaching Computer Science (CS) by having students write programs by hand on paper has key pedagogical advantages: It allows focused learning and requires careful thinking compared to the use of Integrated Development Environments (IDEs) with intelligent support tools or "just trying things out". The familiar environment of pens and paper also lessens the cognitive load of students with no prior experience with computers, for whom the mere basic usage of computers can be intimidating. Finally, this teaching approach opens learning opportunities to students with limited access to computers. However, a key obstacle is the current lack of teaching methods and support software for working with and running handwritten programs. Optical character recognition (OCR) of handwritten code is challenging: Minor OCR errors, perhaps due to varied handwriting styles, easily make code not run, and recognizing indentation is crucial for languages like Python but is difficult to do due to inconsistent horizontal spacing in handwriting. Our approach integrates two innovative methods. The first combines OCR with an indentation recognition module and a language model designed for post-OCR error correction without introducing hallucinations. This method, to our knowledge, surpasses all existing systems in handwritten code recognition. It reduces error from 30% in the state of the art to 5% with minimal hallucination of logical fixes to student programs. The second method leverages a multimodal language model to recognize handwritten programs in an end-to-end fashion. We hope this contribution can stimulate further pedagogical research and contribute to the goal of making CS education universally accessible. We release a dataset of handwritten programs and code to support future research at https://github.com/mdoumbouya/codeocr

Paper Structure (34 sections, 6 equations, 7 figures, 4 tables, 1 algorithm)


Figures (7)

  • Figure 1: Processing a student's handwritten program. (1) The OCR module produces bounding boxes and noisy transcriptions for each line of code. (2) High-fidelity reconstruction of the student's intended discrete indentation levels. (3) Post-correction using chain-of-thought prompting of a language model. A key challenge is to reconstruct the student's work by correcting transcription errors (e.g. missing underscores on lines 11 and 13, dash on lines 2, 8, and 9, capitalization and inserted space on line 6) without introducing artifacts (e.g. removed comment on line 8).
  • Figure 2: Left: Example program with corresponding delta values. Large positive deltas signify indentation. Right: A histogram of positive delta values among 16 images shows a clear distinction between indent and no-indent.
  • Figure 3: Chain-of-Thought Prompting Pipeline
  • Figure 4: Example of a perfect transcription of a student's program written for a grid-world programming environment, a setting often used in introductory programming courses. The language model fixed all OCR errors and introduced no artifacts, even though this type of program is less frequent in the training data of large language models.
  • Figure 5: An example of OCR (Azure + Relative + Simple) on a longer piece of handwritten student code. The OCR faithfully transcribes the student program, even preserving logical errors such as the one in the leap-year test. However, it drops the student's comment and omits the second "print" line: print(" ")
  • ...and 2 more figures
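
The relative indentation idea described in the Figure 2 caption (large positive deltas between consecutive lines signal an indent) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `threshold`, and the rule used to resolve dedents (dropping every level whose opening line sits clearly to the right of the current line) are assumptions made for illustration. The input is assumed to be the left x-coordinate of each line's OCR bounding box, in pixels.

```python
from typing import List


def relative_indent_levels(left_xs: List[float], threshold: float) -> List[int]:
    """Assign a discrete indentation level to each line.

    left_xs:   left x-coordinate of each line's bounding box (assumed input).
    threshold: hypothetical cutoff separating "indent" from "no indent" deltas,
               e.g. chosen from the gap in a histogram of positive deltas.
    """
    levels: List[int] = []
    anchors: List[float] = []  # anchors[k] = x of the line that opened level k
    prev_x = 0.0
    for x in left_xs:
        if not anchors:
            anchors.append(x)  # first line defines level 0
        else:
            delta = x - prev_x  # horizontal shift relative to the previous line
            if delta > threshold:
                anchors.append(x)  # large positive delta: one level deeper
            elif delta < -threshold:
                # Large negative delta: close every level whose opening line
                # lies clearly to the right of this line (assumed dedent rule).
                while len(anchors) > 1 and anchors[-1] - x > threshold:
                    anchors.pop()
            # Small delta: same level as the previous line.
        levels.append(len(anchors) - 1)
        prev_x = x
    return levels


def reindent(lines: List[str], left_xs: List[float], threshold: float,
             indent: str = "    ") -> str:
    """Rebuild source text using the recovered indentation levels."""
    levels = relative_indent_levels(left_xs, threshold)
    return "\n".join(indent * lvl + text.strip() for text, lvl in zip(lines, levels))


if __name__ == "__main__":
    ocr_lines = ["for i in range(3):", "if i % 2 == 0:", "print(i)", "print('done')"]
    xs = [10.0, 52.0, 98.0, 12.0]  # made-up pixel coordinates
    print(reindent(ocr_lines, xs, threshold=20.0))
```

Because each line is compared with the line before it, slow horizontal drift across a page does not by itself create spurious indents; only shifts larger than the threshold change the level.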