Handwritten Code Recognition for Pen-and-Paper CS Education
Md Sazzad Islam, Moussa Koulako Bala Doumbouya, Christopher D. Manning, Chris Piech
TL;DR
This paper tackles the challenge of teaching CS with handwritten code. It develops and evaluates OCR-based pipelines that must preserve Python indentation and must avoid corrections that alter program semantics. It introduces two main approaches: (i) a modular OCR system augmented with absolute or relative indentation clustering and a language-model post-correction step, and (ii) an end-to-end handwritten-code recognizer built on multimodal large language models such as GPT-4V. The authors release two public benchmarks, the Correct Student Dataset and the Logical Error Dataset. They show that the relative-indentation approach combined with simple prompting achieves an OCR error of 5.3%, compared with roughly 30% for the baseline, while GPT-4V reaches 6.0% OCR error with minimal logical hallucinations. These results point to practical pathways for scalable, keyboard-free CS education in resource-limited settings and for automated grading of handwritten work. The datasets also support future research and broader adoption of handwriting-based programming curricula.
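To make "relative indentation" concrete, here is a minimal sketch. It assumes the OCR stage returns each line's text together with the x-coordinate of its left edge. The paper's exact clustering algorithm is not described here, so the functions `relative_indentation` and `reindent`, the `tol` threshold, and the pixel values are all hypothetical. The idea is that each line is compared with the positions of the currently open blocks rather than with fixed global columns. This makes the method tolerant of handwriting that drifts left or right down the page.

```python
from typing import List, Tuple


def relative_indentation(x_starts: List[float], tol: float = 15.0) -> List[int]:
    """Infer an indentation level for each line from its left x-coordinate.

    Hypothetical relative scheme: a line opens a new level when it starts
    more than `tol` pixels right of the innermost open block, and closes
    blocks when it starts more than `tol` pixels left of them.
    """
    levels: List[int] = []
    open_blocks: List[float] = []  # x-position where each open block starts
    for x in x_starts:
        if not open_blocks:
            open_blocks.append(x)  # first line defines the base column
        elif x > open_blocks[-1] + tol:
            open_blocks.append(x)  # clearly further right: start a new block
        else:
            # Dedent: close inner blocks that start clearly to the right of x.
            while len(open_blocks) > 1 and x < open_blocks[-1] - tol:
                open_blocks.pop()
        levels.append(len(open_blocks) - 1)
    return levels


def reindent(lines: List[Tuple[float, str]], tol: float = 15.0) -> str:
    """Rebuild Python source from (x_start, text) pairs using 4-space indents."""
    levels = relative_indentation([x for x, _ in lines], tol)
    return "\n".join("    " * lvl + text.strip() for lvl, (_, text) in zip(levels, lines))


if __name__ == "__main__":
    # Made-up OCR output: x-coordinates in pixels, jittered as in handwriting.
    ocr_lines = [
        (10, "def sign(x):"),
        (52, "if x > 0:"),
        (95, "return 1"),
        (48, "return -1"),
    ]
    print(reindent(ocr_lines))
```

On the made-up input, the last line starts at x=48. That is not the same column as the `if` line (x=52), but it falls within `tol` of it, so the line is still placed at level 1. An absolute scheme would instead cluster all x-coordinates globally into fixed columns, and is more sensitive to this kind of drift.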
Abstract
Teaching Computer Science (CS) by having students write programs by hand on paper has key pedagogical advantages. Compared with using Integrated Development Environments (IDEs) with intelligent support tools, or with "just trying things out", it encourages focused learning and careful thinking. The familiar medium of pen and paper also lessens the cognitive load for students with no prior computer experience, for whom merely operating a computer can be intimidating. Finally, this approach opens learning opportunities to students with limited access to computers. However, a key obstacle is the current lack of teaching methods and support software for working with and running handwritten programs. Optical character recognition (OCR) of handwritten code is challenging. Minor OCR errors, which may stem from varied handwriting styles, can easily prevent code from running. Recognizing indentation is crucial for languages like Python, but it is difficult because horizontal spacing in handwriting is inconsistent. We propose two methods. The first combines OCR with an indentation recognition module and a language model designed to correct post-OCR errors without introducing hallucinations. To our knowledge, this method surpasses all existing systems for handwritten code recognition. It reduces error from 30% with the state of the art to 5%, with minimal hallucination of logical fixes to student programs. The second method uses a multimodal language model to recognize handwritten programs end to end. We hope this contribution can stimulate further pedagogical research and contribute to the goal of making CS education universally accessible. We release a dataset of handwritten programs and code to support future research at https://github.com/mdoumbouya/codeocr
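The abstract reports error rates (30% versus 5%) without defining the metric at this point. One common way to score OCR output is character-level edit distance normalized by the length of the reference transcription. The sketch below assumes that metric for illustration only; it is not confirmed as the paper's exact definition, and `levenshtein` and `ocr_error_rate` are hypothetical helper names. The usage shows why indentation matters: dropping four spaces of indentation from a two-line program already counts as four character errors.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def ocr_error_rate(predicted: str, reference: str) -> float:
    """Edit distance normalized by reference length (assumed metric)."""
    return levenshtein(predicted, reference) / max(len(reference), 1)


if __name__ == "__main__":
    reference = "if x:\n    y = 1"  # 15 characters
    predicted = "if x:\ny = 1"      # same code with the indentation lost
    print(ocr_error_rate(predicted, reference))  # 4 edits / 15 characters
```

Under this assumed metric, a 30% error means roughly one character in three must be edited, while 5% means about one in twenty.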
