HTML papers on arXiv -- why it is important, and how we made it happen
Charles Frankston, Jonathan Godfrey, Shamsi Brinn, Alison Hofer, Mark Nazzaro
TL;DR
The paper addresses the accessibility gap in scientific publishing by advocating HTML as a reader format for arXiv and detailing an automated LaTeX-to-HTML conversion pathway. The approach centers on evaluating and integrating LaTeXML via the ar5iv project to render HTML from LaTeX, producing responsive, semantically rich documents that work better with assistive technologies and search engines. Early deployment in December 2023 drew positive user feedback, particularly from screen-reader and Braille users, although the authors acknowledge remaining issues and the need for continued improvement. In practice, the work advances open science by making papers more accessible, mobile-friendly, and indexable, and it outlines concrete plans for iterative enhancements, scaling, and better integration with authors and tooling.
Abstract
In December 2023, arXiv made HTML formatted papers available to readers. This was the exciting outcome of over a year of accessibility research and development with the scientific community. Currently, only 2.4% of research outputs meet accessibility guidelines. Informed by scientists who rely on assistive technology, our analysis demonstrates that offering HTML is the most impactful step arXiv can take. Scientists need HTML now, and they urge us not to let the perfect be the enemy of the good enough. In this paper we share how arXiv is converting LaTeX to HTML today, and our plans for future improvements.
