Building Whitespace-Sensitive Languages Using Whitespace-Insensitive Components
Alexander Hellwig, Nico Jansen, Bernhard Rumpe
TL;DR
This paper tackles the challenge of reusing whitespace-insensitive language components to build whitespace-sensitive languages, addressing a key reuse gap in software language engineering. It introduces a preprocessing-based frontend that converts indentation semantics into explicit control tokens, allowing existing components and tooling to participate in whitespace-sensitive parsing. The approach is validated by reconstructing a Python-like language from MontiCore components; 111 Python files in GemPyde parse without errors, while 102 of 3367 Python files in Transformers fail due to missing concepts rather than whitespace handling. Overall, the work provides a general, technology-agnostic preprocessing strategy and clarifies requirements for component reuse (RQ1, RQ2), advancing practical reusability across whitespace boundaries.
Abstract
In Software Language Engineering, there is a trend towards reusability by composing modular language components. However, this reusability is severely inhibited by a gap in integrating whitespace-sensitive and whitespace-insensitive languages. There is currently no consistent procedure for seamlessly reusing such language components in both cases, so libraries often cannot be reused and whitespace-sensitive languages are developed from scratch. This paper presents a technique for using modular, whitespace-insensitive language modules to construct whitespace-sensitive languages by pre-processing language artifacts before parsing. The approach is evaluated by reconstructing a simplified version of the programming language Python. Our solution aims to increase the reusability of existing language components to reduce development time and increase the overall quality of software languages.
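The pre-processing idea can be illustrated with a minimal sketch in Python. It is not the authors' MontiCore implementation: the function name `mark_blocks` and the token spellings `<INDENT>` and `<DEDENT>` are hypothetical, and the sketch assumes space-only indentation with no line continuations, comments, or multi-line strings. It only shows how indentation changes might be rewritten into explicit control tokens that a whitespace-insensitive parser could then consume.

```python
def mark_blocks(source: str,
                open_tok: str = "<INDENT>",
                close_tok: str = "<DEDENT>") -> str:
    """Rewrite indentation changes as explicit block tokens.

    Assumption: indentation uses spaces only; tabs, line continuations
    and multi-line strings are deliberately not handled in this sketch.
    """
    stack = [0]   # indentation widths of the currently open blocks
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:              # blank lines do not affect block structure
            continue
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:         # deeper indentation opens a block
            stack.append(width)
            out.append(open_tok)
        while width < stack[-1]:      # shallower indentation closes blocks
            stack.pop()
            out.append(close_tok)
        if width != stack[-1]:        # dedent to a level that was never opened
            raise ValueError(f"inconsistent dedent: {line!r}")
        out.append(stripped)
    out.extend(close_tok for _ in stack[1:])   # close blocks still open at EOF
    return "\n".join(out)


if __name__ == "__main__":
    example = "if x:\n    y = 1\n    if z:\n        w = 2\nv = 3\n"
    print(mark_blocks(example))
```

After this step the layout of the text no longer carries meaning, so a grammar that treats the two tokens like ordinary block delimiters can ignore whitespace entirely. This is the property that would let existing whitespace-insensitive components be reused.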
