Magic Markup: Maintaining Document-External Markup with an LLM

Edward Misback; Zachary Tatlock; Steven L. Tanimoto

TL;DR

The authors contribute a system that employs an intelligent agent to re-tag modified programs, so that rich annotations automatically follow code as it evolves. It achieves an accuracy of 90% on the authors' benchmarks and can re-place a document's tags in parallel at a rate of 5 seconds per tag.

Abstract

Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has execution semantics, this makes tagging an opt-in feature that is hard to support properly. Today, language models offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can re-place a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.
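To make the problem concrete, a tag can be modeled as a span of lines in one version of a document, and re-tagging means finding where that span lives in the next version. The sketch below assumes inclusive, 1-indexed line spans and deliberately swaps in a purely syntactic technique (line matching with Python's `difflib.SequenceMatcher`) instead of a language model; `retag_by_diff` and `Span` are hypothetical names, not the authors' system. It illustrates why a diff alone falls short: it can only carry over lines that survive unchanged, so it loses the tag when the tagged lines themselves are edited, the kind of change where a model's understanding of semantics is meant to help.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

Span = Tuple[int, int]  # inclusive, 1-indexed (start_line, end_line)


def retag_by_diff(old_text: str, new_text: str, span: Span) -> Optional[Span]:
    """Map a tagged line span from old_text onto new_text via a line diff.

    Hypothetical syntactic baseline, not an LLM: only lines that survive
    unchanged can be carried over.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    start, end = span
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    mapped = []
    for block in matcher.get_matching_blocks():
        # Each block says old_lines[a:a+size] == new_lines[b:b+size].
        for k in range(block.size):
            old_line_no = block.a + k + 1  # convert to 1-indexed
            if start <= old_line_no <= end:
                mapped.append(block.b + k + 1)
    if not mapped:
        return None  # every tagged line was edited; the diff loses the tag
    return (min(mapped), max(mapped))


# A line inserted above the tag shifts it down by one: diff tracking works.
print(retag_by_diff("a\nb\nc\nd", "x\na\nb\nc\nd", (2, 3)))  # (3, 4)
# The tagged line itself is rewritten: diff tracking fails.
print(retag_by_diff("a\nb\nc", "a\nB\nc", (2, 2)))  # None
```

Returning `None` rather than guessing keeps the failure visible; a semantic re-tagger would instead be asked to locate the rewritten entity.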

Paper Structure (38 sections, 2 figures)

Introduction
Problem statement
User story
Our contributions
Related work
Basic Definitions
Tagged Code Updates Benchmark Suite
Code generation system
Benchmark suite description
Parameters
Benchmarks filtered out of the test set
Benchmark suite generation costs
Prototype re-tagging system
Re-tagging prompt
Prompt hand-tuning
...and 23 more sections

Figures (2)

Figure 1: An example synthetic benchmark. On the left, a language model has produced an original program for displaying the price of drinks, and another model has selected and delimited a segment (the "menuItems" constant) with black Unicode star characters (★). On the right, a language model has synthesized updates to the program while keeping the segment in place. Our re-tagging system predicts the position of the segment in an unmarked version of the file on the right.
Figure 2: Code that led to an end-line-number error. gpt-4-turbo-0125 chose line 15 as the final line of the segment, even though it had correctly stated the segment's full text, including the brace on line 16.
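Figure 1 implies some bookkeeping for each benchmark: the ★ delimiters in the updated program must be stripped to produce the unmarked file, and the segment's line range recorded as ground truth to compare against the re-tagger's prediction. The following is a minimal sketch under stated assumptions: the file contains exactly one segment, the opening ★ sits immediately before the segment's first character and the closing ★ immediately after its last, and spans are inclusive and 1-indexed. `strip_segment_markers` and `exact_span_match` are hypothetical helpers; in particular, exact start-and-end agreement is an assumed scoring rule, not necessarily the criterion behind the reported 90% accuracy.

```python
from typing import Tuple

STAR = "\u2605"  # BLACK STAR (★); assumed delimiter, per Figure 1

Span = Tuple[int, int]  # inclusive, 1-indexed (start_line, end_line)


def strip_segment_markers(marked: str) -> Tuple[str, Span]:
    """Remove the two ★ delimiters and return (unmarked_text, segment_span).

    Assumes exactly one segment: an opening ★ right before its first character
    and a closing ★ right after its last character.
    """
    if marked.count(STAR) != 2:
        raise ValueError("expected exactly two star delimiters")
    first = marked.index(STAR)
    second = marked.index(STAR, first + 1)
    # The markers contain no newlines, so line numbers are the same in the
    # marked and unmarked text.
    start_line = marked.count("\n", 0, first) + 1
    end_line = marked.count("\n", 0, second) + 1
    # If the segment ends with a newline, the closing ★ starts the next line;
    # the segment's last line is the one before it.
    if marked[second - 1] == "\n":
        end_line -= 1
    return marked.replace(STAR, ""), (start_line, end_line)


def exact_span_match(predicted: Span, expected: Span) -> bool:
    """Hypothetical strict scoring: start and end lines must both agree."""
    return predicted == expected


# Example: the segment spans lines 2-4 of the file.
marked = "const tax = 0.1;\n★const menuItems = [\n  'tea',\n];★\nshow(menuItems);\n"
unmarked, span = strip_segment_markers(marked)
print(span)                            # (2, 4)
print(exact_span_match((2, 3), span))  # False: end line is off by one
```

Under this strict rule, the Figure 2 prediction, which ended on line 15 instead of line 16, counts as a miss even though the model had stated the segment's text correctly.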
