Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam, Munyeong Kim, Frida-Cecilia Acosta-Parenteau, Daniel Lee, Amine Mhedhbi, Elena L. Glassman, Ian Arawjo
TL;DR
SemanticCommit addresses the problem of updating AI memory of user intent at scale by introducing a semantic commit workflow that detects and helps resolve semantic conflicts during memory updates. The system combines a knowledge graph-based RAG pipeline with LLM-based resolution, is implemented with a React/TypeScript frontend and a Flask backend, and is evaluated through benchmarks and a within-subjects user study against ChatGPT Canvas. Key contributions include a framework of design goals, an end-to-end architecture that separates retrieval from generation, four domain benchmarks, and empirical evidence that impact analysis and granular, human-in-the-loop edits improve conflict detection and users' sense of control. The work offers design implications for AI agent memory interfaces, advocating proactive impact analysis, adjustable autonomy, and scalable memory-management APIs to support robust, user-aligned memory updates in real-world workflows.
Abstract
How do we update AI memory of user intent as intent changes? We consider how an AI interface may assist the integration of new information into a repository of natural language data. Inspired by software engineering concepts like impact analysis, we develop methods and a UI for managing semantic changes with non-local effects, which we call "semantic conflict resolution." The user commits new intent to a project -- makes a "semantic commit" -- and the AI helps the user detect and resolve semantic conflicts within a store of existing information representing their intent (an "intent specification"). We develop an interface, SemanticCommit, to better understand how users resolve conflicts when updating intent specifications such as Cursor Rules and game design documents. A knowledge graph-based RAG pipeline drives conflict detection, while LLMs assist in suggesting resolutions. We evaluate our technique on an initial benchmark. Then, we report a 12-user within-subjects study of SemanticCommit for two task domains -- game design documents and AI agent memory in the style of ChatGPT memories -- where users integrated new information into an existing list. Half of our participants adopted a workflow of impact analysis, where they would first flag conflicts without AI revisions, then resolve conflicts locally, despite having access to a global revision feature. We argue that AI agent interfaces, such as software IDEs like Cursor and Windsurf, should provide affordances for impact analysis and help users validate AI retrieval independently from generation. Our work speaks to how AI agent designers should think about updating memory as a process that involves human feedback and decision-making.
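To make the two-phase workflow concrete, the following is a minimal sketch, not the SemanticCommit implementation. It assumes an intent specification is a plain list of statements, and it substitutes naive word overlap for the knowledge graph-based retrieval described above. All names (`entities`, `Flag`, `detect_conflicts`, `apply_resolutions`) are hypothetical. The point it illustrates is the separation the abstract argues for: flagging potential conflicts (retrieval) happens first and can be inspected on its own, and only then are per-statement resolutions applied.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Toy stopword list (assumption); a real pipeline would extract entities properly.
STOPWORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "to", "and", "with", "by"}


def entities(text: str) -> Set[str]:
    """Hypothetical stand-in for knowledge-graph entity extraction."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


@dataclass
class Flag:
    index: int        # position of the flagged statement in the spec
    statement: str    # existing statement that may conflict
    shared: Set[str]  # terms it shares with the new intent


def detect_conflicts(spec: List[str], new_intent: str) -> List[Flag]:
    """Phase 1, retrieval only: flag statements that touch the same terms."""
    new_terms = entities(new_intent)
    flags = []
    for i, statement in enumerate(spec):
        shared = entities(statement) & new_terms
        if shared:
            flags.append(Flag(i, statement, shared))
    return flags


def apply_resolutions(
    spec: List[str], new_intent: str, resolutions: Dict[int, Optional[str]]
) -> List[str]:
    """Phase 2: apply user-approved local edits, then commit the new intent.

    A resolution maps a statement index to its rewrite, or to None to delete it.
    In a real system an LLM might propose these; here the user supplies them.
    """
    updated = []
    for i, statement in enumerate(spec):
        if i in resolutions:
            if resolutions[i] is not None:
                updated.append(resolutions[i])
        else:
            updated.append(statement)
    updated.append(new_intent)
    return updated


# Usage: flag first (impact analysis), then resolve each flag locally.
spec = [
    "The game is set in a medieval castle.",
    "The player fights with a sword.",
    "Levels are unlocked by collecting coins.",
]
new_intent = "The game is set on a space station."
flags = detect_conflicts(spec, new_intent)
resolutions = {flag.index: None for flag in flags}  # user chooses to delete
updated_spec = apply_resolutions(spec, new_intent, resolutions)
```

Note that word overlap misses indirect conflicts: the sword statement may clash with a space setting but shares no terms with the new intent. Catching such non-local effects is the motivation for a richer retrieval step such as the knowledge graph-based pipeline the abstract describes.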
