Creating an AI Observer: Generative Semantic Workspaces
Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury
TL;DR
This work addresses the lack of AI systems that can act as Observer-like semantic agents by introducing the Generative Semantic Workspace (GSW), a two-component framework: an Operator that generates actor-centric semantic Workspace instances, and a Reconciler that merges them into a dynamic Working Memory. The Operator relies on a CRF-based representation learned with a multi-task LLaMA+LoRA setup and produces a Workspace instance $\mathcal{W}_n$ from each text segment. The Reconciler performs pairwise node/edge comparisons between $\mathcal{W}_n$ and the current Working Memory $\mathcal{M}_n^*$ to yield an updated $\mathcal{M}_{n+1}^*$, so the semantics stay coherent as they are updated autoregressively across a stream of text. The authors evaluate GSW against established baselines across multiple situations (crime, firefighting, technology, healthcare, economy) using human judgments, and report superior performance for both the Operator and the Reconciler, including strong generalization to unseen contexts. They discuss practical deployment considerations, data sourcing from GDELT, silver-standard annotations via GPT-4, and potential applications in Spatial Computing and AR, while noting limitations in coreference and data variety as well as ethical considerations. Overall, GSW demonstrates a viable path toward AI observers capable of constructing and updating plot-like semantic representations that encode actor roles, states, interactions, and future questions.
Abstract
An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like $\textit{``Working Memory''}$ comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts, anticipating them in the future. $\textit{An equivalent AI Observer currently does not exist}$. We introduce the $\textbf{[G]}$enerative $\textbf{[S]}$emantic $\textbf{[W]}$orkspace (GSW) -- comprising an $\textit{``Operator''}$ and a $\textit{``Reconciler''}$ -- that leverages advancements in LLMs to create a generative-style Semantic framework, as opposed to a traditionally predefined set of lexicon labels. Given a text segment $C_n$ that describes an ongoing situation, the $\textit{Operator}$ instantiates actor-centric Semantic maps (termed ``Workspace instance'' $\mathcal{W}_n$). The $\textit{Reconciler}$ resolves differences between $\mathcal{W}_n$ and a ``Working Memory'' $\mathcal{M}_n^*$ to generate the updated $\mathcal{M}_{n+1}^*$. GSW outperforms well-known baselines on several tasks ($\sim 94\%$ vs. FST, GLEN, and BertSRL on multi-sentence Semantics extraction; $\sim 15\%$ vs. NLI-BERT; $\sim 35\%$ vs. QA). By mirroring the real Observer, GSW provides the first step towards Spatial Computing assistants capable of understanding individual intentions and predicting future behavior.
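
As a rough illustration of the update loop described above, the following Python sketch maps each segment $C_n$ to a Workspace instance $\mathcal{W}_n$ and folds it into a running memory, giving $\mathcal{M}_{n+1}^*$ from $\mathcal{M}_n^*$. Everything named here is an assumption made for illustration: the `Workspace` dataclass, the `actor:role` and `a>relation>b` token format, and the `toy_operator` and `toy_reconciler` functions are hypothetical stand-ins, not the paper's LLM-based Operator or Reconciler. In particular, the exact-name matching below is a deliberately naive substitute for the Reconciler's comparison of nodes and edges.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Toy stand-in for a Workspace instance W_n or a Working Memory M_n^*."""
    # actor name -> set of role/state labels (hypothetical schema)
    actors: dict = field(default_factory=dict)
    # (source actor, relation, target actor) interaction triples
    edges: set = field(default_factory=set)


def toy_operator(segment: str) -> Workspace:
    """Placeholder Operator: reads 'actor:role' and 'a>relation>b' tokens.

    The real Operator is a learned model; this parser only mimics its
    input/output shape (text segment in, actor-centric map out).
    """
    ws = Workspace()
    for token in segment.split():
        if token.count(">") == 2:
            src, rel, dst = token.split(">")
            ws.edges.add((src, rel, dst))
            ws.actors.setdefault(src, set())
            ws.actors.setdefault(dst, set())
        elif ":" in token:
            actor, role = token.split(":", 1)
            ws.actors.setdefault(actor, set()).add(role)
    return ws


def toy_reconciler(memory: Workspace, instance: Workspace) -> Workspace:
    """Placeholder Reconciler: match actors by exact name and merge.

    Returns a new Workspace so the previous memory is left untouched.
    """
    merged = Workspace(
        actors={name: set(roles) for name, roles in memory.actors.items()},
        edges=set(memory.edges),
    )
    for name, roles in instance.actors.items():
        merged.actors.setdefault(name, set()).update(roles)
    merged.edges |= instance.edges
    return merged


def observe(segments):
    """Autoregressive loop: M_{n+1}^* = Reconciler(M_n^*, Operator(C_n))."""
    memory = Workspace()  # empty initial memory
    for segment in segments:
        memory = toy_reconciler(memory, toy_operator(segment))
    return memory


memory = observe([
    "suspect:armed suspect>robbed>bank",
    "suspect:arrested police>arrested>suspect",
])
```

After both segments are processed, `memory.actors["suspect"]` holds the labels from both segments, which is the kind of accumulated actor state the Working Memory is meant to carry forward.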
