Creating an AI Observer: Generative Semantic Workspaces

Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury

TL;DR

This work addresses the lack of AI systems that can act as Observer-like semantic agents by introducing the Generative Semantic Workspace (GSW), a two-component framework with an Operator that generates actor-centric semantic Workspace instances and a Reconciler that merges them into a dynamic Working Memory. The Operator relies on a CRF-based representation learned with a multi-task LLaMA+LoRA setup, producing $\mathcal{W}_n$ from text segments, while the Reconciler performs pairwise node/edge comparisons to yield an updated $\mathcal{M}_{n+1}^*$, enabling autoregressive, coherent semantics across a stream of text. The authors evaluate GSW against established baselines across multiple situations (crime, firefighting, technology, healthcare, economy) using human judgments and show superior performance for both Operator and Reconciler, including strong generalization across unseen contexts. They discuss practical deployment considerations, data sourcing from GDELT, silver-standard annotations via GPT-4, and potential applications in Spatial Computing and AR, while noting limitations in coreference, data variety, and ethical considerations. Overall, GSW demonstrates a viable path toward AI observers capable of constructing and updating plot-like semantic representations that encode actor roles, states, interactions, and future questions.

Abstract

An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like "Working Memory" comprising different actors, their prototypical roles and states at any point, their evolution over time based on their interactions, and even a map of missing Semantic parts that the Observer anticipates will be filled in later. An equivalent AI Observer currently does not exist. We introduce the [G]enerative [S]emantic [W]orkspace (GSW) -- comprising an "Operator" and a "Reconciler" -- that leverages advancements in LLMs to create a generative-style Semantic framework, as opposed to a traditionally predefined set of lexicon labels. Given a text segment $C_n$ that describes an ongoing situation, the Operator instantiates actor-centric Semantic maps (termed "Workspace instance" $\mathcal{W}_n$). The Reconciler resolves differences between $\mathcal{W}_n$ and a "Working Memory" $\mathcal{M}_n^*$ to generate the updated $\mathcal{M}_{n+1}^*$. GSW outperforms well-known baselines on several tasks (multi-sentence Semantics extraction: $\sim 94\%$ vs. FST, GLEN, BertSRL; $\sim 15\%$ vs. NLI-BERT; $\sim 35\%$ vs. QA). By mirroring the real Observer, GSW provides the first step towards Spatial Computing assistants capable of understanding individual intentions and predicting future behavior.
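The update loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the `Workspace` type, the pre-annotated `segment` dictionaries, and the exact-name matching inside `reconcile` are all hypothetical stand-ins. In the paper the Operator is a CRF estimated with LLaMA + LoRA, and the Reconciler is a learned comparison between nodes and edges.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical toy container for a Workspace instance or Working Memory."""
    # actor name -> set of (kind, value) labels, e.g. ("role", "suspect")
    nodes: dict[str, set[tuple[str, str]]] = field(default_factory=dict)
    # (actor, predicate, actor) interaction edges
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def operator(segment: dict) -> Workspace:
    """Stand-in for the Operator: the toy 'segment' is already annotated."""
    ws = Workspace()
    for actor, labels in segment["actors"].items():
        ws.nodes[actor] = set(labels)
    ws.edges = set(segment["edges"])
    return ws


def reconcile(memory: Workspace, instance: Workspace) -> Workspace:
    """Stand-in for the Reconciler: merges W_n into M_n* to produce M_{n+1}*.

    Assumption: actors are matched by exact name, and a newly observed
    state replaces the actor's previous state.
    """
    merged = Workspace(
        {actor: set(labels) for actor, labels in memory.nodes.items()},
        set(memory.edges),
    )
    for actor, labels in instance.nodes.items():
        current = merged.nodes.setdefault(actor, set())
        if any(kind == "state" for kind, _ in labels):
            # Drop the stale state before recording the new one.
            current -= {label for label in current if label[0] == "state"}
        current |= labels
    merged.edges |= instance.edges
    return merged


def observe(segments: list[dict]) -> Workspace:
    """Process segments C_1..C_N in order, carrying the memory forward."""
    memory = Workspace()
    for segment in segments:
        memory = reconcile(memory, operator(segment))
    return memory


story = [
    {"actors": {"John": [("role", "suspect"), ("state", "at large")],
                "Police": [("role", "investigator")]},
     "edges": [("Police", "searches for", "John")]},
    {"actors": {"John": [("state", "arrested")]},
     "edges": [("Police", "arrests", "John")]},
]
memory = observe(story)
print(sorted(memory.nodes["John"]))
```

The point of the sketch is the recursion: each segment yields a fresh instance, and only the reconciled memory is carried to the next step.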

Paper Structure (31 sections, 11 equations, 28 figures, 11 tables)

Figures (28)

  • Figure 1: The GSW framework: The Observer constructs an internal map of the Semantics common across several instances of a situation, and in the process, identifies the recurring and prototypical conceptual gestalts. Any unfolding instance of a situation is processed through such semantic "lenses" and embedded in natural language with the use of grammar: a sample from the Observer's semantic map, with encodings of instance-specific actors, their interactions, and their evolution over space and time. In the GSW model, the goal is to construct an AI equivalent that interprets situations as they are encoded in text, to create a succinct "Workspace" instance -- or working memory -- that contains an extendable layout of the Semantics (see Tab. \ref{tab:operator-framework} for comparison to baselines). The Workspace is modeled as a Conditional Random Field (CRF), with Actors, Roles, States, Questions, and Predicates sampled from a conditional distribution. The CRF is estimated using a multi-task LLaMA + LoRA setup (Operator). The Reconciler computes the similarity between a pair of "Workspace instances" sampled from the CRF (see Tab. \ref{tab:all-reconciler} for a comparison to baselines).
  • Figure 2: Operator + Reconciler - Dynamics of the Generative Semantic Workspace [GSW]: Snapshots of the evolving Workspace instance are depicted (from top-left, clockwise) along different time points, as the GSW framework processes a story $[C_1,\dots,C_N]$ (see Fig. \ref{fig:cj0} for full story). We denote the lexicon types - Actor: grey, Role: blue, State: green, Predicate: edge, and Question: red - the first instance of each is in bold in the first pane. The workspace instances $\mathcal{W}_n$ (for each $C_n$) are aggregated into a consensus Workspace Instance $\mathcal{M}_{n+1}^*$ (see Fig. \ref{fig:overview}) using the Reconciler to compare the consensus Semantics at the previous step $\mathcal{M}_{n}^*$ with the latest Operator output $\mathcal{W}_n$. More workspace instances are presented in App. Sec. \ref{app:sec:examples}.
  • Figure 3: Conditional Random Field model of the Workspace: A workspace instance is constructed by sampling the CRF given an input text segment $C_n$.
  • Figure 4: Modular parameter sets for the Workspace model: Every Operator and Reconciler model (specific to a situation) comprises $<1M$ parameters -- using PeFT and LoRA -- and relies on the same shared LLM Oracle (LLaMA-2-13B).
  • Figure 5: A prospective use-case of GSW - Targeted information retrieval of Semantics in SC/AR settings.
  • ...and 23 more figures
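
To give a sense of scale for the "$<1M$ parameters" figure in the Figure 4 caption, the following back-of-the-envelope sketch counts LoRA adapter parameters for LLaMA-2-13B (hidden size 5120, 40 transformer layers). The rank and the choice of adapted matrices are assumptions made purely for illustration; this excerpt does not state the configuration the authors used.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank factors per adapted matrix:
    A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 5120       # LLaMA-2-13B hidden dimension
NUM_LAYERS = 40          # LLaMA-2-13B transformer layers
RANK = 1                 # assumption: illustrative rank, not from the paper
ADAPTED_PER_LAYER = 2    # assumption: e.g. query and value projections only

total = NUM_LAYERS * ADAPTED_PER_LAYER * lora_param_count(
    HIDDEN_SIZE, HIDDEN_SIZE, RANK
)
print(f"{total:,} trainable adapter parameters")  # 819,200
```

Under these assumed settings the adapter stays below one million parameters, while the 13B-parameter base model is shared and left frozen.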