
Panini: Continual Learning in Token Space via Structured Memory

Shreyas Rajesh, Pavan Holur, Mehmet Yigit Turali, Chenda Duan, Vwani Roychowdhury

TL;DR

Panini realizes non-parametric continual learning (a fixed base model plus an external, continually consolidated memory) by representing documents as Generative Semantic Workspaces (GSW): entity- and event-aware networks of question-answer (QA) pairs that are sufficient for an LLM to reconstruct the experienced situations and to mine latent knowledge via reasoning-grounded inference chains on the network.

Abstract

Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this uses test-time compute inefficiently (the LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework in which the base model remains fixed and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW), an entity- and event-aware network of question-answer (QA) pairs sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini traverses only the continually updated GSW (not the verbatim documents or chunks) and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5%-7% higher than other competitive baselines, while using 2-30x fewer answer-context tokens; it also supports fully open-source pipelines and reduces unsupported answers on curated unanswerable queries. The results show that efficient and accurate structuring of experiences at write time, as achieved by the GSW framework, yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
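The abstract describes the GSW as an entity- and event-aware network of QA pairs that is consolidated as new documents arrive. The exact schema and reconciliation procedure are not given here, so what follows is only a minimal sketch under stated assumptions. `QAEdge`, `Workspace`, `normalize`, and `consolidate` are hypothetical names. Each verb-phrase QA pair is stored as two directed edges, and entity states are omitted. Reconciliation is reduced to exact name matching after case and whitespace folding, a stand-in for whatever reconciliation Panini actually performs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QAEdge:
    """One direction of a bidirectional QA pair linking two entity nodes."""
    source: str    # entity the question is about
    question: str  # natural-language question
    target: str    # entity that answers it
    doc_id: str    # provenance: the document (experience) it came from


@dataclass
class Workspace:
    """Hypothetical, simplified GSW: entity nodes with roles, plus QA edges."""
    roles: dict[str, set[str]] = field(default_factory=dict)
    edges: list[QAEdge] = field(default_factory=list)

    def add_entity(self, name: str, *roles: str) -> None:
        self.roles.setdefault(name, set()).update(roles)

    def add_qa_pair(self, subj: str, forward_q: str, obj: str,
                    backward_q: str, doc_id: str) -> None:
        # Store both directions so traversal can start from either entity.
        self.add_entity(subj)
        self.add_entity(obj)
        self.edges.append(QAEdge(subj, forward_q, obj, doc_id))
        self.edges.append(QAEdge(obj, backward_q, subj, doc_id))

    def neighbors(self, entity: str) -> list[QAEdge]:
        return [e for e in self.edges if e.source == entity]


def normalize(name: str) -> str:
    # Assumption: entities match if their names agree after case and
    # whitespace folding; real reconciliation must also handle aliases,
    # coreference, and distinct entities that share a name.
    return " ".join(name.lower().split())


def consolidate(global_ws: Workspace, local_ws: Workspace) -> Workspace:
    """Merge a per-document workspace into the global one (in place)."""
    canonical = {normalize(n): n for n in global_ws.roles}

    def resolve(name: str) -> str:
        # The first spelling seen becomes the canonical node name.
        return canonical.setdefault(normalize(name), name)

    for name, roles in local_ws.roles.items():
        global_ws.add_entity(resolve(name), *roles)
    for e in local_ws.edges:
        global_ws.edges.append(
            QAEdge(resolve(e.source), e.question, resolve(e.target), e.doc_id))
    return global_ws


# Usage: two documents mention the same entity with different spellings.
doc1 = Workspace()
doc1.add_entity("Lothair II", "king")
doc1.add_qa_pair("Lothair II", "Who was Lothair II's mother?",
                 "Ermengarde of Tours", "Whose mother was Ermengarde of Tours?", "doc1")

doc2 = Workspace()
doc2.add_qa_pair("ermengarde of tours", "Who was Ermengarde of Tours's spouse?",
                 "Lothair I", "Who was Lothair I's spouse?", "doc2")

global_ws = Workspace()
for doc in (doc1, doc2):
    consolidate(global_ws, doc)

for e in global_ws.neighbors("Ermengarde of Tours"):
    print(e.doc_id, e.question, "->", e.target)
# doc1 Whose mother was Ermengarde of Tours? -> Lothair II
# doc2 Who was Ermengarde of Tours's spouse? -> Lothair I
```

After consolidation, facts from separate documents meet at a single "Ermengarde of Tours" node. This is what lets a later traversal chain them together without rereading either document.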


Paper Structure

This paper contains 52 sections, 5 equations, 10 figures, 19 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Schematic of the non-parametric continual learning (NPCL) framework. (1) Continual experience: incoming documents are processed asynchronously, potentially by different agents. (2) Individual workspaces: each experience is encoded into a Generative Semantic Workspace (GSW). (3) Continually learned global workspace: GSWs can be continually consolidated by reconciling entities, events, and actions both across and within documents. Extensive ablation studies (see the open-source performance table, tab:opensource-perf) show that different combinations of LLMs of different sizes, assigned to the different tasks (GSW generation and retrieval), yield consistently robust performance. Thus the GSW can serve as a shared meta-representation. (4) Reasoning-grounded inference: the goal is reconciliation that is sufficient but not exhaustive, so that all latent knowledge supported by the collection of experiences is represented as inference chains (paths).
  • Figure 2: System overview of Panini at inference time. Step 1: Planning: A decomposition LLM converts the user query into an ordered sequence of single-hop sub-questions. Step 2: RICR: We perform chain-based retrieval by expanding candidate paths hop-by-hop. The initial seed set is obtained via embedding similarity; therefore, for a query like "Who was Lothair II's mother?", retrieval may include both Lothair II and the semantically nearby Lothair I. From these seeds, RICR follows QA edges to propose intermediate entities (e.g., candidate mothers) and incrementally extends partial chains across GSWs. Candidate chains are scored at each hop, and low-scoring paths are pruned. Step 3: Answer Generation: Top-ranked chains are de-duplicated and provided to the final answering LLM.
  • Figure 3: Example of a per-document Generative Semantic Workspace (GSW). Top: the raw input passage (title + text). Bottom: the corresponding GSW, rendered as (i) entity nodes annotated with roles and states, and (ii) verb-phrase nodes instantiated as bidirectional question-answer (QA) pairs.
  • Figure 4: Prompt used for factual GSW construction from documents.
  • Figure 5: Prompt used for factual GSW construction from documents (continued).
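Figure 2 describes inference as planning the query into single-hop sub-questions, followed by hop-by-hop chain expansion with scoring and pruning. The code below is a hypothetical beam search in that spirit, not Panini's RICR implementation. `retrieve_chains`, `num_seeds`, and `beam_width` are illustrative names. The sub-questions are supplied by hand in place of a decomposition LLM, and Jaccard token overlap stands in for both embedding-based seeding and chain scoring.

```python
import re

# Hypothetical toy graph: entity -> list of (question, answer entity) edges.
Graph = dict[str, list[tuple[str, str]]]
Hop = tuple[str, str, str]  # (entity, question, answer)


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard token overlap: a cheap stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_chains(graph: Graph, sub_questions: list[str], num_seeds: int = 2,
                    beam_width: int = 3) -> list[tuple[float, tuple[Hop, ...]]]:
    if not sub_questions:
        return []
    # Seed entities: those most similar to the first sub-question.
    seeds = sorted(graph, key=lambda e: similarity(e, sub_questions[0]),
                   reverse=True)[:num_seeds]
    beams = [(0.0, seed, ()) for seed in seeds]  # (score, frontier entity, chain)
    for sub_q in sub_questions:
        expanded = []
        for score, entity, chain in beams:
            # Follow every outgoing QA edge and score it against this hop's sub-question.
            for question, answer in graph.get(entity, []):
                hop = (entity, question, answer)
                expanded.append((score + similarity(question, sub_q), answer, chain + (hop,)))
        # Prune: keep only the highest-scoring partial chains. Chains that
        # cannot be extended at this hop are dropped in this sketch.
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam_width]
    # De-duplicate surviving chains before handing them to the answering LLM.
    seen, results = set(), []
    for score, _, chain in beams:
        if chain not in seen:
            seen.add(chain)
            results.append((score, chain))
    return results


graph: Graph = {
    "Lothair II": [("Who was Lothair II's mother?", "Ermengarde of Tours"),
                   ("Who was Lothair II's father?", "Lothair I")],
    "Lothair I": [("Who was Lothair I's mother?", "Ermengarde of Hesbaye"),
                  ("Who was Lothair I's spouse?", "Ermengarde of Tours")],
    "Ermengarde of Tours": [("Who was Ermengarde of Tours's father?", "Hugh of Tours")],
}
# Hand-written decomposition of "Who was Lothair II's maternal grandfather?"
sub_questions = ["Who was Lothair II's mother?", "Who was the father of that person?"]

for score, chain in retrieve_chains(graph, sub_questions):
    print(round(score, 2), " | ".join(f"{q} -> {a}" for _, q, a in chain))
```

With these sub-questions, the semantically nearby seed Lothair I is retrieved alongside Lothair II, as in the figure. Its edges score lower, however, so the top-ranked chain runs Lothair II -> Ermengarde of Tours -> Hugh of Tours. Summing per-hop scores is one simple choice; a real scorer could also judge each chain as a whole.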