BambooKG: A Neurobiologically-inspired Frequency-Weight Knowledge Graph

Vanya Arikutharam, Arkadiy Ukolov

TL;DR

BambooKG targets robust multi-hop reasoning across documents by replacing rigid triplet-centric graphs with a frequency-weighted, non-triplet associative memory graph built from chunk-level tags. It separates memorisation (chunking, tagging, and building a global tag-based graph) from recall (query-tag extraction, subgraph retrieval, and context construction) to assemble episodic, query-relevant context for an LLM. On HotPotQA and MuSiQue, BambooKG achieves higher data recall and faster retrieval than RAG, OpenIE, GraphRAG, and KGGen, a result attributed to stronger cross-document connectivity and a retrieval pipeline that avoids heavy embedding-based or triplet-centric constraints. The work presents a neurobiologically inspired alternative for scalable multi-hop knowledge retrieval, with practical implications for reducing hallucinations and reliance on outdated knowledge in retrieval-augmented generation.

Abstract

Retrieval-Augmented Generation (RAG) allows LLMs to access external knowledge, reducing hallucinations and problems with outdated knowledge. However, it treats retrieved chunks independently and struggles with multi-hop or relational reasoning, especially across documents. Knowledge graphs improve on this by capturing relationships between entities as triplets, enabling structured, multi-chunk reasoning. However, they tend to miss information that does not conform to the triplet structure. We introduce BambooKG, a knowledge graph with frequency-based weights on non-triplet edges that reflect link strength, drawing on the Hebbian principle of "fire together, wire together". This decreases information loss and improves performance on single- and multi-hop reasoning, outperforming existing solutions.
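
The frequency-weighted, non-triplet edges described above can be illustrated with a minimal sketch. It assumes each chunk has already been assigned a set of tags, and that an edge weight is simply the number of chunks in which two tags co-occur; the function name `build_tag_graph`, the input format, and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_tag_graph(chunk_tags):
    """Build a tag co-occurrence graph from a {chunk_id: tags} mapping.

    Assumption: an edge weight is the number of chunks in which both tags
    appear, so tags that "fire together" more often are linked more strongly.
    """
    weights = defaultdict(int)        # (tag_a, tag_b) -> co-occurrence count
    tag_to_chunks = defaultdict(set)  # tag -> ids of the chunks carrying it
    for chunk_id, tags in chunk_tags.items():
        unique = sorted(set(tags))    # sorting gives each pair one canonical key
        for tag in unique:
            tag_to_chunks[tag].add(chunk_id)
        for a, b in combinations(unique, 2):
            weights[(a, b)] += 1      # strengthen the link on every co-occurrence
    return dict(weights), dict(tag_to_chunks)


# Toy input: three chunks with hand-written tags.
chunk_tags = {
    "c1": ["bamboo", "panda", "china"],
    "c2": ["bamboo", "panda"],
    "c3": ["panda", "zoo"],
}
weights, tag_to_chunks = build_tag_graph(chunk_tags)
```

Here `bamboo` and `panda` share two chunks, so their edge ends up with a higher weight than the other pairs. No relation label is stored on an edge, which is what distinguishes this structure from a triplet-based graph.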

Paper Structure

This paper contains 18 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overall memory pipeline: (a) Chunking, (b) Tag Generation, and (c) Knowledge Graph Creation stages (left: tag-to-chunk mapping KG; right: tag frequency-weight KG)
  • Figure 2: Query Subgraph Generation. For illustrative purposes, we take the top 2 first-degree neighbours and the top 2 second-degree neighbours for each query node. Note: some connections are omitted to keep the diagram readable
  • Figure 3: Stage 2 of the Recall Pipeline: (a) chunk extraction from the query subgraph based on the tags obtained in Stage 1; (b) final context
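
The recall steps named in the captions of Figures 2 and 3 can be sketched as follows. This is one possible reading, not the paper's implementation: the graph is assumed to be a symmetric adjacency dictionary of edge weights, second-degree candidates are pooled per query node and ranked by their strongest connecting edge, and chunks are ordered by how many subgraph tags they carry. The helper names, the tie-breaking rule, and the toy data are all hypothetical.

```python
def top_neighbours(adj, node, k, exclude):
    """Return up to k neighbours of node, strongest edge first."""
    candidates = [(w, n) for n, w in adj.get(node, {}).items() if n not in exclude]
    candidates.sort(key=lambda item: (-item[0], item[1]))  # weight desc, name asc on ties
    return [n for _, n in candidates[:k]]


def query_subgraph(adj, query_tags, k1=2, k2=2):
    """Select query tags plus their top first- and second-degree neighbours."""
    selected = set()
    for q in query_tags:
        if q not in adj:
            continue  # query tag is unknown to the graph
        first = top_neighbours(adj, q, k1, exclude={q})
        seen = {q, *first}
        pool = {}  # second-degree candidate -> best edge weight seen so far
        for f in first:
            for n, w in adj.get(f, {}).items():
                if n not in seen:
                    pool[n] = max(pool.get(n, 0), w)
        second = sorted(pool, key=lambda n: (-pool[n], n))[:k2]
        selected.update(seen)
        selected.update(second)
    return selected


def collect_chunks(tag_to_chunks, tags):
    """Rank chunks by the number of subgraph tags they carry (an assumption)."""
    hits = {}
    for t in tags:
        for c in tag_to_chunks.get(t, ()):
            hits[c] = hits.get(c, 0) + 1
    return sorted(hits, key=lambda c: (-hits[c], c))


# Toy graph: symmetric adjacency with co-occurrence counts as weights.
adj = {
    "panda": {"bamboo": 2, "china": 1, "zoo": 1},
    "bamboo": {"panda": 2, "china": 1},
    "china": {"panda": 1, "bamboo": 1, "asia": 3},
    "zoo": {"panda": 1},
    "asia": {"china": 3},
}
tag_to_chunks = {
    "panda": {"c1", "c2", "c3"},
    "bamboo": {"c1", "c2"},
    "china": {"c1", "c4"},
    "zoo": {"c3"},
    "asia": {"c4"},
}

subgraph = query_subgraph(adj, ["bamboo"], k1=2, k2=1)
context_chunks = collect_chunks(tag_to_chunks, subgraph)
```

With `k2=1`, the query tag `bamboo` pulls in `panda` and `china` as first-degree neighbours and `asia` as the single strongest second-degree neighbour, while `zoo` is left out. The chunks returned by `collect_chunks` would then be concatenated to form the context passed to the LLM.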