CAISSON: Concept-Augmented Inference Suite of Self-Organizing Neural Networks

Igor Halperin

TL;DR

CAISSON addresses the limitations of single-vector RAG with a dual-path Self-Organizing Map (SOM) architecture that supports multi-view clustering along semantic-entity and conceptual axes. It combines transformer-based embeddings with classical SOMs, so that semantic and conceptual retrieval run in parallel and are fused at query time. The SynFAQA framework provides a controlled, domain-specific benchmark of synthetic analyst notes and 20,000 Q/A pairs spanning single-hop and multi-hop reasoning, including bridge elements. Empirically, CAISSON reaches an MRR of $0.5231$ versus $0.2106$ for the baseline and maintains strong performance for queries spanning up to four tickers with sub-second latency, highlighting its practical potential for enterprise financial information access.
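For readers unfamiliar with the headline metric, the following is a minimal sketch of how Mean Reciprocal Rank (MRR) is commonly computed. It assumes that each query has a set of relevant note IDs and that the reciprocal rank is taken at the first relevant note in the ranking; the function name, the toy IDs, and this convention are illustrative assumptions, not the paper's evaluation code.

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 if none is retrieved).

    rankings:      list of ranked ID lists, one per query (hypothetical toy input)
    relevant_sets: list of sets of ground-truth IDs, one per query
    """
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        reciprocal_rank = 0.0
        for position, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                reciprocal_rank = 1.0 / position
                break
        total += reciprocal_rank
    return total / len(rankings)


# Toy example: first relevant hit at rank 2, at rank 1, and not retrieved at all.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
toy_relevant = [{"b"}, {"x"}, {"m"}]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
print(toy_mrr)
```

Under this convention, an MRR near 0.5 means the first relevant note tends to appear around the second position, while a value near 0.2 corresponds to roughly the fifth.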

Abstract

We present CAISSON, a novel hierarchical approach to Retrieval-Augmented Generation (RAG) that transforms traditional single-vector search into a multi-view clustering framework. At its core, CAISSON leverages dual Self-Organizing Maps (SOMs) to create complementary organizational views of the document space, where each view captures different aspects of document relationships through specialized embeddings. The first view processes combined text and metadata embeddings, while the second operates on metadata enriched with concept embeddings, enabling a comprehensive multi-view analysis that captures both fine-grained semantic relationships and high-level conceptual patterns. This dual-view approach enables more nuanced document discovery by combining evidence from different organizational perspectives. To evaluate CAISSON, we develop SynFAQA, a framework for generating synthetic financial analyst notes and question-answer pairs that systematically tests different aspects of information retrieval capabilities. Drawing on HotPotQA's methodology for constructing multi-step reasoning questions, SynFAQA generates controlled test cases where each question is paired with the set of notes containing its ground-truth answer, progressing from simple single-entity queries to complex multi-hop retrieval tasks involving multiple entities and concepts. Our experimental results demonstrate substantial improvements over both basic and enhanced RAG implementations, particularly for complex multi-entity queries, while maintaining practical response times suitable for interactive applications.
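The dual-view idea described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the 1-D `TinySOM` class, the hand-made 2-D vectors standing in for transformer embeddings, and the weighted-sum fusion rule with weight `alpha` are all assumptions chosen to keep the example short. Each view trains its own SOM, a query is routed to its best-matching unit in each view, and the candidate scores from the two views are then combined.

```python
import math
import random


class TinySOM:
    """Minimal 1-D self-organizing map (illustrative only, not the paper's code)."""

    def __init__(self, n_nodes, dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_nodes)]

    def bmu(self, x):
        # Best-matching unit: index of the node whose weights are closest to x.
        dists = [sum((w - v) ** 2 for w, v in zip(node, x)) for node in self.weights]
        return dists.index(min(dists))

    def train(self, data, epochs=50, lr0=0.5, sigma0=1.0):
        for epoch in range(epochs):
            frac = 1.0 - epoch / epochs  # linear decay of learning rate and radius
            lr, sigma = lr0 * frac, max(sigma0 * frac, 1e-3)
            for x in data:
                b = self.bmu(x)
                for j, node in enumerate(self.weights):
                    h = math.exp(-((j - b) ** 2) / (2 * sigma ** 2))  # neighborhood
                    for k in range(len(node)):
                        node[k] += lr * h * (x[k] - node[k])


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def view_scores(som, doc_vecs, query_vec):
    # Score only the documents that share the query's best-matching unit in this view.
    q_node = som.bmu(query_vec)
    return {i: cosine(v, query_vec) for i, v in enumerate(doc_vecs) if som.bmu(v) == q_node}


def fused_ranking(sem_scores, con_scores, alpha=0.5):
    # Assumed fusion rule: weighted sum; a document missing from a view scores 0 there.
    ids = set(sem_scores) | set(con_scores)
    fused = {i: alpha * sem_scores.get(i, 0.0) + (1 - alpha) * con_scores.get(i, 0.0)
             for i in ids}
    return sorted(fused, key=fused.get, reverse=True)


# Toy data: four documents, each with a made-up "semantic" and "concept" vector.
semantic = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
concept = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 0.9]]

som_sem = TinySOM(n_nodes=2, dim=2, seed=0)
som_con = TinySOM(n_nodes=2, dim=2, seed=1)
som_sem.train(semantic)
som_con.train(concept)

# For simplicity, the query reuses document 0's vectors in both views.
ranking = fused_ranking(view_scores(som_sem, semantic, semantic[0]),
                        view_scores(som_con, concept, concept[0]))
print(ranking)
```

In this toy setup, a document that matches the query in both views outranks one that matches in only a single view, which is the intuition behind combining evidence from the two organizational perspectives. A practical system would use 2-D maps, real embeddings, and a tuned fusion rule.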
