Satyrn: A Platform for Analytics Augmented Generation

Marko Sterbentz; Cameron Barrie; Shubham Shahi; Abhratanu Dutta; Donna Hooshmand; Harper Pack; Kristian J. Hammond

TL;DR

This work presents Satyrn, a neurosymbolic platform that leverages analytics augmented generation (AAG) to produce accurate, fluent, and coherent reports grounded in large-scale databases, and finds that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence.

Abstract

Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has proven to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that analyzes structured data to generate fact sets, which are then used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach makes it possible to apply standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present Satyrn, a neurosymbolic platform that leverages AAG to produce accurate, fluent, and coherent reports grounded in large-scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B. By comparison, just 57% of claims are accurate in reports produced by GPT-4 Code Interpreter.
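The AAG flow described above (analyze structured data, turn the results into textual facts, and pass those facts to an LLM) can be illustrated with a minimal Python sketch. The toy table, its field names, and the helpers `analyze`, `facts_to_text`, and `build_prompt` are hypothetical and do not reflect Satyrn's actual interfaces. The LLM call itself is omitted; the sketch stops at the prompt that would be sent to the model.

```python
from statistics import mean

# Hypothetical toy table standing in for a large structured database.
WILDFIRES = [
    {"state": "California", "acres": 1200.0},
    {"state": "California", "acres": 800.0},
    {"state": "Oregon", "acres": 200.0},
    {"state": "Oregon", "acres": 400.0},
    {"state": "Texas", "acres": 500.0},
    {"state": "Texas", "acres": 100.0},
]


def analyze(rows):
    """Standard analytics step: count and mean fire size per state."""
    by_state = {}
    for row in rows:
        by_state.setdefault(row["state"], []).append(row["acres"])
    return [
        {"state": state, "count": len(sizes), "mean_acres": mean(sizes)}
        for state, sizes in sorted(by_state.items())
    ]


def facts_to_text(facts):
    """Convert each computed fact into a plain-language statement."""
    return [
        f"{f['state']} had {f['count']} wildfires with an average size "
        f"of {f['mean_acres']:.1f} acres."
        for f in facts
    ]


def build_prompt(task, statements):
    """Place the statements in the prompt, as retrieved passages would be in RAG."""
    fact_block = "\n".join(f"- {s}" for s in statements)
    return f"Use only the facts below.\nFacts:\n{fact_block}\n\nTask: {task}"


statements = facts_to_text(analyze(WILDFIRES))
prompt = build_prompt("Write a short report on wildfire size by state.", statements)
print(prompt)  # this string would then be sent to an LLM (call omitted here)
```

In this sketch every number in the prompt is computed by code rather than produced by the model, so the LLM is only asked to organize and verbalize the supplied facts.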

Paper Structure (40 sections, 18 figures, 4 tables)

Introduction
Methods
Satyrn Rings
Structured Question Representation (SQR)
SQR Plan Templates
Analysis Engine
Generation of Factual Statements
Domain-Agnostic Report Blueprints
Execution of a Report Blueprint
Experiments
Report Generation Modes
Report Evaluation
Results
Factual Accuracy
Fluency and Coherence
...and 25 more sections

Figures (18)

Figure 1: The high level approach of Satyrn and its analytics augmented generation.
Figure 2: An example of a Satyrn ring with two entities defined: State and Wildfire.
Figure 3: A SQR plan, in textual and graph forms, for determining average wildfire size for each state in 2020.
Figure 4: An instantiation of a report blueprint for ranking California against other states by average wildfire size. The resulting SQR plans are executed, and their results are inserted into the prompt to form the input to the LLM.
Figure 5: The fraction of claims classified as factual, rather than confabulated or refuted.
...and 13 more figures
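Figures 3 and 4 describe an SQR plan for average wildfire size per state in 2020 and a report blueprint that ranks California against other states. The following Python sketch imitates that flow under loud assumptions: the plan is encoded as a hypothetical list of `(operation, *arguments)` tuples, the rows and field names (`state`, `year`, `acres`) are invented, and `execute_plan` and `rank_statement` are illustrative helpers, not Satyrn's SQR format or analysis engine.

```python
from statistics import mean

# Hypothetical Wildfire rows; each row is linked to a State by name.
WILDFIRES = [
    {"state": "California", "year": 2020, "acres": 1200.0},
    {"state": "California", "year": 2020, "acres": 800.0},
    {"state": "California", "year": 2019, "acres": 5000.0},
    {"state": "Oregon", "year": 2020, "acres": 1500.0},
    {"state": "Texas", "year": 2020, "acres": 300.0},
]

# Hypothetical plan encoding: keep 2020 fires, then average size per state.
PLAN = [
    ("filter", "year", 2020),
    ("group_mean", "state", "acres"),
]


def execute_plan(rows, plan):
    """Apply each plan step in order to the output of the previous step."""
    result = rows
    for op, *args in plan:
        if op == "filter":
            field, value = args
            result = [r for r in result if r[field] == value]
        elif op == "group_mean":
            key, value_field = args
            groups = {}
            for r in result:
                groups.setdefault(r[key], []).append(r[value_field])
            result = {k: mean(v) for k, v in groups.items()}
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


def rank_statement(averages, target):
    """Render the ranking as a factual sentence (year fixed to match PLAN)."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    rank = ordered.index(target) + 1
    return (
        f"{target} ranks {rank} of {len(ordered)} states by average "
        f"wildfire size in 2020 ({averages[target]:.1f} acres)."
    )


averages = execute_plan(WILDFIRES, PLAN)
print(rank_statement(averages, "California"))
# prints: California ranks 2 of 3 states by average wildfire size in 2020 (1000.0 acres).
```

The resulting sentence is the kind of factual statement that, per Figure 4, would be inserted into the LLM prompt. Ties in average size keep the order in which states first appear in the filtered rows, because Python's sort is stable; a real system would likely need an explicit tie-handling rule.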
