
KamerRaad: Enhancing Information Retrieval in Belgian National Politics through Hierarchical Summarization and Conversational Interfaces

Alexander Rogiers, Maarten Buyl, Bo Kang, Tijl De Bie

TL;DR

KamerRaad addresses the challenge of making extensive and heterogeneous parliamentary records accessible to citizens. It uses hierarchical summarization within a Retrieval-Augmented Generation framework to condense long documents into context-friendly chunks while preserving source provenance. It introduces metadata tagging and a two-level summarization approach (comprehensive and one-line) to optimize LLM prompts, and it enables source-grounded, conversational dialogue through a Streamlit UI backed by open-source models. The system ranks relevant chunks by cosine similarity and maintains direct links to the source documents, which makes answers traceable for policymakers and the public alike. By combining hierarchical summarization with source-driven dialogue, KamerRaad aims to make political information more accessible, transparent, and trustworthy, which could in turn support democratic participation and informed decision-making.
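The two-level summaries and metadata tags can be pictured as fields on each chunk record. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Chunk` fields, the `build_context` helper and its parameters `n_full` and `max_chars` are all hypothetical. It shows one way a prompt could use comprehensive summaries for the most relevant chunks and one-line summaries for the rest, while every entry keeps a link to its source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One chunk of a parliamentary document plus hypothetical metadata fields."""
    chunk_id: str
    source_url: str        # direct link back to the original document
    politician: str        # metadata tag: politician associated with the chunk
    topic: str             # metadata tag: topic label
    full_summary: str      # comprehensive summary
    one_line_summary: str  # single-sentence summary


def build_context(ranked: list[Chunk], n_full: int = 2, max_chars: int = 2000) -> str:
    """Assemble prompt context from chunks already sorted by relevance.

    The n_full most relevant chunks contribute their comprehensive summary;
    the remaining ones contribute only their one-line summary. Every entry
    keeps its chunk id and source URL so an answer can cite its sources.
    Entries that would push the context past max_chars are skipped.
    """
    lines: list[str] = []
    used = 0
    for rank, chunk in enumerate(ranked):
        summary = chunk.full_summary if rank < n_full else chunk.one_line_summary
        entry = (f"[{chunk.chunk_id}] ({chunk.politician}, {chunk.topic}) "
                 f"{summary} <{chunk.source_url}>")
        # +1 accounts for the newline separating entries (a conservative count).
        if used + len(entry) + 1 > max_chars:
            continue
        lines.append(entry)
        used += len(entry) + 1
    return "\n".join(lines)
```

Using only one-line summaries for lower-ranked chunks is one way to fit more sources into a limited context window; the cutoff and the character budget here are illustrative, not values reported for KamerRaad.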

Abstract

KamerRaad is an AI tool that leverages large language models to help citizens interactively engage with Belgian political information. The tool extracts key excerpts from parliamentary proceedings and summarizes them concisely, after which users can interact with a generative AI model to steadily build up their understanding. KamerRaad's front-end, built with Streamlit, makes interaction easy, while the back-end uses open-source models for text embedding and generation, with the aim of producing accurate and relevant responses. By collecting user feedback, we intend to improve the relevance of our source retrieval and the quality of our summaries, thereby enriching a user experience centered on source-driven dialogue.

Paper Structure

This paper contains 8 sections and 2 figures.

Figures (2)

  • Figure 1: KamerRaad UI displaying the entire user flow for the example question "Who's in favor of building more nuclear power plants?". The flow starts from 1. a query input field with suggested questions and leads to 2. relevant summaries, with options to 3. generate a response that clarifies how the source answers the question, 4. view the complete source document, and 5. give explicit feedback about the retrieval and generation. The text in this figure has been translated into English for the reader's convenience.
  • Figure 2: KamerRaad processing pipeline. During pre-processing, we scrape and chunk the raw documents. During tagging and summarization, we enrich each chunk with a full summary, a short summary, and the associated politician and topic. This metadata is represented by colors for the politicians and symbols for the topics. At runtime, the user is first presented with document summaries relevant to their query. Relevance is calculated as the cosine similarity between the prompt embedding and the summary embedding. The generative model provides a response when the user shows interest in the sources by interacting with the UI. A direct link to the source chunks is always maintained in the response, as visualized by the colors and topic symbols recurring in the speech bubble.
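The ranking step in Figure 2 can be sketched as follows. This is a minimal sketch that assumes the query and summary embeddings have already been produced by some embedding model (the specific open-source model is not named here); the function names, the chunk-id keys and the cutoff `k` are hypothetical. Keying each score by chunk id is what lets a ranked result point back to its source chunk.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_chunks(query_emb: list[float],
                summary_embs: dict[str, list[float]],
                k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs whose summary embedding is closest to the query.

    summary_embs maps a (hypothetical) chunk id, the link back to the source
    chunk, to the embedding of that chunk's summary.
    """
    scored = [(cid, cosine_similarity(query_emb, emb)) for cid, emb in summary_embs.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with a query embedding `[1.0, 0.0]` and summaries embedded as `c1: [1.0, 0.0]`, `c2: [0.0, 1.0]` and `c3: [1.0, 1.0]`, `rank_chunks(..., k=2)` returns `c1` (score 1.0) followed by `c3` (score about 0.707).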