Design and testing of an agent chatbot supporting decision making with public transport data

Luca Fantin, Marco Antonelli, Margherita Cesetti, Daniele Irto, Bruno Zamengo, Francesco Silvestri

TL;DR

The paper tackles the challenge of analyzing public transport data, which typically requires SQL proficiency and sophisticated data interpretation for visualization. It proposes an agent-based chatbot that uses a large language model to generate SQL queries, validates them with specialized tools, and renders results and maps from GTFS data. A methodological framework for automatic performance evaluation is introduced, including a workflow that stores queries, retrieved data, and natural-language responses for analysis. Empirical results show partial success, with around 53% of template-driven questions answered correctly, and demonstrate the system's potential while highlighting areas for improvement in complex query generation and real-time data integration. The work contributes to widening accessibility to transport data analysis and sets the stage for future multi-agent architectures and real-time data fusion to improve decision support in transit operations.

Abstract

Assessing the quality of public transportation services requires the analysis of large quantities of data on the scheduled and actual trips and documents listing the quality constraints each service needs to meet. Interrogating such datasets with SQL queries, organizing and visualizing the data can be quite complex for most users. This paper presents a chatbot offering a user-friendly tool to interact with these datasets and support decision making. It is based on an agent architecture, which expands the capabilities of the core Large Language Model (LLM) by allowing it to interact with a series of tools that can execute several tasks, like performing SQL queries, plotting data and creating maps from the coordinates of a trip and its stops. This paper also tackles one of the main open problems of such Generative AI projects: collecting data to measure the system's performance. Our chatbot has been extensively tested with a workflow that asks several questions and stores the generated query, the retrieved data and the natural language response for each of them. Such questions are drawn from a set of base examples which are then completed with actual data from the database. This procedure yields a dataset for the evaluation of the chatbot's performance, especially the consistency of its answers and the correctness of the generated queries.
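The testing workflow described above (template questions completed with actual database values, with the generated query, retrieved data and response stored for each) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-table GTFS-like schema, the template text, the `fake_chatbot` stand-in for the agent, and the row-equality check against a gold query are hypothetical and are not taken from the paper's implementation.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database loosely modelled on GTFS routes/trips."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
        CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
        INSERT INTO routes VALUES ('r1', '10'), ('r2', '22');
        INSERT INTO trips VALUES ('t1', 'r1'), ('t2', 'r1'), ('t3', 'r2');
        """
    )
    return conn


# Hypothetical question template: a question pattern, a gold query, and a
# query that supplies real values from the database to fill the placeholder.
TEMPLATE = {
    "question": "How many trips are scheduled on route {name}?",
    "gold_sql": (
        "SELECT COUNT(*) FROM trips JOIN routes USING (route_id) "
        "WHERE route_short_name = ?"
    ),
    "fill_sql": "SELECT route_short_name FROM routes ORDER BY route_short_name",
}


def fake_chatbot(question, conn):
    """Stand-in for the agent (assumption): returns (sql, rows, answer)."""
    name = question.rstrip("?").split()[-1]
    sql = (
        "SELECT COUNT(*) FROM trips WHERE route_id = "
        "(SELECT route_id FROM routes WHERE route_short_name = ?)"
    )
    rows = conn.execute(sql, (name,)).fetchall()
    answer = f"Route {name} has {rows[0][0]} scheduled trips."
    return sql, rows, answer


def run_evaluation(conn, template, chatbot):
    """Fill the template with database values and store one record per question."""
    records = []
    for (name,) in conn.execute(template["fill_sql"]).fetchall():
        question = template["question"].format(name=name)
        gold_rows = conn.execute(template["gold_sql"], (name,)).fetchall()
        sql, rows, answer = chatbot(question, conn)
        records.append(
            {
                "question": question,
                "generated_sql": sql,
                "retrieved_rows": rows,
                "answer": answer,
                # Simplistic correctness check: same rows as the gold query.
                "correct": rows == gold_rows,
            }
        )
    return records
```

Calling `run_evaluation(build_demo_db(), TEMPLATE, fake_chatbot)` returns one record per route in the demo database; in a real setup the stand-in would be replaced by a call to the chatbot, and the stored records would form the evaluation dataset.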

Paper Structure

This paper contains 14 sections and 5 figures.

Figures (5)

  • Figure 1: GTFS scheme representing the connections between the most important files (based on an image in Wikimedia Commons); only those mentioned in Section \ref{sec:base_data} are included here. Please note that this represents the conceptual links between the files, and should not be interpreted as an ER diagram.
  • Figure 2: High-level representation of the architecture implemented by the chatbot.
  • Figure 3: Simplified version of the textual prompt given to the agent's core LLM.
  • Figure 4: Example of the maps built by the dedicated tool.
  • Figure 5: Example of a test question, with its original template, corresponding example question in the SQL tool's prompt and example, gold and generated queries.