Sabiá: A Generative Artificial Intelligence Chatbot for Day-to-Day Support in Higher Education
Guilherme Biava Rodrigues, Franciele Beal, Marlon Marcon, Alinne Cristinne Corrêa Souza, André Roberto Ortoncelli, Francisco Carlos Monteiro Souza, Rodolfo Adamshuk Silva
TL;DR
The paper addresses the challenge of fragmented access to university information by introducing Sabiá, a chat-based GenAI system that uses Retrieval-Augmented Generation (RAG) to retrieve official documents and generate contextually accurate answers. It details a four-phase methodology (requirements, development, data collection, and evaluation) built on a LangChain-based RAG pipeline, a vector store (ChromaDB), and multiple LLMs, both open-source and proprietary, accessed through OpenRouter. Evaluation with a FAQ dataset and an LLM-as-a-Judge approach shows that Phi-4, Qwen3-235b, and DeepSeek R1 achieve strong quality scores, Gemini 2.0 Flash excels in semantic similarity, and GPT-4o-mini offers the best response time. The work emphasizes replicability and open-source deployment in public universities, and proposes future empirical usability studies and expanded judging to improve reliability. Overall, Sabiá demonstrates a practical path to digitizing and democratizing access to academic information through a configurable GenAI+RAG platform.
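To make the retrieve-then-generate idea concrete, the following is a minimal sketch of the retrieval and prompt-assembly steps of a RAG pipeline. It is not the authors' implementation: it replaces LangChain, ChromaDB, and the LLM call with plain Python, uses bag-of-words counts as a stand-in for a real embedding model, and the documents, function names (`embed`, `cosine`, `retrieve`, `build_prompt`), and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical snippets standing in for official university documents.
DOCUMENTS = [
    "Enrollment requests must be submitted through the academic portal during the enrollment period.",
    "The university restaurant serves lunch on weekdays for registered students.",
    "Internship agreements must be signed by the course coordinator before the internship starts.",
]


def embed(text):
    """Return bag-of-words term counts (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble the prompt that would be sent to an LLM (the LLM call itself is omitted)."""
    context = "\n".join(retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Who signs the internship agreement?", DOCUMENTS))
```

In a real deployment, the vector store would hold precomputed embeddings of the institutional documents, and the assembled prompt would be passed to whichever model is selected. The same cosine measure is one common way to compute semantic similarity between a generated answer and a reference answer, although the exact metric used in the paper is not specified in this summary.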
Abstract
Students often report difficulties in accessing day-to-day academic information, which is usually spread across numerous institutional documents and websites. This fragmentation reduces clarity and causes confusion about routine university matters. This project proposes the development of a chatbot that uses Generative Artificial Intelligence (GenAI) and Retrieval-Augmented Generation (RAG) to simplify access to such information. Several GenAI models were tested and evaluated using quality metrics and the LLM-as-a-Judge approach. Among them, Gemini 2.0 Flash stood out for its quality and speed, and Gemma 3n for its good performance and open-source nature.
