Towards an Efficient, Customizable, and Accessible AI Tutor

Juan Segundo Hevia, Facundo Arredondo, Vishesh Kumar

TL;DR

The paper addresses the inequity of access to large, compute-heavy LLMs in education by proposing an offline, customizable AI tutor based on a Retrieval-Augmented Generation (RAG) pipeline that combines a small language model with a local knowledge base. It evaluates the approach in biology education, showing that adding retrieved context can sometimes degrade performance for small models and that high-quality retrieval does not automatically improve results, underscoring the need for better chunking and model adaptation. The authors outline concrete next steps, including semantic, agentic, and meta chunking strategies and exploring quantized or alternative LLMs along with a holistic, free-form evaluation framework. They also discuss practical implications for on-device deployment on smartphones and Raspberry Pi devices to enable offline, equitable access to AI-powered tutoring.

Abstract

The integration of large language models (LLMs) into education offers significant potential to enhance accessibility and engagement, yet their high computational demands limit usability in low-resource settings, exacerbating educational inequities. To address this, we propose an offline Retrieval-Augmented Generation (RAG) pipeline that pairs a small language model (SLM) with a robust retrieval mechanism, enabling factual, contextually relevant responses without internet connectivity. We evaluate the efficacy of this pipeline using domain-specific educational content, focusing on biology coursework. Our analysis highlights key challenges: smaller models, such as SmolLM, struggle to effectively leverage the extended contexts provided by the RAG pipeline, particularly when noisy or irrelevant chunks are included. To improve performance, we propose exploring advanced chunking techniques, trying alternative small models or quantized versions of larger models, and moving beyond traditional metrics like MMLU to a holistic evaluation framework that assesses free-form responses. This work demonstrates the feasibility of deploying AI tutors in constrained environments, laying the groundwork for equitable, offline, and device-based educational tools.
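
To make the retrieve-then-prompt idea in the abstract concrete, here is a minimal sketch of an offline retrieval step. It is not the authors' implementation: the bag-of-words cosine scoring, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the `top_k` and `min_score` parameters, and the example chunks are all assumptions made for illustration. The score threshold is one simple way to keep irrelevant chunks out of a small model's context. The call to the small language model itself is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2, min_score=0.1):
    """Return up to top_k chunks that score at least min_score, best first.

    min_score is a hypothetical guard against noisy chunks; the paper does
    not state that its pipeline uses such a threshold.
    """
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score >= min_score]


def build_prompt(question, context_chunks):
    """Assemble the prompt; fall back to the bare question if nothing was retrieved."""
    if not context_chunks:
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical local knowledge base (illustrative chunks, not from the paper).
CHUNKS = [
    "Mitochondria produce ATP through cellular respiration.",
    "The French Revolution began in 1789.",
    "Chloroplasts carry out photosynthesis in plant cells.",
]

if __name__ == "__main__":
    question = "What do mitochondria produce?"
    print(build_prompt(question, retrieve(question, CHUNKS)))
```

The resulting prompt would then be passed to a locally hosted small model. A production pipeline would more likely use dense embeddings and a vector index than word overlap; the overlap scorer is used here only to keep the sketch dependency-free.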

Paper Structure

This paper contains 16 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of the AI tutor pipeline.
  • Figure 2: Example of text chunks constructed from the MMLU database.