Empowering Private Tutoring by Chaining Large Language Models

Yulin Chen, Ning Ding, Hai-Tao Zheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou

TL;DR

This paper tackles the challenge of building a full AI-powered tutoring system capable of long-term, adaptive instruction. It introduces ChatTutor, a memory-augmented, LLM-powered ITS organized around three interacting processes—Interaction, Reflection, and Reaction—with chained tools and structured memories to support dynamic course design and quiz generation. Experimental results and ablation studies indicate that the three-process architecture improves stability, coherence, and personalization in long-term tutoring, while case studies reveal both the potential and limitations (e.g., hallucinations) of LLM-based tutoring. The work demonstrates the viability of memory-augmented, tool-chaining LLM architectures for general-purpose tutoring and outlines future directions such as retrieval-augmented knowledge sources and standardized evaluation metrics.

Abstract

Artificial intelligence has been applied in various aspects of online education to facilitate teaching and learning. However, few attempts have been made toward a complete AI-powered tutoring system. In this work, we explore the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs), covering automatic course planning and adjustment, tailored instruction, and flexible quiz evaluation. To make the system robust to prolonged interaction and to cater to individualized education, the system is decomposed into three interconnected core processes: interaction, reflection, and reaction. Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules. Tools are LLMs prompted to execute one specific task at a time, while memories are data stores that are updated during the education process. Statistical results from learning logs demonstrate the effectiveness and mechanism of each tool's usage. Subjective feedback from human users reveals the usability of each function, and comparison with ablation systems further attests to the benefits of the designed processes in long-term interaction.
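
To make the tool-chaining idea concrete, the following is a minimal sketch of one conversation round, not the authors' implementation. It assumes a "tool" is an LLM wrapped with a single task instruction and a "memory" is a plain data object updated each round. All names (`Memory`, `make_tool`, `run_round`, `fake_llm`), the prompt wording, the memory fields, and the order in which the three processes run are hypothetical choices made for illustration; `fake_llm` is a canned stand-in so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in type for an LLM call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Illustrative memory store; the field names are assumptions."""
    course_plan: List[str] = field(default_factory=list)
    student_profile: str = ""
    history: List[str] = field(default_factory=list)


def make_tool(llm: LLM, instruction: str) -> Callable[[str], str]:
    """Wrap an LLM so that it performs one specific task per call."""
    def tool(context: str) -> str:
        return llm(f"{instruction}\n\n{context}")
    return tool


def run_round(llm: LLM, memory: Memory, user_message: str) -> str:
    """Run one round: interaction, then reflection, then reaction."""
    teach = make_tool(llm, "Teach the current objective.")
    summarize = make_tool(llm, "Summarize the student's progress.")
    replan = make_tool(llm, "List remaining objectives, one per line.")

    # Interaction: answer the student using the current memories.
    context = (
        f"Plan: {memory.course_plan}\n"
        f"Profile: {memory.student_profile}\n"
        f"Student: {user_message}"
    )
    reply = teach(context)
    memory.history.extend([user_message, reply])

    # Reflection: condense the conversation into an updated profile.
    memory.student_profile = summarize("\n".join(memory.history))

    # Reaction: revise the course plan in light of the new profile.
    plan_text = replan(
        f"Profile: {memory.student_profile}\nPlan: {memory.course_plan}"
    )
    memory.course_plan = [
        line.strip() for line in plan_text.splitlines() if line.strip()
    ]
    return reply


def fake_llm(prompt: str) -> str:
    """Canned responses keyed on the task instruction (demo only)."""
    if prompt.startswith("List remaining"):
        return "fractions\ndecimals"
    if prompt.startswith("Summarize"):
        return "knows integers"
    return "Let's begin."
```

Calling `run_round(fake_llm, Memory(course_plan=["integers"]), "hi")` returns the teaching reply and leaves the memory object with an updated profile and course plan. Keeping each tool to a single instruction means every prompt stays short even as the conversation grows, since long-term state lives in the memory object rather than in the prompt history.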

Paper Structure (14 sections, 6 figures, 8 tables)

This paper contains 14 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: An example of the learning progress. The left side is the user interface directly controlled by the interaction process. The right side is the backend memory changes brought by reflection and reaction processes.
  • Figure 2: An overview of the system's modular implementation and execution in a single round of conversation.
  • Figure 3: A detailed illustration of how the course plan is stored and manipulated structurally, and how the reflection process helps customize the reaction that follows.
  • Figure 4: Average output length (measured in words) and the number of objectives covered in each output for different systems. The average number of objectives is manually annotated on 50 randomly sampled responses from each system.
  • Figure 5: Average course plan complexity (measured by the number of objectives) and update interval (measured by the number of conversation rounds between updates) of the course design tool for different systems. $\dag$ marks a baseline statistic, as the system without reflection or reaction processes has a fixed course plan throughout learning.
  • ...and 1 more figures