RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration

Haoyu Huang; Tong Niu; Rui Yang; Luping Shi

TL;DR

Empirical evaluations indicate that RAM2C-empowered LLMs excel in Chinese reading teaching, offering more personalized and ethically safe teaching responses, which demonstrates RAM2C's practicality and high quality.

Abstract

Recently, many studies have focused on applying large language models (LLMs) to educational dialogues. In liberal arts dialogues in particular, educators must balance Humanized communication, Teaching expertise, and Safety-ethics (HTS) in addition to the subject knowledge itself. However, because collecting massive amounts of HTS-compliant teaching dialogues from the real world as a training corpus is expensive, the outputs of existing LLMs in teaching dialogues fall short of human standards. To address this, we design a Retrieval-augmented Multi-role Multi-expert Collaboration (RAM2C) framework that automatically generates such dialogue data. Specifically, we first establish HTS-guided knowledge bases covering three knowledge domains: teaching skills, psychology, and safety ethics. RAM2C then organizes LLMs, retrieval-augmented with these different knowledge bases, into multi-expert groups with distinct roles to generate an HTS-compliant educational dialogue dataset. We then fine-tune LLMs on this dataset. Empirical evaluations indicate that RAM2C-empowered LLMs excel in Chinese reading teaching, offering more personalized and ethically safe teaching responses, which demonstrates RAM2C's practicality and high quality. We release the experiments at https://github.com/ram2c/ram2c.
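The abstract outlines a pipeline: a basic LLM drafts a reply, groups of retrieval-augmented experts revise it, and the resulting data is used for fine-tuning. The Python sketch below shows one way such a loop could be wired, assuming a generic `LLM` callable (prompt in, text out), a toy word-overlap retriever in place of a real one, and the T, then P, then E group order given in Figure 2. All names (`Expert`, `group_revise`, `m2c_refine`, `preference_pair`, `dpo_loss`) are hypothetical and the prompts are placeholders. Treating the refined reply as "chosen" and the raw reply as "rejected" is an assumption, not the authors' documented recipe; `dpo_loss` is the standard per-pair direct preference optimization objective.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical LLM interface: prompt string in, completion string out.
LLM = Callable[[str], str]


@dataclass
class Expert:
    """One retrieval-augmented expert (profile text and retrieval are illustrative)."""
    profile: str  # persona description, e.g. "a veteran reading teacher"
    knowledge_base: List[str] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Toy word-overlap ranking standing in for a real embedding retriever.
        words = set(query.lower().split())
        ranked = sorted(self.knowledge_base,
                        key=lambda doc: len(words & set(doc.lower().split())),
                        reverse=True)
        return ranked[:k]

    def analyze(self, llm: LLM, context: str, draft: str) -> str:
        # Individual analysis grounded in this expert's own knowledge base.
        refs = "\n".join(self.retrieve(context + " " + draft))
        prompt = (f"You are {self.profile}.\nReferences:\n{refs}\n"
                  f"Student context: {context}\nDraft reply: {draft}\nAnalysis:")
        return llm(prompt)


def group_revise(llm: LLM, experts: List[Expert], context: str, draft: str) -> str:
    # Experts analyze independently; their insights are then merged into one revision.
    analyses = [expert.analyze(llm, context, draft) for expert in experts]
    prompt = ("Merge these analyses into a single revised reply.\n"
              + "\n".join(analyses) + f"\nDraft reply: {draft}\nRevised reply:")
    return llm(prompt)


def m2c_refine(llm: LLM, groups: Dict[str, List[Expert]], context: str, raw: str) -> str:
    # Sequential refinement: teaching (T), then psychology (P), then ethics (E).
    response = raw
    for role in ("T", "P", "E"):
        response = group_revise(llm, groups[role], context, response)
    return response


def preference_pair(context: str, raw: str, refined: str) -> Dict[str, str]:
    # Assumption: the refined reply is preferred over the raw reply.
    return {"prompt": context, "chosen": refined, "rejected": raw}


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    # Per-pair DPO loss: -log sigmoid(beta * implicit reward margin).
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With a stub LLM and two experts per group, one call to `m2c_refine` issues nine LLM calls (two analyses plus one synthesis per group), which makes the cost of each added expert easy to see. In a real system the toy `retrieve` would be replaced by an embedding retriever and reranker.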

Paper Structure (36 sections, 5 figures, 5 tables)

Introduction
Methodology
Multi-role Multi-expert Collaboration
Retrieval Augmented Experts
Experiments
Experimental Setup
Scenario settings.
Multi-source knowledge base.
Model Fine-tuning
Evaluation Set Construction
Evaluation Results
Ablation studies.
Case Study
Conclusion
Limitations
...and 21 more sections

Figures (5)

Figure 1: HTS: Multi-dimensional educational dialogue quality challenges.
Figure 2: The design of Multi-role Multi-expert Collaboration (M2C). a) Experts with different roles are gathered. The raw response from the basic LLM is revised sequentially by the T-Group (step 1), the P-Group (step 2), and the E-Group (step 3). All LLM experts are characterized by different personal profiles and are retrieval-augmented with different HTS knowledge bases. b) In a single-role collaboration, the raw response, the current discussion topic, and the student context are concatenated as the context for refinement. Experts first conduct individual analyses and then synthesize their insights into a single modification. The final response from the third group is relayed to the student. c) Educational preference data is collected from the output of the M2C procedure. The LLM uses these preference data to improve its intrinsic capability via the direct preference optimization (DPO) algorithm.
Figure 3: A schematic diagram of retrieval augmented experts, using the T-Group as an example. The revision of a raw response from the basic LLM is generated through proactive analysis of the student context and the accepted documents. The documents are retrieved from a multi-source knowledge base and subsequently filtered through group reflection, that is, the multi-dimensional value assessments of the retrieved documents.
Figure 4: Grading of retrieved documents by the deep sentence embedding model bge-reranker-v2-m3 and by groups of LLM experts. a) Top: RAM2C starts a topic. Bottom: a student gives an answer. b) Retrieved documents #4, #5, and #15 for the topic and answer. Documents #4 and #5 have high similarity to the topic and the answer but low educational reference value for improving the response, whereas document #15 is the high-value reference that could inspire analysis of a similar topic. c) From top to bottom: voting scores of documents #0 to #17 by 7, 5, and 3 teacher experts; similarity scores between the answer and the documents; and similarity scores between the topic and the documents, computed by bge-reranker-v2-m3.
Figure 5: A well-structured response by the fine-tuned Qwen1.5-4B model and some negative cases generated by traditional LLMs. In the positive case, the response begins with emotional support in the first paragraph, then assesses the student's context in detail (second and third paragraphs). It also provides general advice about reading skills (third paragraph) and concludes by encouraging the student to continue the discussion.
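Figures 3 and 4 describe filtering retrieved documents through group reflection: several teacher experts vote on each document's educational value instead of relying on reranker similarity alone. The Python sketch below assumes binary votes, hypothetical `Judge` callables standing in for LLM expert calls, and a strict-majority acceptance rule; the function names are illustrative, and the paper's exact scoring and threshold may differ.

```python
from typing import Callable, List, Sequence

# Hypothetical judge signature: (topic, answer, document) -> True if the expert
# considers the document useful for revising the reply.
Judge = Callable[[str, str, str], bool]


def vote_scores(judges: Sequence[Judge], topic: str, answer: str,
                docs: Sequence[str]) -> List[float]:
    # Fraction of experts voting "useful" for each retrieved document.
    return [sum(judge(topic, answer, doc) for judge in judges) / len(judges)
            for doc in docs]


def filter_by_vote(judges: Sequence[Judge], topic: str, answer: str,
                   docs: Sequence[str], threshold: float = 0.5) -> List[str]:
    # Keep documents accepted by a strict majority (threshold is an assumption).
    scores = vote_scores(judges, topic, answer, docs)
    return [doc for doc, score in zip(docs, scores) if score > threshold]
```

Using odd committee sizes, such as the 3, 5, and 7 experts in Figure 4c, avoids ties under a strict-majority rule. Because acceptance depends only on votes, a document like #15 in Figure 4 can pass even when its similarity score is low, while highly similar but unhelpful documents like #4 and #5 can be rejected.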