
Instructional Agents: Reducing Teaching Faculty Workload through Multi-Agent Instructional Design

Huaiyuan Yao, Wanpeng Xu, Justin Turnau, Nadia Kellam, Hua Wei

TL;DR

This work addresses the labor-intensive process of designing university-level instructional materials by introducing Instructional Agents, a multi-agent LLM framework that orchestrates role-specific agents (Teaching Faculty, Instructional Designer, Teaching Assistant, Course Coordinator, Program Chair) to generate syllabi, slides, slide scripts, and assessments within an ADDIE-inspired workflow. The system supports four operation modes, ranging from fully autonomous to fully human-in-the-loop, that balance automation with pedagogical oversight. Experimental evaluation across five courses shows that human-in-the-loop modes, especially Full Co-Pilot, yield higher-quality materials, while the autonomous mode offers speed and cost benefits; automated evaluators discriminate poorly between outputs, underscoring the need for human review. The approach shows significant potential to reduce faculty workload and enable scalable curriculum development in resource-constrained settings, contributing to more accessible and consistently high-quality education. Limitations include partial coverage of the ADDIE model, fragile LaTeX compilation, and the need for more explicit bias mitigation and accessibility enhancements in deployment.
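
The role-based orchestration summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` class, the `PHASES` table, the assignment of roles to phases, the fixed number of refinement rounds, and the `llm` callable are all hypothetical. A real system would call an actual model API where the stub sits and would use the role-specific prompts described in the paper's appendix.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical LLM interface: (system_prompt, user_message) -> reply text.
LLMFn = Callable[[str, str], str]

ROLES = ["Teaching Faculty", "Instructional Designer", "Teaching Assistant",
         "Course Coordinator", "Program Chair"]

# Phase -> (task, roles that take turns). This assignment is illustrative only.
PHASES = {
    "Analyze": ("Draft course learning objectives",
                ["Teaching Faculty", "Instructional Designer"]),
    "Design": ("Draft the syllabus",
               ["Instructional Designer", "Course Coordinator", "Program Chair"]),
    "Develop": ("Draft slides, slide scripts, and assessments",
                ["Teaching Faculty", "Teaching Assistant"]),
}


@dataclass
class Agent:
    role: str
    background: str  # stands in for the role-specific context of a real prompt

    def respond(self, llm: LLMFn, task: str, draft: str) -> str:
        system = f"You are the {self.role}. {self.background}"
        user = f"Task: {task}\nCurrent draft:\n{draft}\nReturn an improved draft."
        return llm(system, user)


def run_workflow(llm: LLMFn, course_brief: str, rounds: int = 2) -> dict[str, str]:
    """Run the phases in order; within a phase, agents revise the draft in turn."""
    agents = {r: Agent(r, "Follow your role's goals and responsibilities.") for r in ROLES}
    outputs: dict[str, str] = {}
    context = course_brief
    for phase, (task, roles) in PHASES.items():
        draft = context  # each phase starts from the previous phase's result
        for _ in range(rounds):  # iterative refinement within the phase
            for role in roles:
                draft = agents[role].respond(llm, task, draft)
        outputs[phase] = draft
        context = draft
    return outputs
```

Passing each phase's output forward as the next phase's starting context mirrors the sequential Analyze, Design, Develop ordering; turn-taking within a phase is one simple way to realize "structured prompt exchanges", chosen here for readability.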

Abstract

Preparing high-quality instructional materials remains a labor-intensive process that often requires extensive coordination among teaching faculty, instructional designers, and teaching assistants. In this work, we present Instructional Agents, a multi-agent large language model framework designed to automate end-to-end course material generation, including syllabus creation, LaTeX-based slides, lecture scripts, and assessments. Unlike prior tools focused on isolated tasks, Instructional Agents simulates role-based collaboration to ensure pedagogical coherence. The system operates in four modes (Autonomous, Catalog-Guided, Feedback-Guided, and Full Co-Pilot), enabling flexible control over the degree of human involvement. We evaluate Instructional Agents across five university-level courses and show that it produces high-quality instructional materials, which teaching faculty review and refine before use, while significantly reducing the time required to prepare classroom-ready content. By supporting institutions with limited instructional design capacity, Instructional Agents provides a scalable and cost-effective framework to democratize access to high-quality education, particularly in underserved or resource-constrained settings. The project website, including source code, is available at https://darl-genai.github.io/instructional_agents_homepage/
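
The four operation modes can be read as switches on where human input enters the pipeline. The sketch below assumes, based only on the mode names and on Figure 1's mention of educator input and human feedback, that Catalog-Guided mode seeds generation with educator-provided catalog information, Feedback-Guided mode lets a human comment after each stage, Full Co-Pilot does both, and Autonomous does neither. `Mode`, `generate`, `generate_stage`, `get_feedback`, and the stage list are hypothetical names, not the project's API.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    CATALOG_GUIDED = "catalog_guided"
    FEEDBACK_GUIDED = "feedback_guided"
    FULL_CO_PILOT = "full_co_pilot"


# Assumed mapping from mode to the human inputs it accepts.
USES_CATALOG = {Mode.CATALOG_GUIDED, Mode.FULL_CO_PILOT}
USES_FEEDBACK = {Mode.FEEDBACK_GUIDED, Mode.FULL_CO_PILOT}

STAGES = ["learning objectives", "syllabus", "slides", "slide scripts", "assessments"]


def generate(mode: Mode,
             generate_stage: Callable[[str, str], str],
             catalog: Optional[str] = None,
             get_feedback: Optional[Callable[[str, str], str]] = None) -> dict[str, str]:
    """Produce each stage in order; human input is used only if the mode allows it."""
    context = catalog if (mode in USES_CATALOG and catalog) else ""
    results: dict[str, str] = {}
    for stage in STAGES:
        draft = generate_stage(stage, context)
        if mode in USES_FEEDBACK and get_feedback is not None:
            note = get_feedback(stage, draft)  # e.g. an instructor's comment
            if note:  # an empty reply means "accept as is"
                draft = generate_stage(stage, f"{context}\nFeedback: {note}")
        results[stage] = draft
        context += f"\n[{stage}] {draft}"  # later stages see earlier outputs
    return results
```

Regenerating a stage only when the reviewer returns a non-empty comment means a reviewer who accepts every draft adds no extra model calls; this is a design choice of the sketch, not something the paper specifies.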

Paper Structure

This paper contains 55 sections, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Overview of Instructional Agents. (Left) Inputs and outputs of Instructional Agents. Educator input and human feedback guide the generation of key instructional materials, including learning objectives, syllabi, slides, slide scripts, and assessments. (Right) The Instructional Agents framework, showing the overall workflow based on the first three phases of the ADDIE instructional design framework (gagne2005principles; branch2009instructional): Analyze, Design, and Develop. Within each phase, multiple role-specialized agents (Teaching Faculty, Instructional Designer, Teaching Assistant, Course Coordinator, and Program Chair) collaborate through structured prompt exchanges to complete subtasks and refine outputs in an iterative workflow. The appendix provides the specific prompts used for each type of agent. Each prompt includes tailored background context and clearly defines the agent's goals, tasks, and responsibilities to ensure coherent, role-aligned responses. Avatars are illustrative and designed for diversity without implying real demographic proportions or stereotypes.
  • Figure 2: Workflow of slide and assessment generation from key points and drafts to final slides, slide scripts, and assessments across the Design and Develop phases.
  • Figure 3: (RQ1) Quality evaluation of generated instructional materials across different model backends, with their costs and success rates. This table reports the adapted QM-based rubric scores for course materials generated by Instructional Agents using three LLM backends: gpt-4o, gpt-4o-mini, and o1-preview. The evaluation covers six instructional outputs generated by Instructional Agents: Learning Objectives (LO), Syllabi (SY), Assessments (AS), Final Slides (SL), Slide Scripts (SC), and the overall Instructional Package (IP). Scores are averaged over five human evaluators for each of the five courses. Each cell is a score on a 1–5 Likert scale, where higher is better. gpt-4o-mini achieves performance and a success rate comparable to gpt-4o and o1-preview while offering the lowest cost. Detailed numbers are provided in the appendix.
  • Figure 4: Comparison of evaluation scores from human and automated reviewers. (a) Distribution of scores given by human and automated reviewers. (b) Scores assigned by LLMs evaluating their own generated instructional materials. Each cell shows the mean (standard deviation) over five courses. Scores are on a 1–5 scale, where higher is better. (c) Scores assigned by human reviewers to instructional materials generated by different LLMs. Human reviewers give more varied evaluations, while automated reviewers tend to assign middling scores.
  • Figure 5: Radar chart of performance when generating materials in different modes. Each axis represents human reviewers' scores for one type of material. Full Co-Pilot mode consistently performs best.
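
The captions above describe 1–5 Likert scores averaged over evaluators and reported as mean (standard deviation) over five courses. A minimal sketch of that aggregation follows, assuming per-course means are computed first and that the spread is the sample standard deviation (the captions do not say which estimator is used). The function name `summarize` is hypothetical, and the ratings in the usage example are invented for illustration; they are not results from the paper.

```python
from statistics import mean, stdev


def summarize(ratings: dict[str, list[int]]) -> tuple[float, float]:
    """Return (mean, sample std) over courses of the per-course mean rating.

    `ratings` maps a course name to the 1-5 Likert scores given by each evaluator.
    """
    for course, scores in ratings.items():
        if not scores or not all(1 <= s <= 5 for s in scores):
            raise ValueError(f"{course}: need at least one score, each on the 1-5 scale")
    per_course = [mean(scores) for scores in ratings.values()]  # average over evaluators
    return mean(per_course), stdev(per_course)  # spread across courses (needs >= 2)


# Invented ratings: five hypothetical courses, five evaluators each.
example = {
    "Course A": [4, 4, 5, 4, 3],
    "Course B": [3, 4, 4, 3, 4],
    "Course C": [5, 4, 4, 5, 4],
    "Course D": [4, 3, 4, 4, 4],
    "Course E": [4, 5, 4, 4, 5],
}
m, s = summarize(example)
print(f"{m:.2f} ({s:.2f})")  # prints 4.04 (0.36)
```

With an equal number of evaluators per course, the mean of per-course means equals the grand mean over all ratings; with unequal counts the two differ, so the averaging order matters.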