Stan: An LLM-based thermodynamics course assistant

Eric M. Furst, Vasudevan Venkateshwaran

TL;DR

Stan is a suite of tools for an undergraduate chemical engineering thermodynamics course, built on a data pipeline deployed in dual roles: serving students and supporting instructors. The paper describes the design, implementation, and practical failure modes encountered when deploying 7-8 billion parameter models for structured extraction over long lecture transcripts.

Abstract

Discussions of AI in education focus predominantly on student-facing tools (chatbots, tutors, and problem generators), while the potential for the same infrastructure to support instructors remains largely unexplored. We describe Stan, a suite of tools for an undergraduate chemical engineering thermodynamics course built on a data pipeline that we develop and deploy in dual roles: serving students and supporting instructors from a shared foundation of lecture transcripts and a structured textbook index. On the student side, a retrieval-augmented generation (RAG) pipeline answers natural-language queries by extracting technical terms, matching them against the textbook index, and synthesizing grounded responses with specific chapter and page references. On the instructor side, the same transcript corpus is processed through structured analysis pipelines that produce per-lecture summaries, identify student questions and moments of confusion, and catalog the anecdotes and analogies used to motivate difficult material, providing a searchable, semester-scale record of teaching that supports course reflection, reminders, and improvement. All components, including speech-to-text transcription, structured content extraction, and interactive query answering, run entirely on locally controlled hardware using open-weight models (Whisper large-v3, Llama 3.1 8B) with no dependence on cloud APIs, ensuring predictable costs, full data privacy, and reproducibility independent of third-party services. We describe the design, implementation, and practical failure modes encountered when deploying 7-8 billion parameter models for structured extraction over long lecture transcripts, including context truncation, bimodal output distributions, and schema drift, along with the mitigations that resolved them.
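
The retrieval step on the student side (matching technical terms in a query against the textbook index, then passing the hits to the model as grounding context) might look roughly like the sketch below. It illustrates only a regex-style matching path and is not the authors' implementation: the `IndexEntry` type, the `match_terms` and `build_context` helpers, and the sample index entries with their chapter and page numbers are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    term: str
    chapter: int
    page: int


# Hypothetical entries for illustration; these are not taken from the
# actual textbook index used by Stan.
INDEX = [
    IndexEntry("entropy", 4, 131),
    IndexEntry("fugacity", 7, 260),
    IndexEntry("Gibbs free energy", 6, 215),
]


def match_terms(query, index, limit=5):
    """Return up to `limit` entries whose term occurs in the query.

    Matching is case-insensitive and anchored on word boundaries, so
    "entropy" does not match inside a longer word.
    """
    hits = []
    for entry in index:
        pattern = r"\b" + re.escape(entry.term) + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            hits.append(entry)
    return hits[:limit]


def build_context(hits):
    """Format the matched entries as a context block for the prompt."""
    return "\n".join(
        f"- {e.term}: chapter {e.chapter}, p. {e.page}" for e in hits
    )


hits = match_terms("How is entropy related to the Gibbs free energy?", INDEX)
context = build_context(hits)
```

Constraining the model to the formatted `context` string is what would make chapter and page references in a response traceable to retrieved entries. The default `limit=5` is an assumption chosen for the example.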

Paper Structure (35 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Stan system architecture. The batch pipeline (left) runs on a GPU workstation, producing transcripts and structured lecture analyses. The interactive query pipeline (right) runs on a consumer laptop, using a dual-path extraction strategy (regex and LLM) to match student queries against the textbook index, then synthesizing grounded responses with chapter and lecture context.
  • Figure 2: Example query and generated response from the RAG pipeline. The response is produced by Llama 3.1 8B at temperature $T=0.6$, constrained to use only the five retrieved index entries. All chapter numbers, section titles, and page references in the output are traceable to the retrieved context.
  • Figure 3: Structured summary produced by the lecture analysis pipeline for lecture 9 (introduction to entropy). All fields are extracted automatically from the raw transcript by Llama 3.1 8B in JSON mode. The full JSON output includes additional fields (key equations, source file, model metadata) omitted here for brevity.
  • Figure 4: Questions extracted from lecture 9 (introduction to entropy) by the two-pass pipeline. Each entry includes a timestamp, speaker tag (S = student, I = instructor), and relevance rating. Five of eleven identified questions are shown.
  • Figure 5: Confusion points detected in lecture 9 (introduction to entropy). Each entry includes a timestamp, severity level, topic, and evidence description. Four of six detected points are shown.
  • ...and 1 more figure
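
Because the lecture analyses are produced by a small model in JSON mode, a natural guard against schema drift is to check each output against the expected fields before storing it. The sketch below shows one way to do that with the standard library. The field names in `REQUIRED_FIELDS` and the `check_schema` helper are hypothetical, and this is not necessarily the mitigation the authors used.

```python
import json

# Hypothetical schema: field names and types are assumptions made for this
# example, not the actual fields of Stan's lecture summaries.
REQUIRED_FIELDS = {"title": str, "summary": str, "key_equations": list}


def check_schema(raw):
    """Parse model output and return (record, problems).

    `record` is None when the output is not a JSON object; `problems`
    lists every deviation from REQUIRED_FIELDS.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    # Keys the schema does not know about are a common sign of drift,
    # for example a renamed or re-capitalized field.
    for name in sorted(set(record) - set(REQUIRED_FIELDS)):
        problems.append(f"unexpected field: {name}")
    return record, problems


drifted = '{"title": "Entropy", "Summary": "Intro", "key_equations": "dS = dQ/T"}'
record, problems = check_schema(drifted)
```

A non-empty `problems` list could trigger a retry or route the lecture to manual review, so that drifted records never reach the searchable archive.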