From Interests to Insights: An LLM Approach to Course Recommendations Using Natural Language Queries

Hugh Van Deventer, Mark Mills, August Evrard

TL;DR

The paper tackles the challenge of course selection on large university campuses by proposing a retrieval-augmented generation pipeline that grounds recommendations in the actual course descriptions. It generates an idealized description from user queries, uses embedding-based retrieval to assemble a contextual set of candidates, and then employs an LLM to produce a final, explainable set of ten course recommendations with confidence scores. Key findings show that embedding space captures meaningful cross-subject relationships, that similarity rank predicts recommendation likelihood, and that the approach can operate without relying on historical enrollment data, making it suitable for pilots within campus platforms like Atlas. The work also analyzes bias patterns and system latency, demonstrating practical viability and outlining deployment considerations, limitations, and future enhancements such as integrating planning capabilities for degree progress.

Abstract

Most universities in the United States encourage their students to explore academic areas before declaring a major and to acquire academic breadth by satisfying a variety of requirements. Each term, students must choose a handful of courses to take from among many thousands of offerings spanning dozens of subject areas. The curricular environment is also dynamic, and poor communication and search functions on campus can limit a student's ability to discover new courses of interest. To support both students and their advisers in such a setting, we explore a novel Large Language Model (LLM) course recommendation system that applies a Retrieval Augmented Generation (RAG) method to the corpus of course descriptions. The system first generates an 'ideal' course description based on the user's query. This description is converted into a search vector using embeddings, and that vector is then used to find actual courses with similar content by comparing embedding similarities. We describe the method and assess the quality and fairness of the recommendations for some example prompts. Steps to deploy a pilot system on campus are discussed.
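
The retrieval step described in the abstract (embed the generated 'ideal' description, then compare it against course-description embeddings) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the course identifiers, the three-dimensional toy vectors, and the function names `cosine_similarity` and `rank_courses` are hypothetical and do not come from the paper. A real system would obtain the vectors from an embedding model and would pass the top-ranked courses to an LLM as context.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_courses(query_vector, course_vectors, top_k=3):
    """Return the top_k (course_id, similarity) pairs, most similar first."""
    scored = [
        (course_id, cosine_similarity(query_vector, vector))
        for course_id, vector in course_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real course-description vectors.
course_vectors = {
    "ASTRO 101": [0.9, 0.1, 0.0],
    "HIST 210": [0.0, 0.2, 0.9],
    "PHYS 140": [0.7, 0.6, 0.1],
}

# Hypothetical embedding of the LLM-generated 'ideal' course description.
ideal_description_vector = [1.0, 0.2, 0.0]

ranked = rank_courses(ideal_description_vector, course_vectors, top_k=2)
for course_id, score in ranked:
    print(f"{course_id}: {score:.3f}")
```

The position of each course in `ranked` plays the role of the similarity rank, which the summary above says is predictive of whether a course appears in the final recommendations.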

Paper Structure (30 sections, 9 figures, 1 table, 2 algorithms)

Figures (9)

  • Figure 1: Hybrid LLM-Embedding Pipeline for Course Recommendations
  • Figure 2: Examples of subject-level pairwise connectivity using embedding space representations of all course descriptions within a subject. Edges display cosine similarity measures given in the color bar at right. See text for further description.
  • Figure 3: Final recommendation likelihood as a function of similarity rank in context generation, based on 100 total trials comprising 10 iterations of 10 specific queries.
  • Figure 4: Two examples of recommender output to queries given at the top of each panel. The left panel is limited to undergraduate (400-level and below) courses while the right panel allows 500-level graduate courses to be considered.
  • Figure 5: Recommendation likelihood comparison for different query pairs across sexuality (a), race (b), and birth sex (c)
  • ...and 4 more figures