Table of Contents
Fetching ...

TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

Izzat Alsmadi, Anas Alsobeh

TL;DR

This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems and demonstrates how academic institutions can develop contextually grounded conversational agents while maintaining transparency, governance compliance, and responsible AI practices.

Abstract

This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems. The work addresses critical challenges in adapting general-purpose foundation models to institutional contexts through supervised fine-tuning, retrieval-augmented generation, and systematic evaluation methodologies. We describe the complete architecture encompassing data acquisition from institutional sources, preprocessing pipelines, embedding construction, model training workflows, and deployment strategies. The system integrates modular components enabling reproducible experimentation with training configurations, hyper-parameters, and evaluation protocols. Our implementation demonstrates how academic institutions can develop contextually grounded conversational agents while maintaining transparency, governance compliance, and responsible AI practices. Through empirical analysis of fine-tuning behavior across model sizes and training iterations, we provide insights into domain adaptation efficiency, computational resource requirements, and quality-cost trade-offs. The publicly available codebase at https://github.com/alsmadi/TAMUSA_LLM_Based_Chat_app supports continued research into institutional LLM deployment, evaluation methodologies, and ethical considerations for educational AI systems.

TAMUSA-Chat: A Domain-Adapted Large Language Model Conversational System for Research and Responsible Deployment

TL;DR

This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems and demonstrates how academic institutions can develop contextually grounded conversational agents while maintaining transparency, governance compliance, and responsible AI practices.

Abstract

This paper presents TAMUSA-Chat, a research-oriented framework for building domain-adapted large language model conversational systems. The work addresses critical challenges in adapting general-purpose foundation models to institutional contexts through supervised fine-tuning, retrieval-augmented generation, and systematic evaluation methodologies. We describe the complete architecture encompassing data acquisition from institutional sources, preprocessing pipelines, embedding construction, model training workflows, and deployment strategies. The system integrates modular components enabling reproducible experimentation with training configurations, hyper-parameters, and evaluation protocols. Our implementation demonstrates how academic institutions can develop contextually grounded conversational agents while maintaining transparency, governance compliance, and responsible AI practices. Through empirical analysis of fine-tuning behavior across model sizes and training iterations, we provide insights into domain adaptation efficiency, computational resource requirements, and quality-cost trade-offs. The publicly available codebase at https://github.com/alsmadi/TAMUSA_LLM_Based_Chat_app supports continued research into institutional LLM deployment, evaluation methodologies, and ethical considerations for educational AI systems.
Paper Structure (13 sections, 3 figures, 2 tables)

This paper contains 13 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the TAMUSA‑Chat pipeline. Data acquisition crawls institutional websites and documents, processing pipelines convert content into structured corpora, embeddings are indexed for retrieval, models are fine‑tuned on generated instruction–response pairs and combined with retrieval for inference, and utilities orchestrate evaluation and deployment.
  • Figure 2: Simplified inference script for TAMUSA-Chat.
  • Figure 3: CAMSA Chat-bot Example