Campus AI vs. Commercial AI: Comparing How Students and Employees Perceive their University's LLM Chatbot vs. ChatGPT

Leon Hannig, Annika Bush, Meltem Aksoy, Tim Trappen, Steffen Becker, Greta Ontrup

TL;DR

The paper examines how a university-provided customized LLMaaS chatbot differs from the commercial ChatGPT in user trust, perceived privacy, hallucination perceptions, and sustainability-aware use. Grounded in the Trustworthiness Assessment Model, it theorizes that front-end cues (branding, interface) shape user perceptions, potentially creating calibrated or miscalibrated trust. In a field study with 526 participants (including 116 who used both systems), the university chatbot yielded higher trust, lower perceived privacy concerns, and fewer perceived hallucinations than ChatGPT, though objective hallucination benchmarks suggested mixed or higher hallucination tendencies for the customized system. The study highlights the importance of careful cue design, persistent hallucination warnings, and transparency to ensure appropriate user behavior, and it outlines avenues for causal and mechanism-focused research to better align perception with system capabilities. Practically, it provides actionable guidance for deploying LLMaaS in universities to support safe, informed, and sustainable AI use while acknowledging the gap between perception and objective model behavior.

Abstract

As the use of LLM chatbots by students and researchers becomes more prevalent, universities are pressed to develop AI strategies. One strategy that many universities pursue is to customize pre-trained LLM as-a-service (LLMaaS). While most studies on LLMaaS chatbots prioritize technical adaptations, we focus on psychological effects of user-salient customizations, such as interface changes. We assume that such customizations influence users' perception of the system and are therefore important in guiding safe and appropriate use. In a field study, we examine how students and employees (N = 526) at a German university perceive and use their institution's customized LLMaaS chatbot compared to ChatGPT. Participants using both systems (n = 116) reported greater trust, higher perceived privacy, and fewer experienced hallucinations with their university's customized LLMaaS chatbot than with ChatGPT. We discuss theoretical implications for research on calibrated trust, and offer guidance on the design and deployment of LLMaaS chatbots.

Paper Structure

This paper contains 65 sections, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Theoretical foundation: System characteristics, customization cues, and user perceptions for the examined university's customized LLMaaS chatbot. Solid lines represent an expected positive association between the constructs; dotted lines represent an expected negative association. Customizations colored in red relate to actual system characteristics, while we expect those in blue to relate solely to perceived characteristics. Red-and-blue customizations are expected to have associations with both. Footnotes: ¹Lower temperature may reduce the hallucination tendency by favoring higher-probability tokens and making the output more deterministic (Huang_2025).
  • Figure 2: Anonymized replica of the user interface of the university’s customized LLMaaS chatbot. Key elements: (1) university name is displayed; interface follows university's visual design; (2) remaining monthly token contingent shown in percent; (3) no warning regarding potential hallucinations / incorrect outputs.
  • Figure 3: Survey flowchart illustrating participant routing and item blocks based on chatbot usage. Labels at the top of each block indicate the estimated completion time for that section.
  • Figure A1: Percentage of participants using different chatbot systems.
  • Figure C1: Usage frequency of the university’s customized LLMaaS chatbot vs. ChatGPT in the subsample of users of both systems ($n = 116$).
  • ...and 1 more figure
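
The footnote to Figure 1 notes that a lower temperature favors higher-probability tokens and makes output more deterministic. A minimal sketch of that mechanism follows, assuming the standard temperature-scaled softmax over next-token logits. The function name `softmax_with_temperature` and the logit values are hypothetical illustrations and are not taken from the paper or from the university's deployment.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by a temperature.

    Hypothetical helper for illustration only; not the paper's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
example_logits = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(example_logits, temperature)
    # As temperature drops, probability mass concentrates on the top token.
    print(temperature, [round(p, 3) for p in probs])
```

Under these assumptions, the share of probability assigned to the highest-logit token grows as the temperature decreases, which is the sense in which sampling becomes more deterministic. Whether this reduces hallucinations in practice is the hedged claim made in the footnote, not something this sketch demonstrates.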