Table of Contents
Fetching ...

FernUni LLM Experimental Infrastructure (FLEXI) -- Enabling Experimentation and Innovation in Higher Education Through Access to Open Large Language Models

Torsten Zesch, Michael Hanses, Niels Seidel, Piush Aggarwal, Dirk Veiel, Claudia de Witt

TL;DR

The paper addresses the challenge of access to LLMs in higher education by advocating on-prem open-source LLMs with FLEXI, an experimental infrastructure at FernUniversität in Hagen. It outlines a Bare-metal Kubernetes-based architecture, software choices (Ollama, OpenWebUI, Traefik), model-selection criteria (openness, language, quality, safety, size), and data-protection advantages of staying within university networks. Through load testing and cost analysis, FLEXI demonstrates feasibility, quantifies energy use, and discusses trade-offs between model quality and throughput, highlighting operational and governance considerations. The work also demonstrates practical uses, including a Chat interface, RAG over course materials, API access, and Moodle integration via Caipi, while outlining future work for scaling, uptime guarantees, GDPR compliance, and ethical governance. Overall, FLEXI provides a concrete reference for universities considering on-prem LLM deployment and emphasizes evidence-gathering to guide decision-making about model selection and scale.

Abstract

Using the full potential of LLMs in higher education is hindered by challenges with access to LLMs. The two main access modes currently discussed are paying for a cloud-based LLM or providing a locally maintained open LLM. In this paper, we describe the current state of establishing an open LLM infrastructure at FernUniversität in Hagen under the project name FLEXI (FernUni LLM Experimental Infrastructure). FLEXI enables experimentation within teaching and research with the goal of generating strongly needed evidence in favor (or against) the use of locally maintained open LLMs in higher education. The paper will provide some practical guidance for everyone trying to decide whether to run their own LLM server.

FernUni LLM Experimental Infrastructure (FLEXI) -- Enabling Experimentation and Innovation in Higher Education Through Access to Open Large Language Models

TL;DR

The paper addresses the challenge of access to LLMs in higher education by advocating on-prem open-source LLMs with FLEXI, an experimental infrastructure at FernUniversität in Hagen. It outlines a Bare-metal Kubernetes-based architecture, software choices (Ollama, OpenWebUI, Traefik), model-selection criteria (openness, language, quality, safety, size), and data-protection advantages of staying within university networks. Through load testing and cost analysis, FLEXI demonstrates feasibility, quantifies energy use, and discusses trade-offs between model quality and throughput, highlighting operational and governance considerations. The work also demonstrates practical uses, including a Chat interface, RAG over course materials, API access, and Moodle integration via Caipi, while outlining future work for scaling, uptime guarantees, GDPR compliance, and ethical governance. Overall, FLEXI provides a concrete reference for universities considering on-prem LLM deployment and emphasizes evidence-gathering to guide decision-making about model selection and scale.

Abstract

Using the full potential of LLMs in higher education is hindered by challenges with access to LLMs. The two main access modes currently discussed are paying for a cloud-based LLM or providing a locally maintained open LLM. In this paper, we describe the current state of establishing an open LLM infrastructure at FernUniversität in Hagen under the project name FLEXI (FernUni LLM Experimental Infrastructure). FLEXI enables experimentation within teaching and research with the goal of generating strongly needed evidence in favor (or against) the use of locally maintained open LLMs in higher education. The paper will provide some practical guidance for everyone trying to decide whether to run their own LLM server.
Paper Structure (23 sections, 4 figures, 5 tables)

This paper contains 23 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The Flexi approach replacing a cloud-based LLM with a locally maintained open LLM
  • Figure 2: Usage spikes during a longer period of use of Server (A). The memory usage of all 8 GPUs is shown in color. You can see that usually, all 8 GPUs are used. However, as soon as it is sufficient to use only one GPU, e.g., for a smaller model (see smaller red spikes), the system implements this accordingly and in a resource-saving manner.
  • Figure 3: Load test results accessing Server B. We show results for a single query as well as 10 and 30 concurrent queries (CQ). There is a response timeout of 300s, so this is the maximum possible average.
  • Figure 4: Screenshot of an example session with the OpenWebUI chat frontend using the Llamma3-70b model.