FhGenie: A Custom, Confidentiality-preserving Chat AI for Corporate and Scientific Use
Ingo Weber, Hendrik Linka, Daniel Mertens, Tamara Muryshkin, Heinrich Opgenoorth, Stefan Langer
TL;DR
The paper addresses the data leakage risks and opaque data handling that arise when public AI chat tools are used in corporate and research settings. It presents FhGenie, a confidentiality-preserving chat AI built on Azure OpenAI Services with strict EU data locality, SSO-based access, and a secure sandbox that complies with GDPR and the Data Act. It details the architecture, development, and operation of the system, along with lessons learned from its adoption by thousands of Fraunhofer staff, including model choices (GPT-3.5 and GPT-4) and performance metrics. The work also outlines ongoing and future directions, such as retrieval-augmented generation with internal data, additional modalities, and strategies for switching models to balance cost, latency, and capabilities, offering a practical blueprint for confidential organizational AI deployments.
Abstract
Since OpenAI's release of ChatGPT, generative AI has received significant attention across various domains. These AI-based chat systems have the potential to enhance the productivity of knowledge workers in diverse tasks. However, the use of free public services poses a risk of data leakage, as service providers may exploit user input for additional training and optimization without clear boundaries. Even subscription-based alternatives sometimes lack transparency in handling user data. To address these concerns and enable Fraunhofer staff to leverage this technology while ensuring confidentiality, we have designed and developed a customized chat AI called FhGenie (genie being a reference to a helpful spirit). Within a few days of its release, thousands of Fraunhofer employees started using this service. We were among the pioneers in implementing such a system, and many other organizations have since followed suit. Our solution builds upon commercial large language models (LLMs), which we have carefully integrated into our system to meet our specific requirements and compliance constraints, including confidentiality and GDPR. In this paper, we share detailed insights into the architectural considerations, design, implementation, and subsequent updates of FhGenie. Additionally, we discuss challenges, observations, and the core lessons learned from its production use.
