QGen Studio: An Adaptive Question-Answer Generation, Training and Evaluation Platform

Movina Moses, Mohab Elkaref, James Barry, Shinnosuke Tanaka, Vishnudev Kuruvanthodi, Nathan Herr, Campbell D Watson, Geeth De Mel

TL;DR

QGen Studio tackles the challenge of creating high-quality, domain-specific QA datasets by offering an adaptive platform that combines QA data generation with end-to-end model training and evaluation. It integrates multi-LLM generation backends, interactive prompts, a dataset viewer with contextual visualization, and a LoRA-based fine-tuning workflow via MLX, enabling domain adaptation even with limited data. The paper outlines a six-step pipeline from document ingestion to model benchmarking and shows how standardized QA metrics are used to assess data quality and model performance. The authors plan to open-source the platform for local deployment, targeting researchers and practitioners who need scalable, domain-tailored QA systems.

Abstract

We present QGen Studio: an adaptive question-answer generation, training, and evaluation platform. QGen Studio enables users to leverage large language models (LLMs) to create custom question-answer datasets and fine-tune models on this synthetic data. It features a dataset viewer and model explorer to streamline this process. The dataset viewer provides key metrics and visualizes the context from which the QA pairs are generated, offering insights into data quality. The model explorer supports model comparison, allowing users to contrast the performance of their trained LLMs against other models, supporting performance benchmarking and refinement. QGen Studio delivers an interactive, end-to-end solution for generating QA datasets and training scalable, domain-adaptable models. The studio will be open-sourced soon, allowing users to deploy it locally.
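To make the benchmarking step more concrete, the sketch below shows the kind of answer-level scoring a model comparison could rest on. The abstract does not name the metrics QGen Studio reports, so the choice of SQuAD-style exact match and token-level F1 is an assumption, and the function names `normalize`, `exact_match`, and `token_f1` are hypothetical rather than part of the platform.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Assumption: two empty answers count as a match, one empty does not.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model output scored against a hypothetical reference answer.
    print(exact_match("The Amazon River", "amazon river."))
    print(token_f1("in Paris, France", "Paris"))
```

Averaging these per-answer scores over a held-out set of generated QA pairs would give one number per model, which is the form of comparison a side-by-side model view needs.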

Paper Structure

This paper contains 10 sections, 1 figure.

Figures (1)

  • Figure 1: Overview of QGen Studio