Integrating Randomness in Large Language Models: A Linear Congruential Generator Approach for Generating Clinically Relevant Content
Andrew Bouras
TL;DR
The paper addresses insufficient randomness and repetition in large language model outputs used for educational content generation. It proposes a framework that pairs a linear congruential generator (LCG) with GPT-4o: the LCG selects unique gastrointestinal physiology and pathology facts from a pool of 100, and GPT-4o turns them into seven vignette-style multiple-choice questions (MCQs) per round. The LCG recurrence is $X_{n+1} = (a X_n + c) \bmod m$, with parameters $X_0=12345$, $a=1103515245$, $c=12345$, $m=2^{31}$. Over 14 rounds, 98 MCQs were produced with no fact overlap between rounds, demonstrating diverse coverage and high clinical relevance. The approach offers a scalable, reproducible method for automated generation of medical assessment materials and could extend to other domains and question types.
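As a rough illustration of the recurrence and parameters above, the following Python sketch generates LCG states and uses them to pick seven fact indices per round without reuse. The mapping from a state to a fact (`x % pool_size`, skipping indices already used) and the names `lcg` and `select_rounds` are assumptions made for this example; the summary above does not specify how states are mapped to facts.

```python
from typing import Iterator, List, Set

# LCG parameters as stated in the text.
A = 1103515245
C = 12345
M = 2 ** 31
SEED = 12345


def lcg(seed: int = SEED, a: int = A, c: int = C, m: int = M) -> Iterator[int]:
    """Yield X_1, X_2, ... from X_{n+1} = (a * X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def select_rounds(pool_size: int = 100, rounds: int = 14,
                  per_round: int = 7) -> List[List[int]]:
    """Pick `per_round` fact indices per round, never reusing an index.

    Assumption (not from the source): a state maps to a fact index via
    `x % pool_size`, and an index that was already used is skipped.
    """
    if rounds * per_round > pool_size:
        raise ValueError("not enough facts for the requested rounds")
    gen = lcg()
    used: Set[int] = set()
    result: List[List[int]] = []
    for _ in range(rounds):
        picked: List[int] = []
        while len(picked) < per_round:
            idx = next(gen) % pool_size
            if idx not in used:  # skip facts chosen in this or earlier rounds
                used.add(idx)
                picked.append(idx)
        result.append(picked)
    return result


if __name__ == "__main__":
    for i, facts in enumerate(select_rounds(), start=1):
        print(f"Round {i}: fact indices {facts}")
```

Because the seed is fixed, rerunning the sketch reproduces the same selection, which is the property that makes this kind of scheme reproducible. Each round's indices would then be looked up in the fact pool and inserted into the prompt sent to the model.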
Abstract
Generating diverse, high-quality outputs from language models is crucial for applications in education and content creation, yet achieving true randomness and avoiding repetition remain significant challenges. This study uses a linear congruential generator (LCG) for systematic fact selection, combined with AI-powered content generation. We ensured unique combinations of gastrointestinal physiology and pathology facts across multiple rounds, integrating these facts into prompts for GPT-4o to create clinically relevant, vignette-style outputs. Over 14 rounds, 98 unique outputs were generated, demonstrating the LCG's effectiveness in producing diverse and high-quality content. This method addresses key issues of randomness and repetition, enhancing the quality and efficiency of language model-generated content for various applications.
