Toward Inclusive Educational AI: Auditing Frontier LLMs through a Multiplexity Lens

Abdullah Mushtaq, Muhammad Rafay Naeem, Muhammad Imran Taj, Ibrahim Ghaznavi, Junaid Qadir

TL;DR

As mitigation strategies evolve from contextual prompting to MAS-implementation, cultural inclusivity markedly improves, evidenced by a significant rise in the Perspectives Distribution Score (PDS) and a PDS Entropy increase from 3.25% at baseline to 98% with the MAS-Implemented Multiplex LLMs.

Abstract

As large language models (LLMs) like GPT-4 and Llama 3 become integral to educational contexts, concerns are mounting over the cultural biases, power imbalances, and ethical limitations embedded within these technologies. Though generative AI tools aim to enhance learning experiences, they often reflect values rooted in Western, Educated, Industrialized, Rich, and Democratic (WEIRD) cultural paradigms, potentially sidelining diverse global perspectives. This paper proposes a framework to assess and mitigate cultural bias within LLMs through the lens of applied multiplexity. Multiplexity, inspired by Senturk et al. and rooted in Islamic and other wisdom traditions, emphasizes the coexistence of diverse cultural viewpoints, supporting a multi-layered epistemology that integrates both empirical sciences and normative values. Our analysis reveals that LLMs frequently exhibit cultural polarization, with biases appearing in both overt responses and subtle contextual cues. To address inherent biases and incorporate multiplexity in LLMs, we propose two strategies: Contextually-Implemented Multiplex LLMs, which embed multiplex principles directly into the system prompt, influencing LLM outputs at a foundational level and independent of individual prompts, and Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents, each representing distinct cultural viewpoints, collaboratively generate a balanced, synthesized response. Our findings demonstrate that as mitigation strategies evolve from contextual prompting to MAS-implementation, cultural inclusivity markedly improves, evidenced by a significant rise in the Perspectives Distribution Score (PDS) and a PDS Entropy increase from 3.25% at baseline to 98% with the MAS-Implemented Multiplex LLMs. Sentiment analysis further shows a shift towards positive sentiment across cultures,...

Paper Structure

This paper contains 30 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of a multiplexity-inspired framework for assessing cultural bias in LLM-generated educational content: a two-stage process analyzing cultural distribution across topics and using sentiment analysis to identify biases for a holistic view of cultural alignment.
  • Figure 2: System design for utilizing contextually-implemented multiplex LLMs to enrich baseline LLMs. Baseline Assessment: LLMs respond to educational questions, and their responses are analyzed by the Perspective Extractor to identify cultural references. The Perspectives Distribution Score (PDS) then quantifies the cultural representation within these responses. Contextually-implemented Multiplexity: LLMs answer similar questions with Multiplexity prompts, designed to incorporate diverse cultural perspectives. The Perspective Extractor analyzes these responses, and PDS scores are calculated to assess cultural integration. Sentiment Analysis: The sentiment and tone of LLM responses for each culture are assessed through zero-shot classification by GPT-4o, enabling bias framing across cultural perspectives.
  • Figure 3: System design for utilizing a Multi-Agent System to enrich LLM responses with multicultural perspectives. 1) Questions on educational topics are sent to the Coordinator Agent. 2) The Coordinator forwards them to the Tasks Agent to generate a task list. 3) The Tasks Agent returns the task list to the Tasks Channel. 4) Tasks are sent to the Tasks Channel. 5) Tasks get assigned to relevant agents (excluding the Multiplex Agent) based on their personas. 6) Agents generate outputs, sending them to the Tasks Channel and Coordinator. 7) The Coordinator sends all outputs to the Multiplex Agent, which applies Multiplexity rules for a multicultural output. 8) The Perspectives Extractor identifies cultural references in this output. 9) The PDS is calculated from these references. 10) A Sentiment Analyzer assesses sentiment toward each culture and results are reviewed to evaluate bias framing.
  • Figure 4: PDS for the baseline LLMs and the two multiplex adaptations of these LLMs: the contextually-implemented multiplex models and the MAS-implemented multiplex models. The multiplex models are more inclusive than the baselines, with the MAS-implemented multiplex models performing best. The baseline models show a visible lack of diversity.
  • Figure 5: Sentiment analysis for the baseline LLMs and the two multiplex adaptations of these LLMs: the contextually-implemented multiplex models and the MAS-implemented multiplex models. A higher proportion of positive sentiment reflects the strategy's supportive stance across cultures, enhancing its effectiveness. Multiplex LLMs outperform baseline models, with the MAS-implemented version achieving the highest accuracy in sentiment analysis.
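
The captions above refer to a Perspectives Distribution Score (PDS) and a PDS Entropy without giving their formulas on this page. The following is a minimal sketch of one plausible reading, not the authors' definition: it assumes the PDS is the share of extracted cultural references attributed to each culture, and that PDS Entropy is the Shannon entropy of that distribution normalized by its maximum and expressed as a percentage. The culture labels and both function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical culture labels; the paper's actual categories are not listed here.
CULTURES = ["Western", "Islamic", "Chinese", "African"]


def perspective_distribution(references, cultures):
    """Share of cultural references per culture (assumed reading of PDS)."""
    counts = Counter(references)
    total = sum(counts[c] for c in cultures)
    if total == 0:
        # No recognized references: report an all-zero distribution.
        return {c: 0.0 for c in cultures}
    return {c: counts[c] / total for c in cultures}


def normalized_entropy_percent(distribution):
    """Shannon entropy of the distribution as a percentage of its maximum.

    0 means every reference points to a single culture;
    100 means references are spread evenly over all cultures.
    """
    n = len(distribution)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in distribution.values() if p > 0)
    return 100.0 * entropy / math.log(n)


# Illustrative inputs only (not data from the paper).
skewed_refs = ["Western"] * 10   # every reference drawn from one culture
balanced_refs = CULTURES * 2     # two references per culture

skewed_score = normalized_entropy_percent(
    perspective_distribution(skewed_refs, CULTURES))
balanced_score = normalized_entropy_percent(
    perspective_distribution(balanced_refs, CULTURES))
```

Under these assumptions, a response that cites only one culture scores at the bottom of the scale and an evenly balanced response scores at the top, which matches the direction of the reported move from a low baseline entropy to a high entropy for the MAS-implemented models.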