HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang; Fei Tan; Xuanyu Yin; Jing Leng; Aimin Zhou

TL;DR

HACHIMI is a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled student personas and provides a standardized synthetic student population for group-level benchmarking and social-science simulations.

Abstract

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI

Paper Structure (63 sections, 4 figures, 1 table, 1 algorithm)

Introduction
Related Work
Classic Student Persona Modeling
Student Persona Generation via LLMs
Student Persona Datasets
HACHIMI Framework
Problem Formulation
Theory-Anchored Persona Schema
Multi-Agent Generation Architecture
Mechanism I: Modular Generation via Shared Whiteboard.
Mechanism II: Neuro-Symbolic Constraint Satisfaction.
Mechanism III: Stratified Sampling and Diversity Control.
The HACHIMI-1M Corpus
Hybrid Semi-Structured Representation.
Distribution via Stratified Sampling.
...and 48 more sections

Figures (4)

Figure 1: HACHIMI pipeline overview. From target distributions (grade/gender/academic level), steps (1)--(5) produce the HACHIMI-1M corpus.
Figure 2: Immersive role-playing prompt template used for HACHIMI student agents when answering CEPS- and PISA-based shadow surveys.
Figure 3: Pearson $r$ and Spearman $\rho$ between human and HACHIMI cohort means for each CEPS target.
Figure 4: Distribution of Pearson correlations between human and agent group means on PISA 2022, summarized by region.
