More Than 1v1: Human-AI Alignment in Early Developmental Communities with Multimodal LLMs

Weiyan Shi, Kenny Tsu Wei Choo

TL;DR

This paper argues that alignment in developmental settings should be treated as a community-governed process rather than an individual optimisation problem.

Abstract

In early developmental contexts, particularly in parent-child interaction analysis, alignment involves families and professionals such as speech-language pathologists (SLPs) who interpret children's everyday interactions from different roles. When multimodal large language models (MLLMs) are introduced to support this process, alignment becomes a question of how authority, responsibility, and emotional risk are distributed across stakeholders. Through a three-part study with five families and three SLPs, we trace how MLLM-generated outputs move from expert-facing analysis to parent-facing feedback. We propose layered community alignment: grounding representations in expert-aligned structures, mediating translation through professional guardrails, and enabling family-level adaptation within those boundaries. We argue that alignment in developmental settings should be treated as a community-governed process rather than an individual optimisation problem.

Paper Structure (14 sections, 2 figures)

This paper contains 14 sections and 2 figures.

Figures (2)

  • Figure 1: Overview of the three-part study. Part I collected naturalistic parent–child interaction videos and generated expert-aligned SLP-facing analyses. Part II involved SLP evaluation and steering of expert outputs. Part III evaluated a parent-facing prototype grounded in practitioner-informed guidance.
  • Figure 2: System Overview. Both interfaces are grounded in the same segment-level MLLM analysis of parent–child interaction videos. SLPView (left) provides structured expert-oriented behavioural assessment with quality labels and aligned timelines. ParentView (right) reframes this analysis into descriptive summaries and supportive suggestions, removing evaluative framing while preserving segment-level continuity.