
Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation

Taowen Zeng

TL;DR

Synthetic-Child is an AIGC-based synthetic data pipeline that produces photorealistic child posture training images with ground-truth-projected keypoint annotations and requires zero real child photographs. It shows that carefully designed AIGC pipelines can substantially reduce dependence on real child imagery while achieving deployment-ready accuracy.

Abstract

Accurate child posture estimation is critical for AI-powered study companion devices, yet collecting large-scale annotated datasets of children is both expensive and ethically prohibitive because of privacy concerns. We present Synthetic-Child, an AIGC-based synthetic data pipeline that produces photorealistic child posture training images with ground-truth-projected keypoint annotations, requiring zero real child photographs. The pipeline has four stages: (1) a programmable 3D child body model (SMPL-X) in Blender generates diverse desk-study poses, uses IK constraints to keep them anatomically plausible, and automatically exports COCO-format ground truth; (2) a custom PoseInjectorNode feeds the 3D-derived skeletons into a dual ControlNet (pose + depth) that conditions FLUX-1 Dev, synthesizing 12,000 photorealistic images across 10 posture categories with low annotation drift; (3) ViTPose-based confidence filtering and targeted augmentation remove generation failures and improve robustness; (4) RTMPose-M (13.6M parameters) is fine-tuned on the synthetic data, paired with geometric feature engineering and a lightweight MLP for posture classification, and quantized to INT8 for real-time edge deployment. On a real-child test set (n ≈ 300), the FP16 model achieves 71.2 AP, a +12.5 AP improvement over the COCO-pretrained adult-data baseline at identical model capacity. After INT8 quantization the model retains 70.4 AP while running at 22 FPS on a 0.8-TOPS Rockchip RK3568 NPU. In a single-subject controlled comparison with a commercial posture corrector, our system achieves substantially higher recognition rates in most tested categories and responds about 1.8× faster on average. These results show that carefully designed AIGC pipelines can substantially reduce dependence on real child imagery while achieving deployment-ready accuracy, with potential applications to other privacy-sensitive domains.
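The automatic COCO-format export in Stage 1 amounts to projecting 3D joint positions through the render camera and writing them as flat `[x, y, v]` triplets. The sketch below illustrates that step under stated assumptions and is not the paper's exporter: it assumes joints are already expressed in an OpenCV-style camera frame (x right, y down, z forward; Blender's camera looks down its local -Z axis with +Y up, so a conversion would be needed first), uses a plain pinhole model with hypothetical intrinsics, and marks every in-frame joint as visible (`v = 2`) without the depth-buffer occlusion test a real exporter would need. `project_point` and `to_coco_annotation` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

# COCO-17 keypoint order (standard COCO person category).
COCO17 = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def project_point(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = p
    if z <= 0:
        return None  # behind the camera
    return fx * x / z + cx, fy * y / z + cy


def to_coco_annotation(joints: Dict[str, Vec3], intrinsics: Tuple[float, float, float, float],
                       width: int, height: int, image_id: int, ann_id: int) -> dict:
    """Build one COCO keypoint annotation from camera-frame 3D joints (hypothetical helper)."""
    fx, fy, cx, cy = intrinsics
    keypoints: list = []
    xs: List[float] = []
    ys: List[float] = []
    for name in COCO17:
        uv = project_point(joints[name], fx, fy, cx, cy) if name in joints else None
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            # v=2 means "labeled and visible"; this sketch performs no occlusion test.
            keypoints += [round(uv[0], 2), round(uv[1], 2), 2]
            xs.append(uv[0])
            ys.append(uv[1])
        else:
            keypoints += [0, 0, 0]  # v=0: not labeled (missing, behind camera, or off-image)
    # Tight box around the projected joints; a real exporter would use the rendered silhouette.
    bbox = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)] if xs else [0.0, 0.0, 0.0, 0.0]
    return {
        "id": ann_id, "image_id": image_id, "category_id": 1, "iscrowd": 0,
        "keypoints": keypoints, "num_keypoints": len(xs),
        "bbox": bbox,
        "area": bbox[2] * bbox[3],  # stand-in for the segmentation area COCO normally stores
    }
```

Keeping the keypoint order fixed to COCO-17 is what lets the same annotation drive both the skeleton rendering in Stage 2 and standard COCO-style training and evaluation in Stage 4.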
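Stage 4 pairs the keypoint model with geometric feature engineering before the MLP classifier. The exact feature set is not listed in this section, so the following is a hypothetical example of features computable from COCO-17 keypoints seen by a frontal desk camera: trunk lean and head tilt as signed angles from the image vertical, plus shoulder tilt and head height above the shoulders, both divided by shoulder width so they are roughly independent of child size and camera distance. `posture_features` and the feature choice are illustrative assumptions; the returned vector is the kind of compact input a lightweight MLP would consume.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# COCO-17 indices used below.
L_EAR, R_EAR, L_SH, R_SH, L_HIP, R_HIP = 3, 4, 5, 6, 11, 12


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def angle_from_vertical(lower: Point, upper: Point) -> float:
    """Signed angle in degrees between lower->upper and image 'up'.

    Image y grows downward, so the vertical difference is flipped: 0 means
    perfectly upright, positive means the upper point is shifted toward +x.
    """
    dx = upper[0] - lower[0]
    dy = lower[1] - upper[1]
    return math.degrees(math.atan2(dx, dy))


def posture_features(kpts: Sequence[Point]) -> List[float]:
    """Hypothetical geometric features from COCO-17 (x, y) keypoints, frontal view."""
    sh_mid = midpoint(kpts[L_SH], kpts[R_SH])
    hip_mid = midpoint(kpts[L_HIP], kpts[R_HIP])
    ear_mid = midpoint(kpts[L_EAR], kpts[R_EAR])
    shoulder_w = max(math.dist(kpts[L_SH], kpts[R_SH]), 1e-6)  # guard against a degenerate pose
    trunk_lean = angle_from_vertical(hip_mid, sh_mid)    # lateral trunk lean
    head_tilt = angle_from_vertical(sh_mid, ear_mid)     # head offset relative to the trunk
    shoulder_tilt = (kpts[L_SH][1] - kpts[R_SH][1]) / shoulder_w
    head_height = (sh_mid[1] - ear_mid[1]) / shoulder_w  # shrinks as the head drops toward the desk
    return [trunk_lean, head_tilt, shoulder_tilt, head_height]
```

Hand-crafted, scale-normalized features like these keep the classifier small, which matters on a 0.8-TOPS NPU where the keypoint model already consumes most of the budget.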
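The abstract reports that INT8 quantization costs 0.8 AP (71.2 to 70.4). To make the arithmetic behind that conversion concrete, here is symmetric per-tensor INT8 quantization, one common scheme: each value v maps to q = clip(round(v / s), -127, 127) with s = max|v| / 127, so reconstructing q · s is off by at most s / 2 per element. This section does not say which scheme or calibration the deployed NPU toolchain uses, so treat this as a sketch of the idea rather than the paper's method; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8_symmetric(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = clip(round(v / s), -127, 127)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero for all-zero tensors
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 codes back to real values; the error per element is at most scale / 2."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.0, 1.27, 0.333]
codes, s = quantize_int8_symmetric(weights)
print(codes, s)
```

Each INT8 code takes half the memory of an FP16 value, and integer arithmetic is what lets a small NPU sustain frame rates like the reported 22 FPS (roughly 45 ms per frame).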
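Assuming the AP figures follow the standard COCO keypoint protocol (the abstract does not state this explicitly), a prediction is matched to a ground-truth person by object keypoint similarity, OKS = Σᵢ exp(-dᵢ² / (2 s² kᵢ²)) δ(vᵢ > 0) / Σᵢ δ(vᵢ > 0), where dᵢ is the keypoint error, s² the object area, and kᵢ = 2σᵢ; AP then averages precision over OKS thresholds 0.50, 0.55, ..., 0.95. The function below computes OKS for one prediction/ground-truth pair with the per-keypoint σ values from the official COCO evaluation. It is a simplified sketch: the reference `cocoeval` implementation treats ground truth with no labeled keypoints differently, whereas this version simply returns 0.

```python
import math
from typing import Sequence

# Per-keypoint sigmas from the official COCO keypoint evaluation (COCO-17 order).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred: Sequence[float], gt: Sequence[float], area: float,
        sigmas: Sequence[float] = COCO_SIGMAS) -> float:
    """Object keypoint similarity between flat [x, y, v, ...] lists.

    Only ground-truth keypoints with v > 0 contribute; `area` plays the role
    of s^2 in the OKS formula.
    """
    total, labeled = 0.0, 0
    for i, sigma in enumerate(sigmas):
        gx, gy, gv = gt[3 * i:3 * i + 3]
        if gv <= 0:
            continue
        px, py = pred[3 * i], pred[3 * i + 1]
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
        labeled += 1
    # Simplification: no special handling for GT without labeled keypoints.
    return total / labeled if labeled else 0.0
```

Because σ is larger for hips and ankles than for eyes, the same pixel error is penalized less on body joints than on the face, which is relevant for children whose compressed upper bodies shift torso keypoints.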

Paper Structure

This paper contains 52 sections, 8 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Overview of the Synthetic-Child pipeline. Stage 1: A child-specific SMPL-X model in Blender generates diverse desk-study poses with IK constraints and exports rendered images, depth maps, and COCO-format keypoint annotations. Stage 2: A custom PoseInjectorNode feeds the ground-truth skeletons into a dual ControlNet (pose + depth) that conditions FLUX-1 Dev, producing 12,000 photorealistic child images with low annotation drift. Stage 3: Three automated quality gates (confidence filtering with ViTPose-H, spatial fidelity checking, and category consistency verification) remove generation failures, leaving 11,900 curated images from the initial pool of 12,000; online augmentation further increases robustness. Stage 4: RTMPose-M is fine-tuned on the synthetic data; geometric feature engineering and a lightweight MLP classify postures; the model is quantized to INT8 for real-time inference on an edge NPU.
  • Figure 2: Illustration of the Stage 1 → Stage 2 pipeline for two representative samples, viewed from a frontal desk-mounted camera. Row 1: correct upright posture; Row 2: upper body collapsed onto the desk (lean_desk). Each row shows: (a) the Blender render with the 3D skeleton overlaid, (b) the depth map rendered from the Blender Z-buffer, (c) the COCO-17 skeleton in OpenPose style, rendered deterministically by PoseInjectorNode from the ground-truth JSON, and (d) the photorealistic image generated by FLUX-1 conditioned on both control signals (pose + depth). The contrast between the synthetic render in (a) and the photorealistic output in (d) illustrates how the pipeline bridges the sim-to-real gap while preserving the input pose.
  • Figure 3: Qualitative comparison on challenging child postures from the real-child test set, selected because they are difficult for the COCO-pretrained baseline. Each pair shows the baseline output (left: estimated keypoint skeleton only) and the full Synthetic-Child pipeline output (right: keypoints, detection bounding box, and MLP classification label with confidence score). (a, b) Two subjects leaning onto the desk surface, each classified with ≥0.97 confidence; this is a common desk-study posture in which children's compressed upper bodies and foreshortened limbs diverge markedly from adult training distributions. (c) Rightward lateral trunk lean (0.96 confidence). (d) Correct upright posture (0.91 confidence), included to show that the pipeline identifies proper sitting as non-deviant rather than triggering a false alert. The baseline provides keypoint estimation only and cannot classify posture; our pipeline enables category-level posture alerts with high confidence.
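Figure 2(c) describes PoseInjectorNode rendering the COCO-17 skeleton deterministically from the ground-truth JSON. A minimal sketch of that idea follows, under stated assumptions: it uses the limb connectivity published with the COCO person category, skips any limb with an unlabeled endpoint, and rasterizes each limb with Bresenham's line algorithm into a label map in which limb k always gets value k + 1. A production node would instead draw thick, color-coded limbs into an RGB image in the format the pose ControlNet was trained on; `render_skeleton` and `bresenham` are illustrative names, not the paper's code.

```python
from typing import List, Sequence, Tuple

# Limb connectivity of the COCO person category (1-indexed in the official metadata),
# converted here to 0-indexed keypoint pairs.
COCO_SKELETON = [(a - 1, b - 1) for a, b in [
    (16, 14), (14, 12), (17, 15), (15, 13), (12, 13), (6, 12), (7, 13),
    (6, 7), (6, 8), (7, 9), (8, 10), (9, 11), (2, 3), (1, 2), (1, 3),
    (2, 4), (3, 5), (4, 6), (5, 7),
]]


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Integer pixels on the segment (x0, y0)-(x1, y1), valid in all octants."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def render_skeleton(keypoints: Sequence[float], width: int, height: int) -> List[List[int]]:
    """Deterministically rasterize a COCO-17 skeleton into a label map.

    0 is background; limb k is drawn with value k + 1, so the same limb always
    gets the same 'color'. Limbs with an unlabeled endpoint (v == 0) are skipped.
    """
    canvas = [[0] * width for _ in range(height)]
    for limb_id, (a, b) in enumerate(COCO_SKELETON):
        xa, ya, va = keypoints[3 * a:3 * a + 3]
        xb, yb, vb = keypoints[3 * b:3 * b + 3]
        if va <= 0 or vb <= 0:
            continue
        for x, y in bresenham(round(xa), round(ya), round(xb), round(yb)):
            if 0 <= x < width and 0 <= y < height:  # clip to the image
                canvas[y][x] = limb_id + 1
    return canvas
```

Rendering from the stored annotation, rather than re-detecting a pose from the Blender image, is what ties the control signal exactly to the ground-truth labels, so any remaining annotation drift comes only from the diffusion model's deviation from its conditioning.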
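Stage 3 applies three automated quality gates, and Figure 1 reports 11,900 of the 12,000 generated images surviving them, i.e. 100 rejected (about 0.8%). The sketch below shows one plausible way to chain such gates; the thresholds, the drift definition (mean keypoint error on labeled ground-truth joints divided by the ground-truth box diagonal), and the category check against a separately predicted label are assumptions, not the paper's specification. Its inputs would be the keypoints and scores that a pose estimator such as ViTPose-H re-estimates on each generated image; `quality_gates` and `GateConfig` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GateConfig:
    # Hypothetical thresholds for illustration; the paper's values are not given here.
    min_mean_score: float = 0.6
    max_drift: float = 0.05  # mean keypoint error as a fraction of the GT bbox diagonal


def mean_score(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores) if scores else 0.0


def normalized_drift(pred: Sequence[Tuple[float, float]], gt: Sequence[float],
                     bbox: Sequence[float]) -> float:
    """Mean distance between re-estimated and labeled GT keypoints, over the GT bbox diagonal."""
    diag = math.hypot(bbox[2], bbox[3]) or 1e-6
    dists = []
    for i, (px, py) in enumerate(pred):
        gx, gy, gv = gt[3 * i:3 * i + 3]
        if gv > 0:
            dists.append(math.hypot(px - gx, py - gy))
    return (sum(dists) / len(dists)) / diag if dists else float("inf")


def quality_gates(pred, scores, gt, bbox, pred_category: str, gt_category: str,
                  cfg: GateConfig = GateConfig()) -> Tuple[bool, str]:
    """Return (keep, reason); gates run cheapest-first and stop at the first failure."""
    if mean_score(scores) < cfg.min_mean_score:
        return False, "low_confidence"       # gate 1: estimator is unsure -> likely a failed generation
    if normalized_drift(pred, gt, bbox) > cfg.max_drift:
        return False, "spatial_drift"        # gate 2: image no longer matches the GT annotation
    if pred_category != gt_category:
        return False, "category_mismatch"    # gate 3: posture label not preserved
    return True, "ok"
```

Returning a reason string alongside the keep/drop decision makes it easy to audit which failure mode dominates the rejected images.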