Artificial Intelligence in Elementary STEM Education: A Systematic Review of Current Applications and Future Challenges
Majid Memari, Krista Ruggles
TL;DR
This paper addresses the fragmentation and uneven evidence base of AI in elementary STEM education by conducting a PRISMA‑guided review of 258 empirical studies from 2020 to 2025 across eight AI categories. It finds strong, domain‑specific gains for certain AI tools (notably conversational agents in math and science) but identifies persistent gaps in cross‑disciplinary integration, privacy, equity, teacher roles, and curriculum breadth, with most research concentrated in upper elementary grades and in North America, Europe, and East Asia. The authors propose an interoperable, privacy‑preserving, teacher‑centered architectural framework and a staged, ecosystem‑level research agenda that emphasizes AI–human collaboration, stakeholder engagement, and rigorous, cumulatively reportable evaluation standards. They argue that real progress requires moving beyond isolated technology studies to holistic, socio‑technical approaches that preserve human mentorship while leveraging AI to support authentic, career‑connected, and equity‑focused STEM learning. The work contributes a detailed map of current capabilities and limitations, practical pilot guidance, and a forward‑looking research program designed to translate AI innovations into scalable, responsible improvements in elementary STEM education.
Abstract
Artificial intelligence (AI) is transforming elementary STEM education, yet evidence remains fragmented. This systematic review synthesizes 258 studies (2020-2025) examining AI applications across eight categories: intelligent tutoring systems (45% of studies), learning analytics (18%), automated assessment (12%), computer vision (8%), educational robotics (7%), multimodal sensing (6%), AI-enhanced extended reality (XR) (4%), and adaptive content generation. The analysis shows that most studies focus on upper elementary grades (65%) and mathematics (38%), with limited cross-disciplinary STEM integration (15%). While conversational AI demonstrates moderate effectiveness (d = 0.45-0.70 where reported), only 34% of studies include standardized effect sizes. Eight major gaps limit real-world impact: fragmented ecosystems, developmental inappropriateness, infrastructure barriers, lack of privacy frameworks, weak STEM integration, equity disparities, teacher marginalization, and narrow assessment scopes. Geographic distribution is also uneven, with 90% of studies originating from North America, East Asia, and Europe. Future directions call for interoperable architectures that support authentic STEM integration, grade-appropriate design, privacy-preserving analytics, and teacher-centered implementations that enhance rather than replace human expertise.
