Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture
Karen Sargsyan
TL;DR
Problem: AI-Scientist systems generate and test many hypotheses, risking spurious discoveries without guaranteed sequential error control. Approach: a functional, two-level architecture in which the Research monad enforces macro-level sequential statistics and Declarative Scaffolding constrains LLM-generated imperative code at the IO boundary. Contributions: a StatisticalProtocol type class, a LORD++ instance for online FDR control, a DataContract scaffold and harness, a large-scale simulation with $N=2000$ hypotheses, and an end-to-end case study, together providing defense-in-depth across the macro and micro levels. Significance: improves the integrity and reliability of automated scientific discovery in hybrid AI-Scientist stacks and provides a foundation for extending to parallel settings using STM.
Abstract
Sequential statistical protocols require meticulous state management and robust error handling, challenges naturally suited to functional programming. We present a functional architecture for structural enforcement of statistical rigor in automated research systems (AI-Scientists). These LLM-driven systems test hypotheses dynamically and therefore risk generating spurious discoveries. We introduce the Research monad, a Haskell eDSL that enforces sequential statistical protocols (e.g., online false discovery rate (FDR) control) using a monad transformer stack. To address risks in hybrid architectures where LLMs generate imperative code, we employ Declarative Scaffolding: generating rigid harnesses that structurally constrain execution and prevent methodological errors such as data leakage. We validate this approach through a large-scale simulation (N=2000 hypotheses) and an end-to-end case study, demonstrating defense-in-depth for the integrity of automated science.
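To make the macro-level idea concrete, the following is a minimal sketch of how a monad transformer stack could thread online FDR state through a sequence of tests. It is not the implementation described in this paper. The names Lord, ProtocolError, testHypothesis, and runResearch are hypothetical, as are the choice of spending sequence gamma_j = 6/(pi^2 j^2) and the initial wealth used in main. The test level follows the published LORD++ rule; everything else is an illustrative assumption.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main (main) where

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State.Strict (State, get, put, runState)

-- Hypothetical protocol state for a LORD++-style procedure.
data Lord = Lord
  { alphaTotal :: Double  -- target FDR level
  , w0         :: Double  -- initial wealth, assumed 0 < w0 <= alphaTotal
  , step       :: Int     -- number of hypotheses tested so far
  , rejections :: [Int]   -- rejection times, most recent first
  } deriving (Show)

-- Hypothetical error type for protocol violations.
newtype ProtocolError = InvalidPValue Double
  deriving (Eq, Show)

-- In a real library the constructor would not be exported, so client code
-- could only change the state through 'testHypothesis'.
newtype Research a = Research (ExceptT ProtocolError (State Lord) a)
  deriving (Functor, Applicative, Monad)

-- Assumed spending sequence: non-negative, non-increasing, sums to 1.
gamma :: Int -> Double
gamma j
  | j >= 1    = 6 / (pi * pi * fromIntegral j ^ (2 :: Int))
  | otherwise = 0

-- LORD++ test level at time t, given the rejection history.
testLevel :: Lord -> Int -> Double
testLevel s t = gamma t * w0 s + firstTerm + laterTerms
  where
    taus       = reverse (rejections s)  -- chronological order
    firstTerm  = case taus of
      (t1 : _) -> (alphaTotal s - w0 s) * gamma (t - t1)
      []       -> 0
    laterTerms = sum [alphaTotal s * gamma (t - tj) | tj <- drop 1 taus]

-- The only way to consume a p-value: the level is computed from the state,
-- never supplied by the caller.
testHypothesis :: Double -> Research Bool
testHypothesis p
  | isNaN p || p < 0 || p > 1 = Research (throwE (InvalidPValue p))
  | otherwise = Research $ do
      s <- lift get
      let t        = step s + 1
          rejected = p <= testLevel s t
          history  = if rejected then t : rejections s else rejections s
      lift (put s { step = t, rejections = history })
      pure rejected

runResearch :: Double -> Double -> Research a -> (Either ProtocolError a, Lord)
runResearch alpha w (Research m) = runState (runExceptT m) (Lord alpha w 0 [])

main :: IO ()
main = do
  let (result, final) = runResearch 0.05 0.025 (mapM testHypothesis [0.001, 0.5, 0.02, 0.9])
  print result
  print (step final, rejections final)
```

The point of the sketch is structural: because the significance level is derived inside the monad from the accumulated state, a caller cannot test a hypothesis at an arbitrary level or skip the bookkeeping, and an invalid p-value aborts the computation through ExceptT rather than silently corrupting the state.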
