Correctness Learning: Deductive Verification Guided Learning for Human-AI Collaboration
Zhao Jin, Lu Jin, Yizhe Luo, Shuo Feng, Yucheng Shi, Kai Zheng, Xinde Yu, Mingliang Xu
TL;DR
This work addresses the challenge of ensuring correctness in AI-assisted decision systems by marrying deductive verification with learning. It introduces pattern-driven correctness learning (PDCL), a two-stage framework where hierarchical reinforcement learning generates output schemes and formal deductive verification via separation logic and Coq appraises their conformity to historical correctness patterns, guiding the learner through pattern-matching rewards. The approach is instantiated for a job scheduling domain using a two-tier resource heap model and a modeling language ML_JSS, with Coq-based formal development verifying both historical and generated schemes. Empirical results across multiple scenarios show that PDCL enhances decision quality and resource utilization for several RL baselines, demonstrating the practical impact of integrating formal reasoning into human-AI collaboration for safety-critical systems.
Abstract
Despite significant progress in AI and decision-making technologies in safety-critical fields, challenges remain in verifying the correctness of decision output schemes and verification-result driven design. We propose correctness learning (CL) to enhance human-AI collaboration integrating deductive verification methods and insights from historical high-quality schemes. The typical pattern hidden in historical high-quality schemes, such as change of task priorities in shared resources, provides critical guidance for intelligent agents in learning and decision-making. By utilizing deductive verification methods, we proposed patten-driven correctness learning (PDCL), formally modeling and reasoning the adaptive behaviors-or 'correctness pattern'-of system agents based on historical high-quality schemes, capturing the logical relationships embedded within these schemes. Using this logical information as guidance, we establish a correctness judgment and feedback mechanism to steer the intelligent decision model toward the 'correctness pattern' reflected in historical high-quality schemes. Extensive experiments across multiple working conditions and core parameters validate the framework's components and demonstrate its effectiveness in improving decision-making and resource optimization.
