Table of Contents
Fetching ...

Correctness Learning: Deductive Verification Guided Learning for Human-AI Collaboration

Zhao Jin, Lu Jin, Yizhe Luo, Shuo Feng, Yucheng Shi, Kai Zheng, Xinde Yu, Mingliang Xu

TL;DR

This work addresses the challenge of ensuring correctness in AI-assisted decision systems by marrying deductive verification with learning. It introduces pattern-driven correctness learning (PDCL), a two-stage framework where hierarchical reinforcement learning generates output schemes and formal deductive verification via separation logic and Coq appraises their conformity to historical correctness patterns, guiding the learner through pattern-matching rewards. The approach is instantiated for a job scheduling domain using a two-tier resource heap model and a modeling language ML_JSS, with Coq-based formal development verifying both historical and generated schemes. Empirical results across multiple scenarios show that PDCL enhances decision quality and resource utilization for several RL baselines, demonstrating the practical impact of integrating formal reasoning into human-AI collaboration for safety-critical systems.

Abstract

Despite significant progress in AI and decision-making technologies in safety-critical fields, challenges remain in verifying the correctness of decision output schemes and verification-result driven design. We propose correctness learning (CL) to enhance human-AI collaboration integrating deductive verification methods and insights from historical high-quality schemes. The typical pattern hidden in historical high-quality schemes, such as change of task priorities in shared resources, provides critical guidance for intelligent agents in learning and decision-making. By utilizing deductive verification methods, we proposed patten-driven correctness learning (PDCL), formally modeling and reasoning the adaptive behaviors-or 'correctness pattern'-of system agents based on historical high-quality schemes, capturing the logical relationships embedded within these schemes. Using this logical information as guidance, we establish a correctness judgment and feedback mechanism to steer the intelligent decision model toward the 'correctness pattern' reflected in historical high-quality schemes. Extensive experiments across multiple working conditions and core parameters validate the framework's components and demonstrate its effectiveness in improving decision-making and resource optimization.

Correctness Learning: Deductive Verification Guided Learning for Human-AI Collaboration

TL;DR

This work addresses the challenge of ensuring correctness in AI-assisted decision systems by marrying deductive verification with learning. It introduces pattern-driven correctness learning (PDCL), a two-stage framework where hierarchical reinforcement learning generates output schemes and formal deductive verification via separation logic and Coq appraises their conformity to historical correctness patterns, guiding the learner through pattern-matching rewards. The approach is instantiated for a job scheduling domain using a two-tier resource heap model and a modeling language ML_JSS, with Coq-based formal development verifying both historical and generated schemes. Empirical results across multiple scenarios show that PDCL enhances decision quality and resource utilization for several RL baselines, demonstrating the practical impact of integrating formal reasoning into human-AI collaboration for safety-critical systems.

Abstract

Despite significant progress in AI and decision-making technologies in safety-critical fields, challenges remain in verifying the correctness of decision output schemes and verification-result driven design. We propose correctness learning (CL) to enhance human-AI collaboration integrating deductive verification methods and insights from historical high-quality schemes. The typical pattern hidden in historical high-quality schemes, such as change of task priorities in shared resources, provides critical guidance for intelligent agents in learning and decision-making. By utilizing deductive verification methods, we proposed patten-driven correctness learning (PDCL), formally modeling and reasoning the adaptive behaviors-or 'correctness pattern'-of system agents based on historical high-quality schemes, capturing the logical relationships embedded within these schemes. Using this logical information as guidance, we establish a correctness judgment and feedback mechanism to steer the intelligent decision model toward the 'correctness pattern' reflected in historical high-quality schemes. Extensive experiments across multiple working conditions and core parameters validate the framework's components and demonstrate its effectiveness in improving decision-making and resource optimization.

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the proposed correctness learning framework PDCL. In the first stage, a hierarchical reinforcement learning (HRL) model is constructed to generate output schemes. In the second stage, typical correctness patterns (task priorities) of historical high-quality schemes and output schemes are formally proved in Coq and feedback is given to the model based on the degree of pattern matching.
  • Figure 2: Schematic diagram of the two-tier resource heap model of the job scheduling system.
  • Figure 3: Modeling program of the historical high-quality schemes (simplified)
  • Figure 4: Results of different algorithms and joint pattern guidance on 10 tasks.
  • Figure 5: Results of different algorithms and joint pattern guidance on 12 tasks.