From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code

Anirudh Jaidev Mahesh; Ben Griffin; Fuat Alican; Joseph Ternasky; Zakari Salifu; Kelvin Amoaba; Yagiz Ihlamur; Aaron Ontoyin Yin; Aikins Laryea; Afriyie Samuel; Yigit Ihlamur

Abstract

Large language models (LLMs) are increasingly used for high-stakes decision-making, yet existing approaches struggle to reconcile scalability, interpretability, and reproducibility. Black-box models obscure their reasoning, while recent LLM-based rule systems rely on per-sample evaluation, so costs scale with dataset size and outputs are stochastic and prone to hallucination. We propose reframing LLMs as code generators rather than per-instance evaluators. A single LLM call generates executable, human-readable decision logic that runs deterministically over structured data, eliminating per-sample LLM queries while enabling reproducible and auditable predictions. We combine code generation with automated statistical validation using precision lift, binomial significance testing, and coverage filtering, and apply cluster-based gap analysis to iteratively refine the decision logic without human annotation. We instantiate this framework in venture capital founder screening, a rare-event prediction task with strong interpretability requirements. On VCBench, a benchmark of 4,500 founders with a 9% base success rate, our approach achieves 37.5% precision and an F0.5 score of 25.0%, exceeding GPT-4o's precision (30.0%) while remaining close to its F0.5 score (25.7%) and maintaining full interpretability. Each prediction traces to executable rules over human-readable attributes, demonstrating verifiable and interpretable LLM-based decision-making in practice.
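The abstract names three validation filters without spelling them out. The sketch below shows one plausible reading, assuming each rule yields a boolean "fires" flag per founder that is compared against binary success labels. Precision lift is the rule's precision divided by the base success rate. The binomial test asks how likely the observed number of successes among flagged founders would be if the rule were no better than the base rate. Coverage is the fraction of founders the rule flags. The function names and the thresholds `min_coverage` and `alpha` are hypothetical, not values from the paper. An F-beta helper is included because the reported F0.5 score weights precision more heavily than recall.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so large n (thousands of founders)
    does not overflow a float.
    """
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)


def validate_rule(fired: list[bool], labels: list[bool],
                  min_coverage: float = 0.01, alpha: float = 0.05) -> dict:
    """Score one binary rule against success labels.

    min_coverage and alpha are illustrative (hypothetical) thresholds.
    """
    n_total = len(labels)
    base_rate = sum(labels) / n_total          # e.g. about 0.09 on VCBench
    n_fired = sum(fired)                       # founders the rule flags
    hits = sum(fl and y for fl, y in zip(fired, labels))
    coverage = n_fired / n_total
    precision = hits / n_fired if n_fired else 0.0
    lift = precision / base_rate if base_rate else 0.0
    # How surprising is this hit count if the rule were as good as chance?
    if n_fired and 0.0 < base_rate < 1.0:
        p_value = binomial_tail(hits, n_fired, base_rate)
    else:
        p_value = 1.0
    keep = lift > 1.0 and p_value < alpha and coverage >= min_coverage
    return {"coverage": coverage, "precision": precision, "lift": lift,
            "p_value": p_value, "keep": keep}


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta = 0.5 weights precision more heavily than recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

For instance, among 100 founders with 9 successes, a rule that flags 10 founders and catches 5 of them has coverage 0.10 and lift 0.5 / 0.09 ≈ 5.6. Its one-sided binomial p-value is roughly 0.001, so it passes these illustrative filters. A rule that flags everyone has lift exactly 1.0 and is rejected.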

Paper Structure (33 sections, 1 equation, 3 figures, 7 tables)

This paper contains 33 sections, 1 equation, 3 figures, 7 tables.

Introduction
Related Work
LLM-Based Feature Engineering for Tabular Data
LLM-Based Feature Engineering for Interpretability
Machine Learning for Venture Capital
Positioning of Our Work
Methodology
Problem Setup
Founder Data Representation
Rule Generation via LLM Code Generation
Prompt Construction
Output Format
Deterministic Rule Evaluation
Statistical Validation
Cluster-Based Gap Analysis
...and 18 more sections

Figures (3)

Figure 1: Overview of our LLM-as-code-generator pipeline. The LLM is called once to generate executable Python rules, which then evaluate deterministically across all founders without further LLM involvement. Statistical validation filters low-quality rules, and cluster-based gap analysis guides iterative refinement.
Figure 2: Rule generation process. The LLM receives structured founder profiles and generates Python lambda expressions that encode binary screening rules. These rules execute deterministically across all 4,500 founders without additional LLM calls.
Figure 3: Rule quality distribution. Each point is a generated rule. Statistically significant rules (green) cluster above the lift = 1.0 threshold, and rules with higher lift tend to have lower coverage.
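Figures 2 and 3 describe rules as Python lambda expressions that run over every founder without further LLM calls, after which each rule is placed by lift and coverage. The following is a minimal sketch of that evaluation loop. It assumes founder profiles are flat dictionaries. The field names (`num_prior_exits`, `top_school`, `years_experience`), the two rules, the helper `evaluate_rules`, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Any, Callable

Founder = dict[str, Any]
Rule = Callable[[Founder], bool]

# Hypothetical rules written as lambdas over made-up profile fields.
RULES: dict[str, Rule] = {
    "serial_founder": lambda f: f.get("num_prior_exits", 0) >= 1,
    "elite_school_and_senior": lambda f: (
        bool(f.get("top_school")) and f.get("years_experience", 0) >= 10
    ),
}


def evaluate_rules(founders: list[Founder], labels: list[bool],
                   rules: dict[str, Rule]) -> list[tuple[str, float, float]]:
    """Apply every rule to every founder and return (name, lift, coverage).

    Evaluation is plain Python, so repeated runs give identical results
    and no model is queried per founder.
    """
    base_rate = sum(labels) / len(labels)
    results = []
    for name, rule in rules.items():
        fired = [bool(rule(f)) for f in founders]
        n_fired = sum(fired)
        hits = sum(fl and y for fl, y in zip(fired, labels))
        coverage = n_fired / len(founders)
        lift = (hits / n_fired) / base_rate if n_fired and base_rate else 0.0
        results.append((name, lift, coverage))
    return results


# Toy data invented for illustration: one success among four founders.
EXAMPLE_FOUNDERS: list[Founder] = [
    {"num_prior_exits": 2, "top_school": True, "years_experience": 12},
    {"num_prior_exits": 0, "top_school": True, "years_experience": 3},
    {"num_prior_exits": 1, "top_school": False, "years_experience": 8},
    {"num_prior_exits": 0, "top_school": False, "years_experience": 15},
]
EXAMPLE_LABELS = [True, False, False, False]

if __name__ == "__main__":
    for name, lift, coverage in evaluate_rules(EXAMPLE_FOUNDERS,
                                               EXAMPLE_LABELS, RULES):
        print(f"{name}: lift={lift:.2f}, coverage={coverage:.2f}")
```

Running the script prints `serial_founder: lift=2.00, coverage=0.50` and `elite_school_and_senior: lift=4.00, coverage=0.25`. The narrower rule has higher lift but lower coverage, a small instance of the tradeoff described for Figure 3. Because each rule is an ordinary Python function, any prediction can be traced back to the exact condition that fired.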
