BRAID: Bounded Reasoning for Autonomous Inference and Decisions
Armağan Amcalar, Eyup Cinar
TL;DR
BRAID introduces a bounded, diagrammatic reasoning framework that replaces unbounded natural-language traces with Mermaid diagrams to improve token efficiency and reliability in autonomous inference. Across the GSM-Hard, SCALE MultiChallenge, and AdvancedIF benchmarks, BRAID enables smaller models to match or exceed larger-model performance at substantially lower cost, as quantified by the Performance-per-Dollar (PPD) metric. A two-stage generate/solve pipeline combined with a caching strategy yields large efficiency gains (up to tens of times the baseline) by decoupling reasoning topology from execution. The work demonstrates both accuracy improvements and economic advantages, positioning BRAID as a scalable methodology for deploying cost-effective, reasoning-enabled autonomous agents. It also outlines concrete future directions, including specialized graph generators, dynamic planning, and multimodal graph ingestion, to extend BRAID’s applicability.
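To make the cost argument concrete, the following is a minimal sketch of a two-stage generate/solve pipeline with a graph cache, plus one assumed form of a performance-per-dollar calculation. Everything here is hypothetical: `TwoStagePipeline`, `performance_per_dollar`, the stub models, the flat per-call costs, and the 0.9 accuracy figure are illustrative assumptions rather than the authors' implementation or reported results, and PPD is assumed here to be accuracy divided by total dollar cost (the paper's exact normalization may differ). The point it illustrates is that the expensive graph-generation stage runs once per task type, while each individual problem only pays for a cheaper solver call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TwoStagePipeline:
    """Hypothetical generate/solve pipeline with a per-task-type graph cache."""
    generate_graph: Callable[[str], str]   # stage 1 (assumed): task type -> Mermaid graph
    solve: Callable[[str, str], str]       # stage 2 (assumed): (graph, problem) -> answer
    gen_cost_usd: float                    # assumed flat cost of one generator call
    solve_cost_usd: float                  # assumed flat cost of one solver call
    cache: Dict[str, str] = field(default_factory=dict)
    generator_calls: int = 0
    total_cost_usd: float = 0.0

    def run(self, task_type: str, problem: str) -> str:
        # Generate the reasoning graph only on a cache miss; reuse it afterwards.
        if task_type not in self.cache:
            self.cache[task_type] = self.generate_graph(task_type)
            self.generator_calls += 1
            self.total_cost_usd += self.gen_cost_usd
        # Every problem still pays for one (cheaper) solver call.
        self.total_cost_usd += self.solve_cost_usd
        return self.solve(self.cache[task_type], problem)


def performance_per_dollar(accuracy: float, cost_usd: float) -> float:
    """Assumed PPD form: accuracy divided by total dollar cost."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return accuracy / cost_usd


# Stub models stand in for real LLM calls (hypothetical, for illustration only).
def stub_generator(task_type: str) -> str:
    return f"flowchart TD\n    A[Parse {task_type} problem] --> B[Compute] --> C[Answer]"


def stub_solver(graph: str, problem: str) -> str:
    return f"answer to {problem}"


pipeline = TwoStagePipeline(stub_generator, stub_solver, gen_cost_usd=0.05, solve_cost_usd=0.001)
for i in range(100):
    pipeline.run("arithmetic", f"problem-{i}")

print(pipeline.generator_calls)  # 1
print(round(pipeline.total_cost_usd, 4))  # 0.15
# 0.9 is a hypothetical accuracy, not a reported result.
print(round(performance_per_dollar(0.9, pipeline.total_cost_usd), 2))  # 6.0
```

With these made-up numbers, 100 problems cost one generator call plus 100 solver calls; without the cache, every problem would also pay the generator cost, which is the effect the caching strategy is meant to avoid.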
Abstract
Large Language Models (LLMs) exhibit nonlinear relationships between performance, cost, and token usage. This paper presents a quantitative study of structured prompting using BRAID (Bounded Reasoning for Autonomous Inference and Decisions) across multiple GPT model tiers, evaluated on the AdvancedIF, GSM-Hard, and SCALE MultiChallenge benchmark datasets. BRAID introduces a bounded reasoning framework based on Mermaid instruction graphs, which enable models to reason structurally rather than through unbounded natural-language token expansion. We show that structured, machine-readable prompts substantially increase reasoning accuracy and cost efficiency for agents in production systems. The findings establish BRAID as an effective and scalable technique for optimizing inference efficiency in autonomous agent systems. All datasets and detailed result logs are available at https://benchmark.openserv.ai.
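As an illustration of what a Mermaid instruction graph might look like, the sketch below renders a small reasoning graph as a Mermaid flowchart string. The helper `to_mermaid`, the `max_nodes` budget, and the example nodes for a GSM-Hard-style word problem are hypothetical; they show one way to express a bounded reasoning structure in machine-readable form and are not the graph schema used in the paper. The node cap stands in for the idea of a bounded reasoning budget, and the edge check rejects references to undeclared steps.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, str]                   # (node id, label)
Edge = Tuple[str, str, Optional[str]]    # (source id, target id, optional condition)


def to_mermaid(nodes: List[Node], edges: List[Edge], max_nodes: int = 12) -> str:
    """Render a small, bounded reasoning graph as a Mermaid flowchart.

    max_nodes is a hypothetical budget illustrating "bounded" reasoning."""
    if len(nodes) > max_nodes:
        raise ValueError(f"graph has {len(nodes)} nodes; budget is {max_nodes}")
    ids = {node_id for node_id, _ in nodes}
    lines = ["flowchart TD"]
    for node_id, label in nodes:
        lines.append(f'    {node_id}["{label}"]')
    for src, dst, cond in edges:
        # Every edge must connect two declared steps.
        if src not in ids or dst not in ids:
            raise ValueError(f"edge {src}->{dst} references an undeclared node")
        arrow = f"-- {cond} -->" if cond else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


# Hypothetical graph for a GSM-Hard-style word problem.
graph = to_mermaid(
    nodes=[
        ("A", "Extract quantities"),
        ("B", "Compute total"),
        ("C", "Check units"),
        ("D", "Return answer"),
    ],
    edges=[
        ("A", "B", None),
        ("B", "C", None),
        ("C", "D", "ok"),
        ("C", "B", "mismatch"),
    ],
)
print(graph)
```

The printed flowchart has four step nodes, a linear path from extraction to the answer, and one conditional loop back from the unit check to the computation step, so the solver follows a fixed, finite structure instead of an open-ended chain of free-form text.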
