Implicit Intelligence -- Evaluating Agents on What Users Don't Say

Ved Sirdeshmukh; Marc Wetter

TL;DR

Implicit Intelligence is an evaluation framework that tests whether AI agents can move beyond prompt-following to become genuine goal-fulfillers. It is paired with Agent-as-a-World (AaW), a harness in which interactive worlds are defined in human-readable YAML files and simulated by language models.

Abstract

Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current agentic benchmarks test explicit instruction-following but do not evaluate whether agents can reason about implicit requirements spanning accessibility needs, privacy boundaries, catastrophic risks, and contextual constraints. We present Implicit Intelligence, an evaluation framework that tests whether AI agents can move beyond prompt-following to become genuine goal-fulfillers, paired with Agent-as-a-World (AaW), a harness in which interactive worlds are defined in human-readable YAML files and simulated by language models. Our scenarios share three properties: user requests that appear simple, correct solutions that hide complexity, and constraints that the agent can discover by exploring the environment. Evaluating 16 frontier and open-weight models across 205 scenarios, we find that even the best-performing model achieves only a 48.3% scenario pass rate, revealing substantial room for improvement in bridging the gap between literal instruction-following and human-like contextual reasoning.
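
The headline metric above is a scenario pass rate. As a minimal sketch, assuming each scenario is scored by a rubric of boolean criteria and that a scenario counts as passed only when every criterion holds (this all-or-nothing rule is an assumption, not something stated here), the aggregation could look like the following. The names `RubricResult`, `scenario_passed`, and `pass_rate`, and the example scenarios, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricResult:
    """Hypothetical per-scenario outcome: one boolean per rubric criterion."""
    scenario_id: str
    criteria: dict[str, bool]


def scenario_passed(result: RubricResult) -> bool:
    # Assumed rule: a scenario passes only if every rubric criterion holds.
    return all(result.criteria.values())


def pass_rate(results: list[RubricResult]) -> float:
    """Percentage of scenarios that pass under the all-or-nothing rule."""
    if not results:
        raise ValueError("no scenarios to score")
    passed = sum(scenario_passed(r) for r in results)
    return 100.0 * passed / len(results)


if __name__ == "__main__":
    results = [
        # Literal task done and the unstated privacy constraint respected.
        RubricResult("share-calendar", {"task_done": True, "no_private_data_leaked": True}),
        # Literal task done, but an implicit accessibility need was missed.
        RubricResult("book-venue", {"task_done": True, "venue_accessible": False}),
        RubricResult("delete-files", {"task_done": False, "backup_checked": True}),
    ]
    print(f"{pass_rate(results):.1f}%")  # 1 of 3 scenarios passes: 33.3%
```

Under this rule an agent that completes the literal request but misses a single unstated constraint scores zero on that scenario, which is a strict reading of "goal-fulfillment"; a partial-credit average over criteria would be a different, more lenient metric.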

Paper Structure

This paper contains 68 sections, 2 equations, 3 figures, and 5 tables.

Introduction
Related Work
Agent Benchmarks and Evaluation
Environment Simulation for Agents
Implicit Reasoning and Pragmatic Understanding
Safety and Alignment in Agentic Systems
The Implicit Intelligence Framework
Evaluation Categories
Implicit Reasoning
Catastrophic Risk
Privacy and Security
Accessibility
Agent-as-a-World
Motivation
Specification Format
...and 53 more sections

Figures (3)

Figure 1: Examples of the four Implicit Intelligence categories (Implicit Reasoning, Catastrophic Risk, Privacy and Security, Accessibility). See the Evaluation Categories section for details.
Figure 2: System architecture of Agent-as-a-World. A declarative YAML specification drives an LLM-based World Model, which acts as a universal simulator for evaluating a Primary Agent’s ability to navigate hidden constraints and dynamic environmental states; the outcome is judged by a deterministic evaluation rubric.
Figure 3: The hybrid construction pipeline used to synthetically create scenarios for Implicit Intelligence, with a human-in-the-loop stage that refines them.
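
The Figure 2 caption describes a loop in which a declarative specification drives a world model that simulates the environment for the primary agent, after which a deterministic rubric judges the outcome. The sketch below shows only that control flow, under loudly stated assumptions: a Python dict stands in for the parsed YAML file, a rule-based stub replaces the LLM world model so the example runs offline, and every field name (`tools`, `hidden_state`, `world`) and function name is hypothetical rather than the actual AaW schema.

```python
import copy
from typing import Callable, Optional, Tuple

# Hypothetical scenario specification. In AaW this would be a human-readable
# YAML file; a plain dict stands in for the parsed file so the sketch has no
# third-party dependencies. All field names are illustrative assumptions.
SPEC = {
    "user_request": "Book a table for my team dinner on Friday.",
    "tools": ["list_restaurants", "book"],
    # Hidden constraint: never stated by the user, discoverable by exploring.
    "hidden_state": {"teammate_uses_wheelchair": True},
    "world": {
        "restaurants": {
            "Bistro Uno": {"step_free_access": False},
            "Casa Verde": {"step_free_access": True},
        },
        "booking": None,
    },
}

Action = Tuple[str, Optional[str]]


def world_model(spec: dict, tool: str, arg: Optional[str]) -> str:
    """Rule-based stub for the LLM world model: apply one tool call."""
    world = spec["world"]
    if tool not in spec["tools"]:
        return f"error: unknown tool {tool}"
    if tool == "list_restaurants":
        lines = [f"{name} (step-free: {info['step_free_access']})"
                 for name, info in world["restaurants"].items()]
        # The simulated environment surfaces a clue about the hidden constraint.
        if spec["hidden_state"]["teammate_uses_wheelchair"]:
            lines.append("note: team calendar lists an accessibility request")
        return "\n".join(lines)
    # The only remaining tool is "book".
    if arg not in world["restaurants"]:
        return f"error: no restaurant named {arg}"
    world["booking"] = arg
    return f"booked {arg}"


def careful_agent(observations: list) -> Optional[Action]:
    """Scripted primary agent that explores the world before acting."""
    if not observations:
        return ("list_restaurants", None)
    if len(observations) == 1:
        venues = [line for line in observations[0].splitlines() if "(step-free:" in line]
        if "accessibility request" in observations[0]:
            venues = [line for line in venues if "step-free: True" in line]
        if venues:
            return ("book", venues[0].split(" (")[0])
    return None  # episode finished


def literal_agent(observations: list) -> Optional[Action]:
    """Scripted primary agent that follows the literal request without exploring."""
    return None if observations else ("book", "Bistro Uno")


def rubric(spec: dict) -> dict:
    """Deterministic checks on the final world state (criteria are hypothetical)."""
    booked = spec["world"]["booking"]
    restaurants = spec["world"]["restaurants"]
    return {
        "made_a_booking": booked is not None,
        "venue_is_accessible": booked is not None
        and restaurants[booked]["step_free_access"],
    }


def run_episode(spec: dict, agent: Callable, max_turns: int = 5) -> dict:
    """Alternate agent actions and world-model observations, then score."""
    observations: list = []
    for _ in range(max_turns):
        action = agent(observations)
        if action is None:
            break
        tool, arg = action
        observations.append(world_model(spec, tool, arg))
    return rubric(spec)


if __name__ == "__main__":
    for agent in (careful_agent, literal_agent):
        # Deep-copy the spec so each episode starts from a fresh world state.
        print(agent.__name__, run_episode(copy.deepcopy(SPEC), agent))
```

In this toy run the exploring agent books the step-free venue and satisfies both checks, while the literal agent books the first venue and fails `venue_is_accessible`. Keeping the judge as plain code over the final world state, rather than another model call, is one way to obtain the deterministic scoring the caption describes.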