Inferring Latent Market Forces: Evaluating LLM Detection of Gamma Exposure Patterns via Obfuscation Testing

Christopher Regan, Ying Xie

TL;DR

The paper introduces obfuscation testing to assess whether large language models can reason about structural market mechanisms, specifically dealer hedging constraints tied to gamma exposure, rather than relying on temporal memorization. It implements a WHO→WHOM→WHAT causal framework across three dealer-hedging patterns using fully obfuscated SPY options data from 2024, achieving a 71.5% detection rate and a 91.2% forward-materialization accuracy. Detection reaches 100% with regime labels, illustrating its sensitivity to contextual prompts, while statistical validation (including Granger causality) supports genuine mechanistic understanding. The work demonstrates emergent structural reasoning capabilities in transformers, offers a rigorous validation methodology, and highlights implications for risk management, surveillance, and AI governance in financial markets.

Abstract

We introduce obfuscation testing, a novel methodology for validating whether large language models detect structural market patterns through causal reasoning rather than temporal association. Testing three dealer hedging constraint patterns (gamma positioning, stock pinning, 0DTE hedging) on 242 trading days (95.6% coverage) of S&P 500 options data, we find that LLMs achieve a 71.5% detection rate using unbiased prompts that provide only raw gamma exposure values without regime labels or temporal context. The WHO-WHOM-WHAT causal framework forces models to identify the economic actors (dealers), affected parties (directional traders), and structural mechanisms (forced hedging) underlying observed market dynamics. Critically, detection accuracy (91.2%) remains stable even as economic profitability varies quarterly, demonstrating that models identify structural constraints rather than profitable patterns. When prompted with regime labels, detection increases to 100%, but the 71.5% unbiased rate validates genuine pattern recognition. Our findings suggest LLMs possess emergent capabilities for detecting complex financial mechanisms through pure structural reasoning, with implications for systematic strategy development, risk management, and our understanding of how transformer architectures process financial market dynamics.
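The obfuscation step described in the abstract (supplying only raw gamma exposure values, with no regime labels or temporal context) might be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the field names (`date`, `ticker`, `regime_label`, `net_gex`), the `obs` index, and the sample values are all hypothetical and made up for the example.

```python
from typing import Any

# Hypothetical context fields to strip; the paper's actual schema is not given here.
CONTEXT_FIELDS = {"date", "ticker", "regime_label"}


def obfuscate(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop temporal/contextual fields and tag each row with an anonymous index."""
    out = []
    for i, rec in enumerate(records):
        # Keep only the quantitative fields (e.g. net gamma exposure).
        kept = {k: v for k, v in rec.items() if k not in CONTEXT_FIELDS}
        # Anonymous position in the sample; carries no calendar information.
        kept["obs"] = i
        out.append(kept)
    return out


# Made-up illustrative rows, not data from the paper.
sample = [
    {"date": "2024-01-02", "ticker": "SPY", "net_gex": -1.2e9, "regime_label": "short_gamma"},
    {"date": "2024-01-03", "ticker": "SPY", "net_gex": 3.4e8, "regime_label": "long_gamma"},
]
obfuscated = obfuscate(sample)
```

Under these assumptions, a prompt built from `obfuscated` would expose only the numeric exposure values, so any regime the model names has to be inferred from the numbers rather than read off a label or a date.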

Paper Structure

This paper contains 39 sections, 4 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Temporal obfuscation methodology. Left: Original data with dates, ticker symbols, and market context. Right: Obfuscated data preserving only structural quantitative relationships while removing all temporal and contextual information.
  • Figure 2: Representative gamma exposure profile showing negative net GEX of -$32.5B. Red bars indicate dealer short gamma positions requiring pro-cyclical hedging (selling rallies, buying dips) that amplifies price movements.
  • Figure 3: Five-component validation pipeline architecture transforming raw options data through GEX calculation, temporal obfuscation, LLM analysis, outcome verification, and statistical validation.
  • Figure 4: Pattern detection performance metrics using unbiased prompts. All three patterns exceed the 60% mechanical threshold with detection rates averaging 71.5% and prediction accuracy averaging 90.8% across 242 trading days.
  • Figure 5: Pattern detection persists above threshold despite declining profitability. Detection rates remain stable (100% to 84%) while net alpha declines from +21 bps to -1 bp across quarters, validating structural pattern detection independent of economic outcomes.
  • ...and 2 more figures