Optimal Signal Extraction from Order Flow: A Matched Filter Perspective on Normalization and Market Microstructure

Sungwoo Kang

Abstract

We establish a general matched filter principle for order flow normalization: optimal normalization must match the scaling behaviour of the signal-generating process. For capacity-constrained institutional investors, market capitalization normalization ($S^{MC}$) is the matched filter; for volume-targeting traders (e.g., VWAP/TWAP algorithms), trading value normalization ($S^{TV}$) is optimal. Monte Carlo simulations confirm this principle works bidirectionally, with matched filters achieving up to $1.99\times$ higher signal correlation. Empirical validation using 2.7 million stock-day observations from the Korean market (2020--2024) reveals symmetric normalization dominance across investor types: domestic institutional flows predict next-day returns significantly under $S^{MC}$ ($t = 9.65$), while foreign flows exhibit stronger predictability under $S^{TV}$ ($t = 16.35$) -- with no sign reversal at longer horizons, indicating durable private information rather than temporary price impact. These findings motivate the ``Informed Executor'' hypothesis: sophisticated foreign investors possess genuine private information but employ volume-targeting algorithms for stealth execution -- volume-scaling reflects execution methodology, not absence of information. Information-theoretic validation using KL divergence independently corroborates these results. The matched filter principle generalises to any market where signal scaling varies across trader types, with implications for trading algorithms, factor construction, and market microstructure methodology.
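
The abstract does not spell out the two normalizations. One plausible reading, offered as an assumption rather than the paper's own definition, takes $D_{i,t}$ as an investor group's net buying value in stock $i$ on day $t$, $M_{i,t}$ as market capitalization, and $V_{i,t}$ as total trading value (all three symbols are introduced here for illustration):

$$
S^{MC}_{i,t} = \frac{D_{i,t}}{M_{i,t}}, \qquad
S^{TV}_{i,t} = \frac{D_{i,t}}{V_{i,t}}, \qquad
\tau_{i,t} = \frac{V_{i,t}}{M_{i,t}}
\;\;\Longrightarrow\;\;
S^{MC}_{i,t} = \tau_{i,t}\, S^{TV}_{i,t}.
$$

Under this reading the two signals differ only by the turnover factor $\tau_{i,t}$, so their relative performance can depend only on how turnover varies across stocks and how it relates to the underlying signal.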

Paper Structure

This paper contains 66 sections, 1 theorem, 47 equations, 3 figures, 24 tables.

Key Result

Proposition 3.1

Let $\rho(S,R)$ denote the correlation between a normalized signal $S$ and future returns. The optimal normalization matches the scaling behaviour of the signal-generating process: whenever turnover $\tau_i$ exhibits cross-sectional dispersion and the signal-to-noise ratio is non-negligible (i.e., $\sigma_\alpha^2 / \sigma_\zeta^2$ is not vanishingly small). When the signal is negligible relative

Figures (3)

  • Figure 1: Symmetric Validation of the General Matched Filter Principle. Left panel: Scenario A (capacity-scaled signals) shows MC dominance (1.32$\times$). Right panel: Scenario B (volume-scaled signals) shows TV dominance (1.13$\times$). The matched filter principle applies bidirectionally---optimal normalization matches the signal-generating process regardless of direction.
  • Figure 2: Robustness Checks: Parameter Sensitivity Analysis. Top row: Scenario A (capacity-scaled signals) sensitivity to signal strength (A), noise level (B), and sample size (C). Bottom row: Symmetric validation across both scenarios. Panel (D) shows the response to turnover heterogeneity: the Scenario A advantage increases sharply (1.05$\times$ to 1.99$\times$), while Scenario B remains stable (1.04$\times$ to 1.15$\times$). Panel (E) summarises both scenarios across all robustness dimensions (see also Figure 1 for the baseline comparison).
  • Figure 3: Empirical Validation Results. Panel A shows the $R^2$ comparison between normalizations for next-day return prediction (from Table \ref{tab:empirical_return_prediction}, Panel A). Panel B shows the absolute Pearson correlation between each normalization and daily turnover, computed across the full 2020--2024 sample. TV normalization achieves lower turnover correlation by dividing by volume, while MC normalization retains the signal structure. The horse race regression confirms MC captures a more robust signal component.
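
The symmetric comparison described in Figures 1 and 2 can be illustrated with a toy Monte Carlo. This is a minimal sketch under assumptions of our own, not the paper's simulation design: the function name `simulate`, the lognormal distributions, and all parameter values are hypothetical. In Scenario A the informative flow scales with market capitalization; in Scenario B it scales with trading value. Each signal is then correlated with a noisy return that loads on the same latent information.

```python
import numpy as np

def simulate(scenario, n=200_000, tau_sigma=0.6, noise_sd=1.0,
             ret_noise_sd=3.0, seed=0):
    """Toy comparison of MC vs TV normalization (illustrative assumptions only).

    scenario "A": flow scales with market cap (capacity-scaled signal).
    scenario "B": flow scales with trading value (volume-scaled signal).
    Returns (corr_mc, corr_tv): correlation of each signal with returns.
    """
    rng = np.random.default_rng(seed)
    mcap = rng.lognormal(mean=10.0, sigma=1.5, size=n)            # market cap
    tau = rng.lognormal(mean=np.log(0.01), sigma=tau_sigma, size=n)  # turnover
    volume = tau * mcap                                            # trading value

    alpha = rng.normal(size=n)                  # latent private information
    zeta = rng.normal(scale=noise_sd, size=n)   # uninformative flow component

    # The flow is generated in units of the scenario's natural scale.
    scale = mcap if scenario == "A" else volume
    flow = (alpha + zeta) * scale

    # Future return loads on the latent information plus unrelated noise.
    ret = alpha + rng.normal(scale=ret_noise_sd, size=n)

    s_mc = flow / mcap     # market capitalization normalization
    s_tv = flow / volume   # trading value normalization

    corr_mc = np.corrcoef(s_mc, ret)[0, 1]
    corr_tv = np.corrcoef(s_tv, ret)[0, 1]
    return corr_mc, corr_tv

if __name__ == "__main__":
    for scenario in ("A", "B"):
        corr_mc, corr_tv = simulate(scenario)
        print(scenario, f"MC={corr_mc:.4f}", f"TV={corr_tv:.4f}")
```

In this toy design the mismatched normalization multiplies the signal by an independent turnover term, so the matched filter's advantage is roughly $\exp(\sigma_\tau^2/2)$, where $\sigma_\tau$ is the log-turnover dispersion (`tau_sigma`). The advantage therefore grows with turnover heterogeneity in both scenarios. The asymmetry between scenarios reported for Panel (D) would require a richer noise structure than the one assumed here.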

Theorems & Definitions (2)

  • Proposition 3.1: General Matched Filter Optimality
  • proof