Insider Purchase Signals in Microcap Equities: Gradient Boosting Detection of Abnormal Returns

Hangyi Zhao

TL;DR

This study tests whether insider purchases disclosed via SEC Form 4 predict abnormal returns in microcap stocks (market capitalization \$30M–\$500M). Using an XGBoost classifier trained on insider characteristics, transaction history, and market conditions, the author achieves an out-of-sample AUC of 0.70 on 2024 data, with an optimized threshold of 0.20 yielding precision 0.38 and recall 0.69. The analysis identifies distance from the 52-week high as the dominant predictor (≈36% of predictive power) and uncovers a momentum-like pattern: disclosures after price strength (>10% since trade) yield higher mean CARs (≈6.3%) and a higher outperformance probability (≈36.7%), challenging mean-reversion intuitions in illiquid microcaps. These results are consistent with slower information incorporation in microcaps and suggest that price momentum can validate insider signals when liquidity is limited. The work demonstrates that nonlinear ML can extract actionable signals from regulatory filings, with practical implications for adjusting insider-trade-based strategies for market conditions and liquidity.

Abstract

This paper examines whether SEC Form 4 insider purchase filings predict abnormal returns in U.S. microcap stocks. The analysis covers 17,237 open-market purchases across 1,343 issuers from 2018 through 2024, restricted to market capitalizations between \$30M and \$500M. A gradient boosting classifier trained on insider identity, transaction history, and market conditions at disclosure achieves AUC of 0.70 on out-of-sample 2024 data. At an optimized threshold of 0.20, precision is 0.38 and recall is 0.69. The distance from the 52-week high dominates feature importance, accounting for 36% of predictive signal. A momentum pattern emerges in the data: transactions disclosed after price appreciation exceeding 10% yield the highest mean cumulative abnormal return (6.3%) and the highest probability of outperformance (36.7%). This contrasts with the simple mean-reversion intuition often applied to post-run-up entries. The result is robust to winsorization and holds across subsamples. These patterns are consistent with slower information incorporation in illiquid markets, where trend confirmation may filter for higher-conviction insider signals.
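
To make the threshold metrics concrete, the following is a minimal sketch of how precision and recall follow from a probability cutoff such as 0.20. The function name `precision_recall_at`, the labels, and the scores are illustrative assumptions; they are not the paper's code or data.

```python
def precision_recall_at(y_true, y_score, threshold=0.20):
    """Precision and recall when scores >= threshold are flagged as positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1  # flagged and actually outperformed
        elif predicted and label == 0:
            fp += 1  # flagged but did not outperform
        elif not predicted and label == 1:
            fn += 1  # missed outperformer
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (1 = outperformed) and model scores, made up for illustration only.
labels = [1, 0, 0, 1, 0, 1, 0, 0]
scores = [0.45, 0.25, 0.10, 0.22, 0.30, 0.15, 0.05, 0.21]

low = precision_recall_at(labels, scores, threshold=0.20)
high = precision_recall_at(labels, scores, threshold=0.40)
```

On this toy input, lowering the cutoff from 0.40 to 0.20 raises recall and lowers precision, the same direction of trade-off as the reported precision of 0.38 and recall of 0.69.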

Paper Structure (17 sections, 2 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: ROC curves for classification models on 2024 test set. XGBoost (AUC = 0.70) and Random Forest (AUC = 0.69) exhibit similar overall predictive power, slightly outperforming Logistic Regression (AUC = 0.67).
  • Figure 2: Feature importance from XGBoost model (measured by average gain). The distance from 52-week high accounts for 36% of the total predictive contribution, substantially exceeding all other features.
  • Figure 3: Model calibration. Left: calibration curve showing actual vs. predicted positive rates. Right: distribution of predicted probabilities.
  • Figure 4: Confusion matrix at optimized threshold (0.20).
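
A calibration curve like the one described for Figure 3 can be built by binning predicted probabilities and comparing each bin's mean prediction with its observed positive rate. The sketch below assumes equal-width bins; the helper name `calibration_bins`, the bin count, and the toy inputs are hypothetical, and the paper's own binning choice is not stated here.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((prob, label))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than report a 0/0 rate
        mean_pred = sum(p for p, _ in members) / len(members)
        pos_rate = sum(l for _, l in members) / len(members)
        rows.append((mean_pred, pos_rate, len(members)))
    return rows


# Toy labels and probabilities, made up for illustration only.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_probs = [0.05, 0.15, 0.25, 0.35, 0.65, 0.85]
rows = calibration_bins(toy_labels, toy_probs, n_bins=5)
```

A well-calibrated model would show the observed positive rate close to the mean predicted probability in every bin; the per-bin counts correspond to the histogram of predicted probabilities.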