Objective Mispricing Detection for Shortlisting Undervalued Football Players via Market Dynamics and News Signals

Chinenye Omejieke, Shuyao Chen, Xia Cui

Abstract

We present a practical, reproducible framework for identifying undervalued football players grounded in objective mispricing. Instead of relying on subjective expert labels, we estimate an expected market value from structured data (historical market dynamics, biographical and contract features, transfer history) and compare it to the observed valuation to define mispricing. We then assess whether news-derived Natural Language Processing (NLP) features (i.e., sentiment statistics and semantic embeddings from football articles) complement market signals for shortlisting undervalued players. Using a chronological (leakage-aware) evaluation, gradient-boosted regression explains a large share of the variance in log-transformed market value. For undervaluation shortlisting, ROC-AUC-based ablations show that market dynamics are the primary signal, while NLP features provide consistent, secondary gains that improve robustness and interpretability. SHAP analyses suggest the dominance of market trends and age, with news-derived volatility cues amplifying signals in high-uncertainty regimes. The proposed pipeline is designed for decision support in scouting workflows, emphasizing ranking/shortlisting over hard classification thresholds, and includes a concise reproducibility and ethics statement.
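The mispricing definition and the leakage-aware split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the record layout, and the sign convention (positive score when the expected value exceeds the observed one) are assumptions made for illustration, and the expected values are hard-coded rather than produced by a gradient-boosted model.

```python
import math


def mispricing_scores(expected_values, observed_values):
    """Log-scale gap between model-expected and observed market value.

    Assumed convention: a positive score means the model expects more than
    the market currently quotes, i.e. a candidate for the undervalued list.
    """
    return [math.log(e) - math.log(o)
            for e, o in zip(expected_values, observed_values)]


def shortlist(player_ids, scores, k):
    """Rank by mispricing and keep at most k players with a positive score."""
    ranked = sorted(zip(player_ids, scores), key=lambda p: p[1], reverse=True)
    return [pid for pid, score in ranked[:k] if score > 0]


def chronological_split(records, cutoff):
    """Train on records dated before the cutoff, evaluate on the rest.

    Dates are ISO strings (YYYY-MM-DD), so string comparison is chronological.
    """
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


# Hypothetical toy data, in euros.
players = ["A", "B", "C"]
expected = [10_000_000, 5_000_000, 2_000_000]
observed = [8_000_000, 5_000_000, 4_000_000]

scores = mispricing_scores(expected, observed)
top = shortlist(players, scores, k=2)

records = [
    {"player": "A", "date": "2022-06-01"},
    {"player": "B", "date": "2023-01-15"},
    {"player": "C", "date": "2023-09-30"},
]
train, test = chronological_split(records, cutoff="2023-01-01")
```

Ranking by a continuous score, rather than thresholding it, mirrors the abstract's emphasis on shortlisting over hard classification; the cutoff `k` is a workflow choice, not a model parameter.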

Paper Structure

This paper contains 22 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Observed versus expected market value (log scale). Players above the diagonal ($y > x$) exhibit positive mispricing and are shortlisted as undervalued. The spread of points illustrates the model's ability to identify valuation discrepancies.
  • Figure 2: SHAP summary plot for the XGBoost regression model, showing each feature's contribution to predicted log-market value. Features are ranked by their global impact, with red indicating high feature values and blue indicating low values.
  • Figure 3: Prototype deployment architecture for the multimodal undervaluation shortlisting system.