
The Value of Graph-based Encoding in NBA Salary Prediction

Junhao Su, David Grimsman, Christopher Archibald

TL;DR

This paper shows that building a knowledge graph from on- and off-court data, embedding that graph in a vector space, and appending the resulting vector to the tabular data allows a supervised learning model to better capture the factors that affect salary.

Abstract

Market valuation of professional athletes is a difficult problem, given the year-to-year variability in performance and location. In the National Basketball Association (NBA), a straightforward way to address this problem is to build a tabular data set and use supervised machine learning to predict a player's salary from the player's performance in the previous year. For younger players, whose contracts are largely determined by draft position, this approach works well; however, it can fail for veterans or for players whose salaries lie in the upper tail of the distribution. In this paper, we show that building a knowledge graph from on- and off-court data, embedding that graph in a vector space, and appending the resulting vector to the tabular data allows a supervised learning model to better capture the factors that affect salary. We compare several graph embedding algorithms and show that this process is vital to NBA salary prediction.
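The pipeline summarized above (graph construction, embedding, and concatenation with tabular features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triples, node names, relation labels, and feature values are hypothetical, and a simple feature-propagation embedding (averaging one-hot node vectors over each node's neighborhood) stands in for the graph embedding algorithms the paper compares. Only the last step, appending each PlayerSeason node's vector to its tabular row, mirrors the process the abstract describes.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# Node and relation names are illustrative only, not the paper's data.
TRIPLES = [
    ("PS:playerA_2021", "Plays_For", "Team:BOS"),
    ("PS:playerA_2021", "Represented_By", "Agent:X"),
    ("PS:playerB_2021", "Plays_For", "Team:BOS"),
    ("PS:playerB_2021", "Represented_By", "Agent:Y"),
    ("PS:playerB_2021", "Won_Previously", "Award:AllStar"),
]

# Hypothetical tabular rows: previous-season stats keyed by PlayerSeason node.
TABULAR = [
    {"node": "PS:playerA_2021", "features": [22.5, 31.0]},
    {"node": "PS:playerB_2021", "features": [18.0, 29.5]},
]


def build_adjacency(triples):
    """Build an undirected adjacency list; relation labels are dropped in this sketch."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def propagate_embeddings(adj, steps=2):
    """Start from one-hot node vectors and repeatedly average each node's
    vector with its neighbors' vectors (a stand-in embedding method)."""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    dim = len(nodes)
    emb = {n: [1.0 if index[n] == j else 0.0 for j in range(dim)] for n in nodes}
    for _ in range(steps):
        updated = {}
        for n in nodes:
            group = [n, *adj[n]]  # closed neighborhood: the node plus its neighbors
            updated[n] = [sum(emb[m][j] for m in group) / len(group) for j in range(dim)]
        emb = updated
    return emb


def augment_rows(rows, emb):
    """Append each PlayerSeason's graph vector to its tabular feature list."""
    return [row["features"] + emb[row["node"]] for row in rows]


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    embeddings = propagate_embeddings(adjacency)
    for features in augment_rows(TABULAR, embeddings):
        print(len(features), [round(v, 3) for v in features])
```

The augmented rows can then be passed to any tabular regressor. A learned embedding (for example, one trained on random walks or on typed relations) would replace `propagate_embeddings` without changing the concatenation step.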

Paper Structure

This paper contains 25 sections, 2 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Schema of the Heterogeneous NBA Knowledge Graph. The graph connects PlayerSeason anchor nodes (center) to diverse entities including Team, Agent, Award, and Injury. Temporal edges (e.g., Won_Previously, Has_Injury_History) are strictly masked by the admissibility function $A(e,s)$ to prevent look-ahead bias.
  • Figure 2: Tri-State Evaluation on Eligible Outliers. (a) vs. Weak Baseline: Static embeddings provide a favorable rescue–misguidance trade-off. (b) vs. Strong Baseline: Dynamic architectures incur a "Generalization Tax," reflecting sensitivity to historical networks.
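The admissibility function $A(e,s)$ from Figure 1 can be sketched as a filter over timestamped edges. The rule below is an assumption for illustration, since its exact definition is not given here: static edges are always admissible, and a temporal edge is admissible for target season s only if it was observed strictly before s. The `Edge` type, the `Represented_By` relation, and all seasons are hypothetical; `Won_Previously` and `Has_Injury_History` are the relation names from the Figure 1 caption.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """A hypothetical timestamped knowledge-graph edge."""
    head: str
    relation: str
    tail: str
    season: Optional[int]  # None marks a static (time-invariant) edge


def admissible(edge: Edge, target_season: int) -> bool:
    """Assumed form of A(e, s): keep static edges, and keep temporal
    edges only if they were observed strictly before the target season."""
    return edge.season is None or edge.season < target_season


def masked_graph(edges, target_season):
    """Return only the edges that may be used when predicting target_season."""
    return [e for e in edges if admissible(e, target_season)]


# Hypothetical edges for one player.
EDGES = [
    Edge("Player:A", "Represented_By", "Agent:X", None),
    Edge("Player:A", "Won_Previously", "Award:AllStar", 2019),
    Edge("Player:A", "Won_Previously", "Award:AllStar", 2022),
    Edge("Player:A", "Has_Injury_History", "Injury:ACL", 2021),
]

if __name__ == "__main__":
    for e in masked_graph(EDGES, target_season=2021):
        print(e.relation, e.tail, e.season)
```

Applying the mask separately for each target season, rather than once for the whole graph, keeps a 2021 prediction from seeing an award won in 2022, which is the look-ahead bias the caption refers to.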