A Decade of Public Procurement in Spain: A Longitudinal Open Dataset from the BOE (2014-2024)

Manuel Munoz Pla

TL;DR

This paper describes the data extraction and normalization pipeline behind a longitudinal open dataset of Spanish public procurement drawn from the BOE (2014-2024), reports descriptive statistics on temporal and sectoral trends, and discusses applications in transparency research, public policy evaluation, and computational social science.

Abstract

This paper presents a longitudinal open dataset of Spanish public procurement extracted from the Official State Gazette (BOE) covering the period 2014-2024. The dataset integrates structured information on contracts, contracting authorities, suppliers, amounts, and procedures, enabling large-scale quantitative analysis of public procurement dynamics in Spain. We describe the data extraction and normalization pipeline, provide descriptive statistical analyses of temporal and sectoral trends, and discuss potential applications in transparency research, public policy evaluation, and computational social science. The dataset is released to facilitate reproducible research on public procurement and government contracting.

Paper Structure (23 sections, 5 figures, 7 tables)


Figures (5)

  • Figure 1: Distribution of awarded contract values on a log10 scale. Most contracts fall between $10^5$ and $10^6$ euros, with long tails at both extremes.
  • Figure 2: Logarithmic violin plot of estimated and awarded values. Awarded amounts track estimated values closely around the median, with tighter concentration and shorter tails, consistent with competitive adjustment during the tender process.
  • Figure 3: Hexbin scatter of predicted vs. actual awarded values on a log-log scale. Concentration around the diagonal indicates acceptable accuracy for standard contracts; dispersion increases at both extremes of the distribution.
  • Figure 4: K-Means clustering of contractors by number of contracts (log10) and total awarded value (log10). Colours denote the three clusters: high-value operators (red), standard operators (green), and micro-operators (blue).
  • Figure 5: Boxplot of awarded values (log10) for Works and Services contracts. Works contracts show a substantially higher median and wider interquartile range.
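
To make the clustering behind Figure 4 concrete, the following is a minimal sketch of how contractors could be grouped by log10 contract count and log10 total awarded value. It is not the paper's implementation: the sample records, the function names `contractor_features` and `kmeans`, and the deterministic centroid initialisation are all assumptions made for illustration, and it uses a plain Lloyd's iteration from the standard library rather than any particular library's K-Means.

```python
import math
from collections import defaultdict

def contractor_features(contracts):
    """Aggregate (contractor, awarded_eur) pairs into log10 features."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for name, amount in contracts:
        counts[name] += 1
        totals[name] += amount
    # One point per contractor: (log10 contract count, log10 total value)
    return {n: (math.log10(counts[n]), math.log10(totals[n])) for n in counts}

def kmeans(points, k=3, iters=50):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Assumed initialisation: pick k points spread evenly across the
    # points sorted by total value, so the result is deterministic.
    ordered = sorted(points, key=lambda p: p[1])
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of each cluster (keep old centroid if empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical records, not taken from the dataset.
contracts = [
    ("A", 5_000_000), ("A", 8_000_000), ("A", 12_000_000),
    ("B", 200_000), ("B", 350_000),
    ("C", 150_000),
    ("D", 9_000),
    ("E", 12_000),
    ("F", 20_000_000),
]

features = contractor_features(contracts)
names = list(features)
labels, centroids = kmeans([features[n] for n in names], k=3)
cluster_of = dict(zip(names, labels))
print(cluster_of)
```

On this toy input the three groups separate by total awarded value, which mirrors the high-value, standard, and micro-operator split described in the caption. With real data the result depends on initialisation and on how contractor names are normalised.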