Inferring comparative advantage via entropy maximization

Matteo Bruno, Dario Mazzilli, Aurelio Patelli, Tiziano Squartini, Fabio Saracco

TL;DR

Balassa's Revealed Comparative Advantage is reinterpreted within an entropy-maximization framework to address biases from maximum-likelihood benchmarks and to introduce statistically valid hypothesis tests. The authors derive discrete and continuous Bipartite Weighted Configuration Models (BiWCM) as unbiased null-models for weighted country-product networks and supplement them with dressed variants (MERCA) to perform significance testing. They apply false discovery rate control to identify statistically significant key products, comparing results with traditional RCA-based validation on 2015 COMTRADE data, and find that country diversification is robust while product complexities are more sensitive to the validation method. The work provides methodological advances for economic complexity studies, delivering an open-source Python package to implement the entropy-based filtering and enabling more rigorous, statistically grounded analyses of comparative advantage.

Abstract

We revise the procedure proposed by Balassa to infer comparative advantage, a standard tool in economics for analyzing the specialization of countries, regions, etc. Balassa's approach compares each country's export of a product with what would be expected from a benchmark based on the total export volumes of countries and products. Based on results in the literature, we show that the implementation of Balassa's idea generates a bias: the maximum-likelihood prescription used to calculate the parameters of the benchmark model conflicts with the model's definition. Moreover, Balassa's approach does not implement any statistical validation. Hence, we propose an alternative procedure that overcomes these limitations, based on the framework of entropy maximisation and implementing a proper hypothesis test: the 'key products' of a country are now those whose production is significantly larger than expected under a null-model constraining the same amount of information employed by Balassa's approach. We find that countries' diversification is always observed, regardless of the strictness of the validation procedure. Moreover, the ranking of countries' Fitness is only partially affected by the details of the validation scheme, whereas the rankings of products' Complexity differ substantially. The routine implementing the entropy-based filtering procedures employed here is freely available through the official Python Package Index (PyPI).
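
As an illustration of the benchmark comparison described in the abstract, the classical Balassa index can be sketched in a few lines of Python. This is a minimal sketch, not the authors' package: the function name `balassa_rca`, the toy matrix `W` and the threshold of 1 (the conventional RCA cut-off, not stated in the abstract) are assumptions made for the example. The expected flow is taken as the product of the country total and the product total, divided by the overall total.

```python
import numpy as np

def balassa_rca(W):
    """Ratio of observed exports to the Balassa benchmark (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    country_totals = W.sum(axis=1, keepdims=True)   # total export of each country (rows)
    product_totals = W.sum(axis=0, keepdims=True)   # total export of each product (columns)
    total = W.sum()                                 # overall trade volume
    # Benchmark: flow expected from the marginal totals alone
    expected = country_totals * product_totals / total
    # Suppress warnings from 0/0; such entries are set to 0 below
    with np.errstate(divide="ignore", invalid="ignore"):
        rca = np.where(expected > 0, W / expected, 0.0)
    return rca

# Toy country-by-product export matrix (2 countries, 3 products), made up for illustration
W = np.array([[8.0, 2.0, 0.0],
              [1.0, 3.0, 6.0]])

rca = balassa_rca(W)
key_products = rca >= 1.0   # assumed conventional threshold, no statistical test involved
print(np.round(rca, 3))
print(key_products)
```

Note that the thresholding step is purely deterministic: no p-value is attached to any entry, which is the lack of statistical validation that the entropy-based procedure is meant to address.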

Paper Structure (21 sections, 43 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Top panels: adjacency matrices of the bipartite network of international trade exports for the year 2015 in our dataset. Panel a) shows the $\mathbf{M}^{\mu-\textnormal{MERCA}_c}=\mathbf{M}^{\mu-\textnormal{RCA}}$ matrix: blue dots are validated exclusively by $\mu-\textnormal{MERCA}_c$, red ones are also validated by $\alpha-\textnormal{MERCA}_c$. Panel b) shows the $\mathbf{M}^{\mu-\textnormal{BiWCM}_c}$ matrix, with color encoding the statistical validation: blue dots are validated exclusively by $\mu-\textnormal{BiWCM}_c$, red ones are also validated by $\alpha-\textnormal{BiWCM}_c$. Each matrix is re-ordered following the Fitness and Complexity ranking evaluated on the $\mu-$validated matrix. Bottom panels: bars representing the product baskets of a few samples from the matrices above.
  • Figure 2: Real vs. expected trade flows. a) Expected value of trade flows given by Eq. (\ref{eq:likelihood_biwcm}). b) Real trade value. c) $1-\textnormal{p-value}_{\textnormal{BiWCM}_c}(w_{i\alpha})$, given by Eq. (\ref{eq:pval_biwcm}), as a measure of how surprisingly large the real value is. d) Logarithm of the ratio $\frac{1+w^*}{1+\langle w \rangle_{\textnormal{BiWCM}_c}}$, where 1 is added to every entry to avoid infinities and logarithms of zero. All matrices share the same ordering, given by the Fitness and Complexity ranking calculated with the matrix in panel c).
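
The quantity plotted in panel d) of Figure 2 can be sketched as follows. This is only an illustration of the shifted log-ratio: the function name `log_ratio` is hypothetical, and the `expected` matrix below is a made-up placeholder, not the output of the BiWCM model.

```python
import numpy as np

def log_ratio(observed, expected):
    """Entry-wise log((1 + w*) / (1 + <w>)); the +1 shift keeps zeros finite."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.log((1.0 + observed) / (1.0 + expected))

# Toy matrices for illustration only (not real trade data, not BiWCM expectations)
observed = np.array([[0.0, 3.0],
                     [1.0, 0.0]])
expected = np.array([[0.0, 1.0],
                     [3.0, 0.5]])

ratio = log_ratio(observed, expected)
print(ratio)
```

Positive entries mark flows larger than the placeholder expectation, negative entries mark smaller ones, and an entry where both values are zero maps to exactly zero instead of an undefined value.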