A Rank-Based Information Fusion Framework for Comparing Clustered Multivariate Socioeconomic Outcomes

Dhrubajyoti Ghosh

Abstract

We propose a multivariate, distribution-free ranking framework for comparing clustered, correlated outcomes across groups, motivated by the evaluation of state-level policy environments using county-level socioeconomic data. Using pooled U.S. county data from 2019-2023, we study multiple dimensions of economic well-being, including poverty, income inequality, housing cost burden, medical care costs, and per capita income, observed at a finer spatial resolution than the policy itself. Rather than relying on parametric regression models, we employ a rank-based aggregation algorithm derived from the Longitudinal Rank-Sum Test (LRST), which treats clusters as independent units and aggregates information across outcomes using order statistics. This approach provides a robust, interpretable omnibus comparison that accommodates within-cluster dependence and high-dimensional outcome structure without distributional assumptions. Applied to the comparison of states with and without refundable Earned Income Tax Credit (EITC) policies, the method reveals systematic differences in the joint ranking of county-level outcomes, with results remaining stable under repeated random subsampling of counties and varying cluster sizes. While the empirical analysis is descriptive rather than causal, the study highlights the broader utility of rank-based, multi-criteria aggregation methods as computational intelligence tools for analyzing complex, clustered data in policy and social systems.
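To make the aggregation idea concrete, the following is a minimal sketch of a generic cluster-level rank-sum comparison. It is not the LRST itself: the function names (`average_ranks`, `cluster_rank_sum_gap`), the within-cluster averaging step, and the toy data are assumptions introduced for illustration, and no test statistic or p-value is computed. It assumes each outcome has already been aligned so that higher values indicate worse conditions.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def cluster_rank_sum_gap(outcomes, clusters, policy_clusters):
    """Illustrative rank-sum gap between two groups of clusters.

    outcomes:        one row per county, each a list of K outcome values
                     (assumed aligned so that higher means worse).
    clusters:        cluster (state) label for each county.
    policy_clusters: set of cluster labels in the policy group.
    """
    n_outcomes = len(outcomes[0])
    # Rank each outcome separately across all counties pooled.
    per_outcome = [
        average_ranks([row[k] for row in outcomes]) for k in range(n_outcomes)
    ]
    # Sum the ranks across outcomes to get one score per county.
    county_score = [
        sum(per_outcome[k][i] for k in range(n_outcomes))
        for i in range(len(outcomes))
    ]
    # Average within each cluster so every cluster counts as one unit.
    by_cluster = defaultdict(list)
    for score, label in zip(county_score, clusters):
        by_cluster[label].append(score)
    cluster_score = {label: mean(scores) for label, scores in by_cluster.items()}
    in_group = [s for c, s in cluster_score.items() if c in policy_clusters]
    out_group = [s for c, s in cluster_score.items() if c not in policy_clusters]
    return mean(in_group) - mean(out_group)


# Toy data: four counties in two hypothetical states, two outcomes each.
toy_outcomes = [[1, 10], [2, 20], [3, 30], [4, 40]]
toy_clusters = ["A", "A", "B", "B"]
gap = cluster_rank_sum_gap(toy_outcomes, toy_clusters, {"A"})
print(gap)
```

Under this sketch, a negative gap means the policy group's clusters tend to hold lower (better) joint ranks. Collapsing counties to one score per cluster is one simple way to keep the cluster as the unit of comparison.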

Paper Structure

This paper contains 5 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Counties grouped by state Earned Income Tax Credit (EITC) status as of 2023. Blue indicates states with a refundable EITC; gray indicates states without a refundable EITC.
  • Figure 2: Correlation matrix of county-level economic outcomes. Variables are aligned so that higher values indicate worse conditions. The outcomes exhibit moderate but heterogeneous correlations, with a strong association between poverty and (negated) per capita income, and weaker or mixed relationships across other dimensions. This supports the need for multivariate aggregation.
  • Figure 3: County-level poverty rate across the continental United States. Lighter colors indicate higher poverty rates. Source: PolicyMap (2019-2023 pooled).
  • Figure 4: Standardized mean differences in county-level economic outcomes between states with and without refundable EITC policies. Differences are expressed in units of the median absolute deviation.
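A mean difference expressed in units of the median absolute deviation, as in the Figure 4 caption, could be computed along the following lines. This is a sketch under stated assumptions: the caption does not say which sample the MAD is taken over, so the pooled sample is assumed here, and the function names and toy values are hypothetical.

```python
from statistics import mean, median


def mad(values):
    """Median absolute deviation from the median (no consistency constant)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_standardized_difference(group_a, group_b):
    """Difference in means divided by the MAD of the pooled sample.

    Assumption: the scale is the MAD of both groups pooled together.
    """
    pooled = list(group_a) + list(group_b)
    scale = mad(pooled)
    if scale == 0:
        raise ValueError("MAD is zero; the difference cannot be standardized")
    return (mean(group_a) - mean(group_b)) / scale


# Toy values for one outcome in two groups of counties.
diff = mad_standardized_difference([1, 2, 3], [4, 5, 6])
print(diff)
```

Scaling by the MAD rather than the standard deviation keeps the measure less sensitive to a few extreme counties, which fits the distribution-free emphasis of the paper.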