Geospatial Disparities: A Case Study on Real Estate Prices in Paris

Agathe Fernandes Machado, François Hu, Philipp Ratz, Ewen Gallic, Arthur Charpentier

TL;DR

This work tackles geospatial biases in predictive modeling by examining how geographic aggregation affects calibration and fairness. It extends multiclass fairness notions to ordinal regression with a geospatial sensitive attribute, and introduces a graph-based neighborhood smoothing framework to reveal spatial structure. A post-processing mitigation strategy enforces Demographic Parity without requiring access to the learning process. It is demonstrated on a Paris real estate dataset, where global calibration coexists with regional fairness disparities. The results emphasize the crucial role of the aggregation level in policy decisions and provide a practical toolbox for measuring and mitigating geospatial biases in predictive valuation tasks.

Abstract

Driven by an increasing prevalence of trackers, ever more IoT sensors, and the declining cost of computing power, geospatial information has come to play a pivotal role in contemporary predictive models. While enhancing prognostic performance, geospatial data also has the potential to perpetuate many historical socio-economic patterns, raising concerns about a resurgence of biases and exclusionary practices, with their disproportionate impacts on society. Addressing this, our paper emphasizes the crucial need to identify and rectify such biases and calibration errors in predictive models, particularly as algorithms become more intricate and less interpretable. The increasing granularity of geospatial information further introduces ethical concerns, as choosing different geographical scales may exacerbate disparities akin to redlining and exclusionary zoning. To address these issues, we propose a toolkit for identifying and mitigating biases arising from geospatial data. Extending classical fairness definitions, we incorporate an ordinal regression case with spatial attributes, deviating from the binary classification focus. This extension allows us to gauge disparities stemming from data aggregation levels and advocates for a less interfering correction approach. Illustrating our methodology using a Parisian real estate dataset, we showcase practical applications and scrutinize the implications of choosing geographical aggregation levels for fairness and calibration measures.

Paper Structure (26 sections, 12 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: Relative estimation error per $m^2$ in different sub-regions. The values are smoothed across spatial neighbors to emphasize the spatial correlation as outlined in Section \ref{sec:smoothing}.
  • Figure 2: A sampled IRIS region within Paris (left pane) and its immediate adjacent neighbors (center pane) and the second level neighbors (right pane). The Seine River is depicted in blue whereas all other regions are depicted in yellow.
  • Figure 3: Smoothed square meter prices, corresponding roughly to the wealth level of the inhabitants. This serves as a motivation to analyze the predictions using quantiles in an ordinal regression framework, as it allows us to stratify the population according to socioeconomic status.
  • Figure 4: Re-aggregated data. Left pane: mean per arrondissement when the raw, unsmoothed data are used to calculate the average price per square meter of real estate. Right pane: results when the neighbor-smoothed estimates are used. In general, there do not seem to be large differences between the methods, but the smoothed estimates allow easier and more robust inference.
  • Figure 5: Calibration on the whole dataset (left), on the observations from the 7th arrondissement only (right) and on randomly drawn values (middle); bins defined using quantiles. Prices are in thousands of euros.
  • ...and 6 more figures
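
The captions of Figures 1 to 4 refer to values smoothed over first- and second-level spatial neighbors. A minimal sketch of such graph-based smoothing is given here, assuming a binary adjacency matrix between regions. The function name `neighbor_smooth` and the weighting scheme (weight 1 for the region itself, `decay**k` for regions at hop distance `k`) are illustrative assumptions, not the scheme used in the paper.

```python
import numpy as np

def neighbor_smooth(values, adjacency, order=2, decay=0.5):
    """Smooth region-level values over graph neighbors up to `order` hops.

    Assumption (illustration only): a region has weight 1, and a region at
    hop distance k has weight decay**k; weights are normalized per row.
    """
    values = np.asarray(values, dtype=float)
    A = (np.asarray(adjacency) > 0).astype(int)
    n = A.shape[0]
    reached = np.eye(n, dtype=bool)    # regions that already have a hop distance
    frontier = np.eye(n, dtype=bool)   # regions at exactly the current distance
    weights = np.eye(n)
    for k in range(1, order + 1):
        # Regions one step beyond the current frontier, not reached before.
        step = (frontier.astype(int) @ A) > 0
        frontier = step & ~reached
        weights += (decay ** k) * frontier
        reached |= frontier
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

# Toy example: four regions on a line, 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
smoothed = neighbor_smooth([0, 0, 0, 7], A, order=2, decay=0.5)
```

With `order=2`, region 0 is unaffected by region 3 (three hops away), while region 1 picks up a small share of region 3's value through its second-level neighborhood.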

Theorems & Definitions (5)

  • Remark 2.1: The term "fairness"
  • Definition 1: Model calibration in multi-class classification
  • Definition 2: Fairness under Demographic Parity
  • Definition 3: Fairness under Equalized Odds
  • Remark 2.2: Achieving calibration and unfairness
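
As a rough illustration of Definition 2, Demographic Parity for a multi-class (or ordinal) predictor is commonly stated as the predicted-class distribution being the same in every group, here every region. The sketch below measures the departure from that condition. The function names and the choice of the largest absolute difference as the summary statistic are assumptions for illustration; the paper's exact metric may differ.

```python
import numpy as np

def class_rates_by_group(pred_class, group, n_classes):
    """Return {group: vector of predicted-class frequencies in that group}."""
    pred_class = np.asarray(pred_class)
    group = np.asarray(group)
    rates = {}
    for g in np.unique(group):
        counts = np.bincount(pred_class[group == g], minlength=n_classes)
        rates[g] = counts / counts.sum()
    return rates

def demographic_parity_gap(pred_class, group, n_classes):
    """Largest absolute difference, over groups and classes, between a
    group's predicted-class frequency and the overall frequency.
    A value of 0 means the predictions satisfy Demographic Parity exactly."""
    pred_class = np.asarray(pred_class)
    overall = np.bincount(pred_class, minlength=n_classes) / len(pred_class)
    rates = class_rates_by_group(pred_class, group, n_classes)
    return max(np.abs(r - overall).max() for r in rates.values())

# Toy example: predicted price classes (0 = low, 1 = high) in two regions.
pred = [0, 0, 1, 1]
region = ["a", "a", "b", "b"]
gap = demographic_parity_gap(pred, region, n_classes=2)
```

In this toy case every dwelling in region "a" is predicted in the low class and every dwelling in region "b" in the high class, so the gap is as large as it can be for balanced classes.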