Beyond Eviction Prediction: Leveraging Local Spatiotemporal Public Records to Inform Action

Tasfia Mashiat, Alex DiChristofano, Patrick J. Fowler, Sanmay Das

TL;DR

Problem: Evaluating eviction risk scores requires more than predictive accuracy; this work tests whether risk predictions can improve outreach effectiveness.

Approach: Constructs a novel linked dataset for St. Louis that combines eviction filings, property records, and owner data; trains RF, XGBoost, and FNN models on eviction-history, neighborhood, and owner features; simulates risk-score–based outreach against neighborhood-based and prior-eviction baselines.

Contributions: Shows that including neighborhood and owner features raises predictive performance to about $0.89$–$0.90$ AUC, and that risk-score–driven outreach discovers more evictions than the baselines (e.g., $936$ vs $863$ and $731$) while canvassing fewer properties; analyzes the relative value of neighborhood versus owner information and discusses ethical and scalability considerations.

Significance: Demonstrates the practical feasibility of data-driven eviction prevention through targeted outreach, while acknowledging limitations such as single-city data and the need for careful implementation to avoid inequities.

Abstract

There has been considerable recent interest in scoring properties on the basis of eviction risk. The success of eviction prediction methods is typically evaluated using different measures of predictive accuracy. However, the underlying goal of such prediction is to direct appropriate assistance to households that may be at greater risk so that they remain stably housed. Thus, we must ask how useful such predictions are in targeting outreach efforts, that is, in informing action. In this paper, we investigate this question using a novel dataset that matches information on properties, evictions, and owners. We perform an eviction prediction task to produce risk scores and then use these risk scores to plan targeted outreach policies. We show that the risk scores are, in fact, useful, enabling a theoretical team of caseworkers to reach more eviction-prone properties in the same amount of time, compared to outreach policies that are either neighborhood-based or focus on buildings with a recent history of evictions. We also discuss the importance of neighborhood and ownership features in both risk prediction and targeted outreach.
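
The comparison described in the abstract, risk-score-driven outreach against a prior-eviction baseline, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the field names (`risk_score`, `prior_evictions`, `evicted`), the helper `evictions_reached`, and the simplification that the caseworker budget is a fixed number of visits with no travel time. It is not the paper's simulation, only the general idea of ranking properties by a priority and counting how many visited properties go on to have an eviction.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: one dict per property.
Property = Dict[str, float]


def evictions_reached(properties: List[Property],
                      priority: Callable[[Property], float],
                      budget: int) -> int:
    """Visit the `budget` highest-priority properties and count how many
    of them had an eviction in the outcome window."""
    ranked = sorted(properties, key=priority, reverse=True)
    return sum(int(p["evicted"]) for p in ranked[:budget])


# Toy, hand-made data (NOT from the paper).
properties = [
    {"risk_score": 0.91, "prior_evictions": 0, "evicted": 1},
    {"risk_score": 0.85, "prior_evictions": 3, "evicted": 1},
    {"risk_score": 0.40, "prior_evictions": 2, "evicted": 0},
    {"risk_score": 0.77, "prior_evictions": 0, "evicted": 1},
    {"risk_score": 0.15, "prior_evictions": 1, "evicted": 0},
    {"risk_score": 0.10, "prior_evictions": 0, "evicted": 0},
]

# Same visit budget for both policies; only the ranking rule differs.
by_risk = evictions_reached(properties, lambda p: p["risk_score"], budget=3)
by_history = evictions_reached(properties, lambda p: p["prior_evictions"], budget=3)
print(by_risk, by_history)
```

In this toy data the prior-eviction baseline misses properties with no eviction history that are nonetheless at high risk, which is the kind of gap a learned risk score is meant to close. A more faithful simulation would also need to model travel and visit time, which this sketch deliberately omits.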

Paper Structure

This paper contains 32 sections, 2 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Monthly eviction filings in St. Louis from January 2016 to January 2023.
  • Figure 2: Thematic representation of the feature sets in different models. Each property is associated with an eviction record and is situated within a neighborhood, specifically a Census block and block group. Properties are owned by owners who may own multiple properties in different neighborhoods. The number of features incorporated into the binary classifier expands with the radius of the diagram.
  • Figure 3: Training and testing timelines. The three black dots represent the interim months between the beginning and end of a period. There are two training periods and four testing periods. We do not train a model on data from 2020 due to the outsized influence of COVID-19 and the corresponding eviction moratoria on eviction patterns. We denote the model trained using Training Period 1 as the Pre-COVID Model and the model trained using Training Period 2 as the Post-COVID Model. We select three-month test periods composed of November, December, and January to compare the ability of the two models to generalize into the future. Data from all testing periods are withheld from both models during training.
  • Figure 4: ROC curves for $T_{1}$ and $T_{3}$. (a) For $T_{1}$ prediction, the Pre-COVID (XGB) model that includes Eviction, Owner, and Neighborhood attributes has the highest ROC AUC ($0.89$) and outperforms both the baseline and the model trained on Eviction + Neighborhood attributes with high statistical significance (p-values of $3.34 \times 10^{-72}$ and $2.44 \times 10^{-31}$, respectively). (b) Including Owner and Neighborhood attributes in the Post-COVID (XGB) model also increases the ROC AUC, and its curve differs significantly from those of the baseline and the model that includes Neighborhood attributes (p-values of $1.27 \times 10^{-74}$ and $8.00 \times 10^{-19}$, respectively).
  • Figure 5: Distribution of properties across risk groups [Low, Medium, High] by number of units. The X-axis represents the groups based on the number of units, and the Y-axis represents the proportion of properties in each unit-size category.
  • ...and 1 more figure
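
Since the Figure 4 caption compares models by ROC AUC, it may help to recall what that number measures: the probability that a randomly chosen positive example (here, a property with an eviction) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) form of ROC AUC: the fraction of
    positive/negative pairs in which the positive is scored higher,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (NOT from the paper): 3 evicted and 3 non-evicted properties.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.91, 0.85, 0.40, 0.30, 0.15, 0.10]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The quadratic pairwise loop is chosen for readability; for datasets of realistic size, a sort-based implementation or a library routine would be the practical choice.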