AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County

Faiz Surani, Mirac Suzgun, Vyoma Raman, Christopher D. Manning, Peter Henderson, Daniel E. Ho

TL;DR

This work tackles the scale and complexity of removing racially restrictive covenants from historical property records by coupling an open, finetuned large language model with an OCR pipeline and a responsible human-in-the-loop workflow. The authors demonstrate a high-performance detection system that vastly reduces manual review—achieving perfect precision and near-perfect recall on their evaluation—while enabling full processing of 5.2 million deed pages at a fraction of the cost of manual review or closed-model alternatives. Beyond detection, they integrate geolocation and maintain a historical registry to preserve context, and they reveal insights into the geographic and developer-driven dynamics of covenants in Santa Clara County over the 1907–1974 window, including a finding that by 1950 about one in four properties were affected. The work provides a practical blueprint for scalable, open-model AI deployment in the public sector to advance housing reform, preserve historical memory, and support other jurisdictions undertaking similar redaction and mapping efforts.

Abstract

Legal reform can be challenging in light of the volume, complexity, and interdependence of laws, codes, and records. One salient example of this challenge is the effort to restrict and remove racially restrictive covenants, clauses in property deeds that historically barred individuals of specific races from purchasing homes. Despite the Supreme Court holding such racial covenants unenforceable in 1948, they persist in property records across the United States. Many jurisdictions have moved to identify and strike these provisions, including California, which mandated in 2021 that all counties implement such a process. Yet the scale can be overwhelming, with Santa Clara County (SCC) alone having over 24 million property deed documents, making purely manual review infeasible. We present a novel approach to addressing this pressing issue, developed through a partnership with the SCC Clerk-Recorder's Office. First, we leverage an open large language model, finetuned to detect racial covenants with high precision and recall. We estimate that this system reduces manual efforts by 86,500 person hours and costs less than 2% of the cost for a comparable off-the-shelf closed model. Second, we illustrate the County's integration of this model into responsible operational practice, including legal review and the creation of a historical registry, and release our model to assist the hundreds of jurisdictions engaged in similar efforts. Finally, our results reveal distinct periods of utilization of racial covenants, sharp geographic clustering, and the disproportionate role of a small number of developers in maintaining housing discrimination. We estimate that by 1950, one in four properties across the County were subject to racial covenants.

Paper Structure

This paper contains 39 sections, 1 equation, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Although racially restrictive covenants are no longer legally enforceable and are considered illegal under the Fair Housing Act today, they still exist in thousands, possibly even millions, of historical property records in California. One such example, found in a 1940 real property deed from Santa Clara County's archives, contains the following discriminatory language: "No persons not of the Caucasian Race shall be allowed to occupy, except as servants of residents, said real property or any part thereof." The deed further specifies that "[t]hese covenants are to run with the land and shall be binding on all parties," thereby affecting not only the tenants at the time but also the potential future owners of the land.
  • Figure 2: Brief overview of legal developments that impacted California's housing market in the 20th century. The Rumford Act was overturned by Proposition 14, which was in turn found unconstitutional by the California Supreme Court in Mulkey v. Reitman, 64 Cal. 2d 529 (1966).
  • Figure 3: Diagram of our pipeline for detecting racial covenants. The process begins by converting an image of a property deed into text using an OCR tool (docTR). The transcribed text is then analyzed for racially discriminatory language. If such unlawful language is found, the system highlights the content and extracts the property address. Both the highlighted language and the corresponding address are then sent to Santa Clara County for legal review and final confirmation.
  • Figure 4: Example of location information in a 1916 property deed. Crucially, we can extract the name of the map which depicts the property, as well as the book and page number on which the map appears. Other useful data, such as the names of the parties and the exact date on which the deed was recorded, can also be extracted.
  • Figure 5: False positive and false negative predictions from each detection approach considered in our study. These examples show typical OCR errors present in our pipeline.
  • ...and 13 more figures
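
The flow described in the captions for Figures 3 and 4 (OCR text in, flagged language and location information out, then human legal review) can be illustrated with a minimal sketch. This is not the authors' system: the paper uses a finetuned language model for detection, whereas the sketch below substitutes a crude regular expression as a stand-in, and it assumes the OCR text is already available as a string. The names `COVENANT_PATTERN`, `MAP_REF_PATTERN`, `ReviewItem`, and `screen_page`, as well as the "Book ... of Maps, page ..." phrasing, are hypothetical choices made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the finetuned detector: a crude phrase pattern.
# A real system would call a language model here instead.
COVENANT_PATTERN = re.compile(
    r"\bnot\s+of\s+the\s+Caucasian\s+race\b",
    re.IGNORECASE,
)

# Hypothetical pattern for a map reference such as "Book P of Maps, page 12".
# The actual wording in deeds varies; this form is an assumption.
MAP_REF_PATTERN = re.compile(
    r"Book\s+(?P<book>[A-Z0-9]+)\s+of\s+Maps,?\s+(?:at\s+)?pages?\s+(?P<page>\d+)",
    re.IGNORECASE,
)


@dataclass
class ReviewItem:
    """One flagged page, queued for human legal review."""
    flagged_text: str          # the language to highlight
    map_book: Optional[str]    # book of maps depicting the property, if found
    map_page: Optional[int]    # page within that book, if found


def screen_page(ocr_text: str) -> Optional[ReviewItem]:
    """Return a ReviewItem if the page appears to contain a covenant, else None."""
    match = COVENANT_PATTERN.search(ocr_text)
    if match is None:
        return None  # nothing flagged; page is not sent for review
    ref = MAP_REF_PATTERN.search(ocr_text)
    return ReviewItem(
        flagged_text=match.group(0),
        map_book=ref.group("book") if ref else None,
        map_page=int(ref.group("page")) if ref else None,
    )


if __name__ == "__main__":
    sample = (
        "No persons not of the Caucasian Race shall be allowed to occupy "
        "said real property, as shown in Book P of Maps, page 12."
    )
    print(screen_page(sample))
```

A flagged page yields a `ReviewItem` for a human reviewer rather than an automatic redaction, which mirrors the final legal review and confirmation step in the Figure 3 caption.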