Perceptions of Edinburgh: Capturing Neighbourhood Characteristics by Clustering Geoparsed Local News

Andreas Grivas, Claire Grover, Richard Tobin, Clare Llewellyn, Eleojo Oluwaseun Abubakar, Chunyu Zheng, Chris Dibben, Alan Marshall, Jamie Pearce, Beatrice Alex

TL;DR

This work combines street-level geoparsing tailored to the locality with clustering of full news articles, showing how NLP can unlock further information about neighbourhoods and enable a more detailed examination of their characteristics.

Abstract

The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited. This undermines the development of robust policy interventions to improve local health and well-being. News media provides social and community information that may be useful in health studies. Here we propose a methodology for characterising neighbourhoods by using local news articles. More specifically, we show how we can use Natural Language Processing (NLP) to unlock further information about neighbourhoods by analysing, geoparsing and clustering news articles. Our work is novel because we combine street-level geoparsing tailored to the locality with clustering of full news articles, enabling a more detailed examination of neighbourhood characteristics. We evaluate our outputs and show via a confluence of evidence, both from a qualitative and a quantitative perspective, that the themes we extract from news articles are sensible and reflect many characteristics of the real world. This is significant because it allows us to better understand the effects of neighbourhoods on health. Our findings on neighbourhood characterisation using news data will support a new generation of place-based research which examines a wider set of spatial processes and how they affect health, enabling new epidemiological research.

Paper Structure

This paper contains 58 sections, 1 equation, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Given a dataset of Edinburgh news articles (N=66,601), we a) identify which locations are mentioned in the articles and b) cluster the articles into themes. We then aggregate the cluster information by location and summarise each neighbourhood as a distribution over themes.
  • Figure 2: The clustering hierarchy contains semantically meaningful groupings of clusters (themes), which we have annotated in green. Each green node corresponds to a cluster id, while red nodes are candidate clusters that were not selected by the clustering algorithm.
  • Figure 3: Word clouds of the five clusters (ordered left to right) that are most correlated with the SIMD 2020v2 crime rate metric, as measured by Spearman's rho, $\rho$.
  • Figure 4: Characterising two neighbourhoods in terms of their distribution over themes. The distribution over themes is broken down into distributions over topics which can be seen in the inner ring of the chart.
  • Figure 5: Hierarchy of clusters
  • ...and 5 more figures
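
The aggregation step in the Figure 1 caption and the rank correlation in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`theme_distributions`, `average_ranks`, `spearman_rho`), the input layout, and the toy data with placeholder areas are all assumptions made for illustration. It assumes the geoparser yields a set of locations per article and the clustering yields one theme label per article.

```python
from collections import Counter, defaultdict


def theme_distributions(article_locations, article_themes):
    """Summarise each location as a normalised distribution over themes.

    article_locations: dict of article id -> set of location names (assumed geoparser output)
    article_themes:    dict of article id -> theme label (assumed clustering output)
    """
    counts = defaultdict(Counter)
    for article_id, locations in article_locations.items():
        theme = article_themes.get(article_id)
        if theme is None:  # skip articles without a theme label
            continue
        for location in locations:
            counts[location][theme] += 1
    return {
        location: {theme: n / sum(c.values()) for theme, n in c.items()}
        for location, c in counts.items()
    }


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy, made-up inputs with placeholder area names (not data from the paper).
article_locations = {
    "a1": {"Area A"},
    "a2": {"Area A", "Area B"},
    "a3": {"Area B"},
    "a4": {"Area A"},
}
article_themes = {"a1": "crime", "a2": "culture", "a3": "culture", "a4": "crime"}

dist = theme_distributions(article_locations, article_themes)
```

Under these assumptions, the share of a theme in each neighbourhood could then be passed to `spearman_rho` alongside an external per-neighbourhood indicator, such as a crime rate metric, to rank themes by how strongly they track that indicator.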