Machine learning for risk assessment in gender-based crime

Ángel González-Prieto, Antonio Brú, Juan Carlos Nuño, José Luis González-Álvarez

TL;DR

This work applies Machine Learning (ML) techniques to build models that accurately predict the recidivism risk of a gender-violence offender. It also proposes a hybrid model that combines the statistical prediction methods with the ML method, allowing authorities to transition smoothly from the preexisting model to the ML-based model.

Abstract

Gender-based crime is one of the most concerning scourges of contemporary society. Governments worldwide have invested substantial economic and human resources to eradicate this threat. Despite these efforts, accurately predicting the risk that a victim of gender violence will be attacked again remains a very hard open problem. The development of new methods for issuing accurate, fair and quick predictions would allow police forces to select the most appropriate measures to prevent recidivism. In this work, we propose to apply Machine Learning (ML) techniques to create models that accurately predict the recidivism risk of a gender-violence offender. The contribution of this work is threefold: (i) the proposed ML method outperforms the preexisting risk assessment algorithm based on classical statistical techniques, (ii) the study has been conducted on an official special-purpose database with more than 40,000 reports of gender violence, and (iii) two new quality measures are proposed for assessing the effective police protection that a model supplies and the resource overload that it generates. Additionally, we propose a hybrid model that combines the statistical prediction methods with the ML method, permitting authorities to implement a smooth transition from the preexisting model to the ML-based model. This hybrid nature enables a decision-making process that optimally balances the efficiency of the police system against the aggressiveness of the protection measures taken.

Paper Structure

This paper contains 19 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Abstract representation of the NC prediction method. Each level in the plot corresponds to a phase of the process. From top to bottom: initialization, training, system ready and exploitation.
  • Figure 2: Evolution of the police protection metric for the hybrid model as $\mu$ varies. The hybrid predictor uses the best NC model (Metric = Euclidean and Shrink threshold = $0.1$). The values of $\mu$ are taken from a uniform grid of $200$ equispaced points on the interval $[0,1]$. For each value of $\mu$, a random sample of $10$ executions is considered. The solid line shows the mean value of the police protection metric over these executions, and the shaded region is the $0.95$ confidence interval around this value. The interval is narrow enough to provide sound evidence of the trend. Larger samples yield narrower confidence intervals, with similar trends in the evolution of the metric.
  • Figure 3: Results of the police resources metric for the hybrid model under different values of the penalty $\tau$. The hybrid model uses the best NC model (Metric = Euclidean and Shrink threshold = $0.1$). The parameter $\mu$ was sampled uniformly in the interval $[0,1]$ with $200$ sample points.
  • Figure 4: Illustration of the procedure for adjusting the optimal value of $\mu$ under resource constraints. In this plot, the penalty of the police system has been set to $\tau = 0.85$ and the maximum acceptable resource overload to $r_0 = 0.1456$ (a $4\%$ increase with respect to the initial value of $0.14$). The line $y=r_0$ in the police resources plot (on the left) intersects the resources function at the optimal value $\mu_0$ ($\mu_0 = 0.651$ in this plot). The resulting gain in protection can be read from the police protection plot (on the right). In this example the value is $1.485$, which amounts to an increase of $13\%$ with respect to the original value of $1.31$. Therefore, an extra investment of $4\%$ in resources leads to an improvement of $13\%$ in the provided protection. For smaller values of $\tau$, the gain is even bigger.
  • Figure 5: Correlation plot of some of the collected answers to the VPR form. The plot compares the answers of $1000$ randomly chosen cases from the VioGen dataset. The diagonal plots show the distribution of the answers to the chosen questions. The off-diagonal plots compare the answers to the questions pairwise. For each plot, the answers are placed on a rectangular grid with as many columns (resp. rows) as the question displayed horizontally (resp. vertically) has response options. The left-most points of each plot on the horizontal axis (resp. bottom points on the vertical axis) correspond to 'No'/'Very mild' responses, whereas the right-most points (resp. upper points) correspond to 'Yes'/'Very severe' responses. Missing responses were assigned a medium value. A small random noise was added to improve visualization.
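
The procedure described in the caption of Figure 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions `toy_resources` and `toy_protection` are hypothetical stand-ins for the curves measured in the paper (their shapes are invented and do not reproduce $\mu_0 = 0.651$), and `crossing_point` is a helper name introduced here. Only the budget $r_0$ and the percentages are taken from the caption.

```python
from typing import Callable, List, Optional


def uniform_grid(n: int = 200) -> List[float]:
    """Return n equispaced points on [0, 1], endpoints included."""
    return [i / (n - 1) for i in range(n)]


def crossing_point(grid: List[float], values: List[float], level: float) -> Optional[float]:
    """First mu at which the sampled curve crosses `level` (linear interpolation)."""
    for i in range(len(grid) - 1):
        m0, m1 = grid[i], grid[i + 1]
        v0, v1 = values[i], values[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return m0 + (level - v0) * (m1 - m0) / (v1 - v0)
    return None  # the curve never reaches the budget line on this grid


# HYPOTHETICAL curves, invented for illustration only (not the paper's data).
def toy_resources(mu: float) -> float:
    return 0.14 + 0.02 * mu ** 2


def toy_protection(mu: float) -> float:
    return 1.31 + 0.30 * mu


# Budget from the caption: 4% above the initial resources value of 0.14.
r0 = 0.14 * 1.04

grid = uniform_grid(200)
resources = [toy_resources(mu) for mu in grid]

# Step 1: intersect the line y = r0 with the resources curve to get mu_0.
mu0 = crossing_point(grid, resources, r0)

# Step 2: read the protection achieved at mu_0 from the protection curve.
protection_at_mu0 = toy_protection(mu0) if mu0 is not None else None

# Arithmetic quoted in the caption: protection rising from 1.31 to 1.485.
caption_gain_percent = round(100 * (1.485 / 1.31 - 1))
```

With the real curves in place of the toy ones, `mu0` would play the role of $\mu_0$ and `protection_at_mu0` the value read off the right-hand plot. The last line simply checks the caption's arithmetic for the relative gain in protection.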