Towards Financially Inclusive Credit Products Through Financial Time Series Clustering

Tristan Bester, Benjamin Rosman

TL;DR

The work tackles financial inclusion by enabling customer segmentation without annotated data through time-series clustering of transaction histories. It develops a taxonomy of four component classes for deep representation learning-based clustering (autoencoder architecture, dimensionality reduction, pretext loss, clustering loss) and systematically evaluates their combinations on the Berka dataset to identify strong configurations. The authors introduce Financial Transaction History Clustering (FTHC), a CNN-based autoencoder with a Deep Temporal Clustering objective using Euclidean distance, which outperforms state-of-the-art methods on key clustering metrics. The study demonstrates that stable, well-tuned clustering can yield meaningful, human-interpretable segments, supporting tailored financial products for marginalised groups and enhancing the practical impact of financial inclusion efforts.
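As a rough illustration of what a Deep Temporal Clustering style objective with Euclidean distance might look like, the sketch below computes soft cluster assignments from latent vectors with a Student's t kernel, sharpens them into a target distribution, and measures the KL divergence between the two. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `alpha` parameter, the kernel form, and the toy latent data are all hypothetical, and the CNN autoencoder that would produce the latents is omitted.

```python
import numpy as np

def soft_assignments(latents, centroids, alpha=1.0):
    """Soft cluster assignments from Euclidean distances (assumed Student's t kernel)."""
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    diffs = latents[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    kernel = (1.0 + dists / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalise each row so the assignments of a sample sum to 1.
    return kernel / kernel.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster frequency, renormalise."""
    weights = q ** 2 / q.sum(axis=0)
    return weights / weights.sum(axis=1, keepdims=True)

def clustering_loss(q, eps=1e-12):
    """KL divergence KL(P || Q) between the targets P and the assignments Q."""
    p = target_distribution(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy latent vectors (hypothetical): two well-separated groups in 2-D.
rng = np.random.default_rng(0)
latents = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 2)),
    rng.normal(3.0, 0.1, size=(5, 2)),
])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assignments(latents, centroids)
labels = q.argmax(axis=1)   # hard cluster label per sample
loss = clustering_loss(q)   # value that training would minimise
```

In a full pipeline, the latents would come from the encoder and the loss would be minimised jointly over the encoder weights and the centroids, typically alongside a reconstruction (pretext) loss.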

Abstract

Financial inclusion ensures that individuals have access to financial products and services that meet their needs. As a key contributing factor to economic growth and investment opportunity, financial inclusion increases consumer spending and consequently business development. It has been shown that institutions are more profitable when they provide marginalised social groups access to financial services. Customer segmentation based on consumer transaction data is a well-known strategy used to promote financial inclusion. While the required data is available to modern institutions, the challenge remains that segment annotations are usually difficult and/or expensive to obtain. This prevents the usage of time series classification models for customer segmentation based on domain expert knowledge. As a result, clustering is an attractive alternative to partition customers into homogeneous groups based on the spending behaviour encoded within their transaction data. In this paper, we present a solution to one of the key challenges preventing modern financial institutions from providing financially inclusive credit, savings and insurance products: the inability to understand consumer financial behaviour, and hence risk, without the introduction of restrictive conventional credit scoring techniques. We present a novel time series clustering algorithm that allows institutions to understand the financial behaviour of their customers. This enables unique product offerings to be provided based on the needs of the customer, without reliance on restrictive credit practices.

Paper Structure (38 sections, 6 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Percentage of invalid clusterings produced across all combinations. Results are shown for each clustering layer variant.
  • Figure 2: The effect of varied learning rates in the cluster optimisation phase. In the lower plot, it can be seen that the cluster centroids remain in their initial positions while the latent representations all converge to a single representation. Consequently, all data points are assigned to the same cluster. The stability of the upper model is clear from the converged latent space representation.
  • Figure 3: Average clustering performance associated with each autoencoder architecture.
  • Figure 4: Average clustering performance associated with each pretext loss function.
  • Figure 5: Average clustering performance associated with each clustering loss function.
  • ...and 4 more figures