Life Sequence Transformer: Generative Modelling of Socio-Economic Trajectories from Administrative Data

Alberto Cabezas, Carlotta Montorsi

TL;DR

Life Sequence Transformer develops a decoder-only Transformer to generatively model long-run socio-economic life histories from administrative data. It introduces a calendar-like data encoding that represents overlapping events across life domains and enables generation of future and counterfactual trajectories conditioned on observed histories. The model is trained on 1.65M WHIP histories from INPS (1990–2015) and demonstrates realistic labor-market dynamics and alignment with established causal relationships, such as motherhood penalties and birth-month retirement effects, while enabling out-of-sample generation. The work highlights opportunities for policy analysis with counterfactuals, alongside methodological and ethical considerations in using administrative data for individual life-course simulation.

Abstract

Generative modelling with Transformer architectures can simulate complex sequential structures across various applications. We extend this line of work to the social sciences by introducing a Transformer-based generative model tailored to longitudinal socio-economic data. Our contributions are: (i) we design a novel encoding method that represents socio-economic life histories as sequences, including overlapping events across life domains; and (ii) we adapt generative modelling techniques to simulate plausible alternative life trajectories conditioned on past histories. Using large-scale data from the Italian social security administration (INPS), we show that the model can be trained at scale, reproduces realistic labour market patterns consistent with known causal relationships, and generates coherent hypothetical life paths. This work demonstrates the feasibility of generative modelling for socio-economic trajectories and opens new opportunities for policy-oriented research, with counterfactual generation as a particularly promising application.

Paper Structure

This paper contains 37 sections, 15 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: The steps for converting tabular data into the format required by the generative model: (1) convert relevant features to concept tokens; (2) look up relevant temporal information such as age and year; (3) concatenate the token, time, and positional embeddings into the sequence.
  • Figure 2: Model-generated average duration of mobility allowance receipt, with 90% confidence interval, as a function of worker age at job loss (in months), within a 10-year bandwidth around the 40-year-old threshold.
  • Figure 3: Real and model-generated trajectories of average annual earnings for working mothers in a 10-year window before and after their first maternity leave episode (offset = 0). The solid lines represent average annual earnings, while the shaded areas denote the one-standard-deviation range.
  • Figure 4: Real (circle dots) and model-generated (triangle dots) average retirement ages, measured in months, for male individuals born between 1940 and 1950, using a cutoff one year before the observed retirement year. Each point indicates the cohort-specific average retirement age for January (M1) and December (M12), with vertical bars representing one standard deviation.
  • Figure 5: Real (circle dots) and model-generated (triangle dots) average retirement ages, measured in months, for male individuals born between 1940 and 1950, using a cutoff four years before the observed retirement year. Each point indicates the cohort-specific average retirement age for January (M1) and December (M12), with vertical bars representing one standard deviation.
  • ...and 10 more figures
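
The three encoding steps named in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the concept names, the embedding size, and the helper names `make_table` and `encode` are all hypothetical, and random vectors stand in for embedding tables that a real model would learn.

```python
import random

# Hypothetical toy records: (concept, age in years, calendar year).
RECORDS = [
    ("employed_private", 30, 2001),
    ("maternity_leave", 31, 2002),
    ("employed_private", 32, 2003),
]

EMBED_DIM = 4  # illustrative size per embedding, not taken from the paper


def make_table(keys, dim, rng):
    """Build a random embedding table standing in for learned parameters."""
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in sorted(keys)}


def encode(records, dim=EMBED_DIM, seed=0):
    """Return (token_ids, sequence) for a list of (concept, age, year) records."""
    rng = random.Random(seed)

    # Step 1: convert each concept to an integer token id.
    vocab = {c: i for i, c in enumerate(sorted({r[0] for r in records}))}
    token_ids = [vocab[c] for c, _, _ in records]

    # Step 2: look up the temporal information attached to each event.
    ages = [a for _, a, _ in records]
    years = [y for _, _, y in records]

    # Step 3: concatenate token, time (age and year), and positional embeddings.
    tok_tab = make_table(set(token_ids), dim, rng)
    age_tab = make_table(set(ages), dim, rng)
    year_tab = make_table(set(years), dim, rng)
    pos_tab = make_table(range(len(records)), dim, rng)

    sequence = [
        tok_tab[t] + age_tab[a] + year_tab[y] + pos_tab[p]  # list concatenation
        for p, (t, a, y) in enumerate(zip(token_ids, ages, years))
    ]
    return token_ids, sequence


token_ids, sequence = encode(RECORDS)
print(token_ids)         # integer ids of the concept tokens
print(len(sequence[0]))  # length of one concatenated input vector
```

In this sketch, two events with the same concept share the token part of their vectors and differ only in the time and position parts, which is how a repeated state can be told apart across years.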