Life Sequence Transformer: Generative Modelling of Socio-Economic Trajectories from Administrative Data
Alberto Cabezas, Carlotta Montorsi
TL;DR
The Life Sequence Transformer is a decoder-only Transformer that generatively models long-run socio-economic life histories from administrative data. It introduces a calendar-like data encoding that represents overlapping events across life domains and enables generation of future and counterfactual trajectories conditioned on observed histories. Trained on 1.65M WHIP histories from INPS (1990–2015), the model reproduces realistic labour-market dynamics and aligns with established causal relationships, such as motherhood penalties and birth-month retirement effects, while also supporting out-of-sample generation. The work highlights opportunities for counterfactual policy analysis, alongside methodological and ethical considerations in using administrative data for individual life-course simulation.
Abstract
Generative modelling with Transformer architectures can simulate complex sequential structures across various applications. We extend this line of work to the social sciences by introducing a Transformer-based generative model tailored to longitudinal socio-economic data. Our contributions are: (i) we design a novel encoding method that represents socio-economic life histories as sequences, including overlapping events across life domains; and (ii) we adapt generative modelling techniques to simulate plausible alternative life trajectories conditioned on past histories. Using large-scale data from the Italian social security administration (INPS), we show that the model can be trained at scale, reproduces realistic labour market patterns consistent with known causal relationships, and generates coherent hypothetical life paths. This work demonstrates the feasibility of generative modelling for socio-economic trajectories and opens new opportunities for policy-oriented research, with counterfactual generation as a particularly promising application.
