Enhancing behavioral nudges with large language model-based iterative personalization: A field experiment on electricity and hot-water conservation

Zonghan Li, Yi Liu, Chunyan Wang, Song Tong, Kaiping Peng, Feng Ji

Abstract

Nudging is widely used to promote behavioral change, but its effectiveness is often limited when recipients must repeatedly translate feedback into workable next steps under changing circumstances. Large language models (LLMs) may help reduce part of this cognitive work by generating personalized guidance and updating it iteratively across intervention rounds. We developed an LLM agent for iterative personalization and tested it in a three-arm randomized experiment among 233 university residents in China, using daily electricity and shower hot-water conservation as objectively measured cases differing in friction. LLM-personalized nudges (T2) produced the largest conservation effects, while image-enhanced conventional nudges (T1) and text-based conventional nudges (C) showed similar outcomes (omnibus p = 0.009). Relative to C, T2 reduced electricity consumption by 0.56 kWh per room-day (p = 0.014), corresponding to an 18.3 percentage-point higher adjusted saving rate. This advantage emerged within the first two intervention rounds, alongside iterative updating of personalized guidance, and persisted thereafter. Hot-water outcomes followed the same direction but were smaller, less precisely estimated, and attenuated over time, consistent with stronger friction in this domain. LLM-personalized nudges emphasized prospective and context-specific guidance and were associated with higher participant engagement. This study provides field evidence that LLM-based iterative personalization can enhance behavioral nudging, with behavioral friction as a potential boundary condition. Larger trials and extension to more behaviors are warranted.

Paper Structure

This paper contains 31 sections and 5 figures.

Figures (5)

  • Figure 1: Study design and intervention components. a. Overall trial design. b. Examples of nudge content for T1 and T2, illustrating the shared usage-report backbone (weekly usage report) in both groups and the additional LLM-generated suggestions in T2. c. Decomposition of intervention components across conditions.
  • Figure 2: Conservation outcomes during the intervention. a. Distribution of individual average daily electricity consumption during the intervention period (left) and group-level saving rates relative to baseline (right) for C, T1, and T2. Bars indicate group means with 95% confidence intervals. b. Same as panel a, but for hot water. c. Weekly electricity saving rates for T1 and T2 relative to C. d. Same as panel c, but for hot water. Analytic sample sizes were n = 169 for electricity and n = 166 for hot water; group-specific counts are reported in Supplementary Table S1.
  • Figure 3: Content characteristics and participant engagement. a. Topic composition of nudge content estimated by topic modeling, comparing conventional nudges (C & T1) with LLM-personalized nudges (T2). C and T1 are merged because they share the same content and differ only in format. Stacked bars show topic probability (%). b. Mean keyword counts per LLM-personalized nudge in T2 for five content categories across intervention stages. c. Exploratory post-nudge survey ratings on 5-point scales for perceived accuracy, actionability, and satisfaction in the mid and final rounds among T2 participants. d. Engagement rates for C, T1, and T2. e. Time to the first missed 48-hour reply window following a nudge, with shaded areas indicating 95% confidence intervals.
  • Figure 4: Heterogeneity of nudge effectiveness. a-c. Distribution of individual treatment effects (ITEs) estimated via a meta-learner for the pooled analysis (a), electricity (b), and hot water (c), comparing T1 and T2. Each point represents one participant's ITE relative to the control condition. Distributions are shown as raincloud plots combining kernel density estimates, box plots (median, interquartile range), and individual data points. Dashed lines indicate mean effects. d-e. Behavioral archetypes identified through trajectory clustering for electricity (d) and hot water (e). Upper panels show individual consumption trajectories (thin lines) and archetype means (thick lines) across intervention rounds. Lower panels show the percentage of participants in each archetype by treatment group. Shaded areas denote ±1 s.d. around cluster means.
  • Figure 5: Predictors of conservation behaviors and their dynamics. a. Normalized feature importance scores from XGBoost models predicting average daily consumption during the intervention for electricity (green square) and hot water (purple circle). b. Category-level feature importance in early and late intervention stages for electricity. At each stage, a separate XGBoost model was trained using the same feature set. c. Same as panel b, but for hot water.