Machine learning approach in the development of building occupant personas
Sheik Murad Hassan Anik, Xinghua Gao, Na Meng
TL;DR
This work tackles the manual bottleneck in building occupant persona development by applying a machine learning-driven, semi-automated workflow to the US RECS $2015$ dataset. It evaluates six classifiers (LDA, KNN, CART, SVM, AdaBoost, and Random Forest) across $16$ occupant characteristics after reducing $759$ input features to $389$ through preprocessing and feature selection, using an $80$/$20$ train-test split and $10$-fold cross-validation. The results show an average accuracy of $61\%$; some attributes, such as NHSLDMEM, NUMADULT, and NUMCHILD, reach $99\%$ to $100\%$ accuracy, while others, such as TEMPGONE and TEMPNITE, perform poorly, highlighting both the feasibility and the limitations of semi-automatic persona generation. The study demonstrates potential for reducing manual labor in persona development but notes that fully automated, high-fidelity persona construction requires more data, hyperparameter tuning, and possibly deeper models; planned future work includes RECS $2020$ and deep learning approaches. Overall, the work contributes a data-driven, semi-automated pathway to occupant personas that can enhance occupant-centric building design and energy modeling when paired with larger, more diverse datasets.
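The evaluation protocol named here (an 80/20 hold-out split combined with 10-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than RECS 2015, cross-validation is assumed to run on the training portion only, and a small k-nearest-neighbors classifier plus a majority-class baseline stand in for the six classifiers studied. All function names are hypothetical.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for survey rows: two numeric features, one binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is shifted by +3 on both features, so the classes separate well.
        features = [rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)]
        rows.append((features, label))
    return rows


def split_holdout(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test) for a hold-out split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(rows, k=10):
    """Yield (train, validation) pairs; each row is used for validation exactly once."""
    for i in range(k):
        validation = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, validation


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_predict(train, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict, train, test):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(predict, rows, k=10):
    """Mean validation accuracy over k folds."""
    scores = [accuracy(predict, tr, va) for tr, va in k_folds(rows, k)]
    return sum(scores) / len(scores)


data = make_toy_data()
train_rows, test_rows = split_holdout(data)            # 80/20 split
models = {"knn": knn_predict, "majority": majority_predict}
cv_scores = {name: cross_val_accuracy(fn, train_rows)  # 10-fold CV on the 80%
             for name, fn in models.items()}
best = max(cv_scores, key=cv_scores.get)
test_accuracy = accuracy(models[best], train_rows, test_rows)
print(cv_scores, best, test_accuracy)
```

In a workflow like the one described, this loop would be repeated once per target attribute (for example NHSLDMEM or TEMPNITE), with the held-out 20% touched only after model selection so that the reported accuracy is not biased by the cross-validation comparison.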
Abstract
The user persona is a communication tool for designers to generate a mental model that describes the archetype of users. Developing building occupant personas has proven to be an effective method for human-centered smart building design, which considers occupant comfort, behavior, and energy consumption. Optimization of building energy consumption also requires a deep understanding of occupants' preferences and behaviors. Current approaches to developing building occupant personas face a major obstacle: manual data processing and analysis. In this study, we propose and evaluate a machine learning-based, semi-automated approach to generating building occupant personas. We investigate the 2015 Residential Energy Consumption Survey (RECS) dataset with five machine learning techniques (Linear Discriminant Analysis, K Nearest Neighbors, Decision Tree (Random Forest), Support Vector Machine, and AdaBoost classifier) for the prediction of 16 occupant characteristics, such as age, education, and thermal comfort. The models achieve an average accuracy of 61%, and an accuracy of over 90% for attributes including the number of occupants in the household, their age group, and preferred usage of heating or cooling equipment. The results show that machine learning techniques are feasible for developing building occupant personas while minimizing human effort.
