Statistical data analysis for Tourism in Poland in R Programming Environment
Saad Ahmed Jamal
TL;DR
The study applies the R programming environment to Polish tourism data, using descriptive statistics, visualisations, and inferential methods to examine expenditure patterns and relationships among trip characteristics. It finds a significant association between accommodation type and trip purpose, and a moderately strong correlation between organizer and private expenditures. Total expenditure shows little differentiation across groups, and normality violations preclude standard ANOVA. The analysis also includes a bat morphometric dataset to illustrate strong size-weight relationships and non-parametric testing when normality fails. Overall, the work demonstrates a replicable R-based workflow for tourism analytics and provides open-source code on GitHub to support data-driven decision-making in tourism management and related ecological analyses.
Abstract
This study uses the R programming language for statistical data analysis to understand tourism dynamics in Poland. It focuses on methods for data visualisation, multivariate statistics, and hypothesis testing. To investigate the expenditure behavior of tourists, spending patterns, correlations, and associations among variables in the dataset were analysed. The results revealed a significant relationship between accommodation type and purpose of trip, indicating that the purpose of a trip influences the choice of accommodation. A strong correlation was observed between organizer expenditure and private expenditure, indicating that individual spending tends to be higher when spending on organizing the trip is higher. However, no significant difference was observed in total expenditure across accommodation types or trip purposes, suggesting that travelers tend to spend similar amounts regardless of their reason for travel or choice of accommodation. Although significant relationships were observed among certain variables, ANOVA could not be applied because the data did not satisfy the normality assumption. In future work, the dataset can be explored further to find more meaningful insights. The developed code is available on GitHub: https://github.com/SaadAhmedJamal/DataAnalysis RProgEnv.
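The kinds of tests named in the abstract can be illustrated with a minimal R sketch. It is not the study's code: the data are synthetic, the column names (accommodation, purpose, organizer_exp, private_exp, total_exp) are hypothetical, and the use of Spearman correlation and of Kruskal-Wallis as the fallback when normality is rejected are assumptions made here for illustration.

```r
# Synthetic stand-in data; column names are hypothetical, not the study's.
set.seed(42)
n <- 200
trips <- data.frame(
  accommodation = factor(sample(c("hotel", "hostel", "private"), n, replace = TRUE)),
  purpose       = factor(sample(c("leisure", "business", "family"), n, replace = TRUE)),
  organizer_exp = rlnorm(n, meanlog = 6, sdlog = 0.5)
)
# Private spending is tied to organizer spending by construction (an assumption).
trips$private_exp <- 0.6 * trips$organizer_exp + rlnorm(n, meanlog = 5, sdlog = 0.5)
trips$total_exp   <- trips$organizer_exp + trips$private_exp

# 1. Association between two categorical variables: chi-squared test.
assoc <- chisq.test(table(trips$accommodation, trips$purpose))

# 2. Rank correlation between the two expenditure components.
corr <- cor.test(trips$organizer_exp, trips$private_exp, method = "spearman")

# 3. Normality check on total expenditure (Shapiro-Wilk).
norm_check <- shapiro.test(trips$total_exp)

# 4. If normality is rejected, compare groups with Kruskal-Wallis
#    instead of a one-way ANOVA.
if (norm_check$p.value < 0.05) {
  group_test <- kruskal.test(total_exp ~ accommodation, data = trips)
} else {
  group_test <- aov(total_exp ~ accommodation, data = trips)
}

print(assoc)
print(corr)
print(norm_check)
print(group_test)
```

Because the categorical columns are sampled independently here, the chi-squared result on this synthetic data says nothing about the association reported in the study; the sketch only shows how each step is called.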
