Statistical data analysis for Tourism in Poland in R Programming Environment

Saad Ahmed Jamal

TL;DR

The study applies the R programming environment to Polish tourism data, using descriptive statistics, visualisations, and inferential methods to examine expenditure patterns and relationships among trip characteristics. It finds a significant association between accommodation type and trip purpose, and a moderately strong correlation between organizer and private expenditures, while total expenditure shows limited differentiation across groups due to normality violations that preclude standard ANOVA. The analysis also includes a Bat Morphometric dataset to illustrate strong size-weight relationships and non-parametric testing when normality fails. Overall, the work demonstrates a replicable R-based workflow for tourism analytics and provides open-source code on GitHub to support data-driven decision-making in tourism management and related ecological analyses.

Abstract

This study uses the R programming language for statistical data analysis to understand tourism dynamics in Poland. It focuses on methods for data visualisation, multivariate statistics, and hypothesis testing. To investigate the expenditure behavior of tourists, spending patterns, correlations, and associations among variables in the dataset were analysed. The results revealed a significant relationship between accommodation type and purpose of trip, indicating that the purpose of a trip influences the choice of accommodation. A strong correlation was observed between organizer expenditure and private expenditure, indicating that individual spending is higher when spending on organizing the trip is higher. However, no significant difference was observed in total expenditure across accommodation types or trip purposes, suggesting that travelers tend to spend similar amounts regardless of their reason for travel or choice of accommodation. Although significant relationships were observed among certain variables, ANOVA could not be applied because the data did not satisfy the normality assumption. In future, the dataset can be explored further to find more meaningful insights. The developed code is available on GitHub: https://github.com/SaadAhmedJamal/DataAnalysis RProgEnv.
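
The kind of workflow the abstract describes (an association test between categorical variables, a normality check that decides whether ANOVA is appropriate, and a correlation between two expenditure variables) can be sketched in base R as follows. This is only an illustrative sketch, not the author's code: the data are simulated, the column names (`purpose`, `accommodation`, `organizer_exp`, `private_exp`, `total_exp`) are hypothetical, and the specific tests (chi-squared, Shapiro-Wilk, Kruskal-Wallis, Spearman) are common choices assumed here, since the abstract does not name the ones actually used.

```r
# Simulated, hypothetical data: names and values are illustrative only
# and do not come from the study's dataset.
set.seed(42)
n <- 300
trips <- data.frame(
  purpose       = factor(sample(c("leisure", "business", "family"), n, replace = TRUE)),
  accommodation = factor(sample(c("hotel", "hostel", "private"), n, replace = TRUE)),
  organizer_exp = rlnorm(n, meanlog = 6, sdlog = 0.8)  # right-skewed spending
)
# Private spending loosely tied to organizer spending, plus skewed noise
trips$private_exp <- 0.5 * trips$organizer_exp + rlnorm(n, meanlog = 5, sdlog = 0.7)
trips$total_exp   <- trips$organizer_exp + trips$private_exp

# 1. Association between two categorical variables:
#    chi-squared test of independence on the contingency table
assoc <- chisq.test(table(trips$accommodation, trips$purpose))

# 2. Normality check on total expenditure (Shapiro-Wilk)
norm_check <- shapiro.test(trips$total_exp)

# 3. Group comparison: one-way ANOVA only if normality is not rejected,
#    otherwise the rank-based Kruskal-Wallis test
if (norm_check$p.value > 0.05) {
  group_test <- summary(aov(total_exp ~ accommodation, data = trips))
} else {
  group_test <- kruskal.test(total_exp ~ accommodation, data = trips)
}

# 4. Rank-based (Spearman) correlation between the two expenditure variables,
#    which does not require normally distributed data
rho <- cor.test(trips$organizer_exp, trips$private_exp, method = "spearman")

print(assoc)
print(norm_check)
print(group_test)
print(rho)
```

Branching on the normality check mirrors the reasoning in the abstract: when the normality assumption fails, a parametric ANOVA is set aside in favour of a test that relies only on ranks.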

Paper Structure

This paper contains 12 sections, 2 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Methodological Flowchart.
  • Figure 2: Box plot showing the spread of numeric variables.
  • Figure 3: Histograms showing (a) the distribution of Accommodation Type and (b) the distribution of Total Expenditure.
  • Figure 4: Scatterplot showing the association between Private and Organizer Expenditure.