Characterizing Data Scientists in the Real World

Paula Pereira, Jácome Cunha, João P. Fernandes

TL;DR

A public survey characterizes the current generation of data scientists: who is doing data science, how they work, which skills they hold and lack, and which tools they use and need.

Abstract

Data collection is pervasively bound to our digital lifestyle. A recent IDC study reports that, due to pandemic-related confinements, the volume of data created and replicated in 2020 grew even faster than in previous years, reaching an astonishing global total of 64.2 zettabytes. While not all of the data produced is meant to be analyzed, numerous companies offer services and products that rely heavily on data analysis. In other words, mining this data has already revealed great value for businesses in different sectors. To fully realize this value, however, companies need to hire professionals who are capable of gleaning insights and extracting value from the available data. We hypothesize that the people currently performing data-science-related tasks in practice may not have adequate training or education. To fully support them in their duties, e.g., by building appropriate tools that increase their productivity, we first need to characterize the current generation of data scientists. To contribute towards this characterization, we conducted a public survey to understand who is doing data science, how they work, which skills they hold and lack, and which tools they use and need.

Paper Structure

This paper contains 34 sections, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Other learning methods.
  • Figure 2: Satisfaction by background.
  • Figure 3: Satisfaction by gender.
  • Figure 4: Applying deep learning techniques by background.
  • Figure 5: Access to relevant data by background.
  • ...and 10 more figures