Sentiment Analysis of ML Projects: Bridging Emotional Intelligence and Code Quality

Md Shoaib Ahmed, Dongyoung Park, Nasir U. Eisty

TL;DR

The paper investigates whether developer sentiment influences code quality in ML-centric projects. It labels sentiment in GitHub issue comments from 20 curated ML repositories using a five-model ensemble (VADER, TextBlob, Pattern, BERT, spaCyTextBlob) combined by max voting, and it assesses code quality with SonarQube metrics covering bugs, vulnerabilities, security hotspots, code smells, and duplication. The study finds that positive developer sentiment is generally associated with better code quality and negative sentiment with more issues, while neutral sentiment shows weaker or inconsistent relationships; the strength of these associations varies across projects. The work contributes a rigorous SA pipeline tailored to SE contexts, links emotional dynamics to software quality, and suggests that fostering a positive emotional climate and emotional intelligence in teams can improve ML project health and maintainability.
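
The max-voting step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each of the five models has already produced a polarity score in [-1, 1], and the function names, the 0.05 threshold, and the rule of falling back to "neutral" on a tie are all assumptions made for the example.

```python
from collections import Counter


def polarity_to_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed cut-off, not a value taken from the paper.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def max_vote(labels, tie_label="neutral"):
    """Return the label predicted by the most models.

    With five models and three labels a 2-2-1 split is possible; the
    fallback to `tie_label` in that case is an assumption of this sketch.
    """
    if not labels:
        raise ValueError("at least one model prediction is required")
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    # A tie exists when the runner-up has as many votes as the leader.
    if len(counts) > 1 and counts[1][1] == top_count:
        return tie_label
    return top_label


# Hypothetical scores for one issue comment, one per model.
scores = {"vader": 0.62, "textblob": 0.30, "pattern": 0.00, "bert": -0.40, "spacytextblob": 0.25}
labels = [polarity_to_label(s) for s in scores.values()]
print(max_vote(labels))
```

Using an odd number of models reduces ties but does not eliminate them with three classes, which is why the sketch makes the tie rule explicit.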

Abstract

This study explores the intricate relationship between sentiment analysis (SA) and code quality within machine learning (ML) projects, illustrating how the emotional dynamics of developers affect the technical and functional attributes of software projects. Recognizing the vital role of developer sentiments, this research employs advanced sentiment analysis techniques to scrutinize affective states from textual interactions such as code comments, commit messages, and issue discussions within high-profile ML projects. By integrating a comprehensive dataset of popular ML repositories, this analysis applies a blend of rule-based, machine learning, and hybrid sentiment analysis methodologies to systematically quantify sentiment scores. The emotional valence expressed by developers is then correlated with a spectrum of code quality indicators, including the prevalence of bugs, vulnerabilities, security hotspots, code smells, and duplication instances. Findings from this study distinctly illustrate that positive sentiments among developers are strongly associated with superior code quality metrics manifested through reduced bugs and lower incidence of code smells. This relationship underscores the importance of fostering positive emotional environments to enhance productivity and code craftsmanship. Conversely, the analysis reveals that negative sentiments correlate with an uptick in code issues, particularly increased duplication and heightened security risks, pointing to the detrimental effects of adverse emotional conditions on project health.
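
To make the correlation step concrete, the sketch below computes a correlation coefficient between a per-project sentiment ratio and a per-project code quality ratio. The abstract does not state which coefficient is used, so the Pearson coefficient here is an assumption, as are the function name and the input values, which are placeholders for illustration and not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one entry per hypothetical project (not from the paper).
positive_ratio = [0.10, 0.20, 0.30, 0.40]
bug_ratio = [0.08, 0.06, 0.05, 0.02]
print(round(pearson(positive_ratio, bug_ratio), 3))
```

A negative coefficient in this setup would mean that projects with a larger share of positive comments tend to have a lower bug ratio; it does not by itself establish causation.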

Paper Structure (20 sections, 2 equations, 7 figures)

Figures (7)

  • Figure 1: The proposed framework presents a methodical diagram that illustrates the entire process, starting from data collection and then identifying sentiment analysis and the relationship between sentiment and code quality metrics.
  • Figure 2: Distribution of Sentiment Metrics by various machine learning projects.
  • Figure 3: Code Quality Assessment of Machine Learning Projects
  • Figure 4: Correlation between Developer Sentiments and Bug Ratios
  • Figure 5: Correlation between Developer Sentiments and Code Smell Ratios
  • ...and 2 more figures