Inclusive Emotion Technologies: Addressing the Needs of d/Deaf and Hard of Hearing Learners in Video-Based Learning

Si Chen, Jason Situ, Haocong Cheng, Suzy Su, Desiree Kirst, Lu Ming, Qi Wang, Lawrence Angrave, Yun Huang

TL;DR

The paper investigates how d/Deaf and Hard of Hearing (DHH) learners experience emotion-aware video-based learning using self-reported emotions and Automatic Emotion Recognition (AER). Through a mixed-method study with 20 DHH and 20 hearing college students, it shows that DHH learners rely more on self-reports written in text and on rewatching to recall emotions, while expressing concerns about AER accuracy and privacy and calling for culturally aligned, sign-language–oriented design. The findings reveal significant differences in meta-cognitive emotional reflection between groups and highlight the moderating role of language diversity in how participants describe visualizations. The work offers design implications for more inclusive emotion technologies in video-based learning, including segmentation, richer non-textual emotion cues (e.g., ASL comments), and blended use of emotion data, grounded in Inclusive Special Education and self-regulated learning (SRL) theory.

Abstract

Accessibility efforts for d/Deaf and hard of hearing (DHH) learners in video-based learning have mainly focused on captions and interpreters, with limited attention to learners' emotional awareness--an important yet challenging skill for effective learning. Current emotion technologies are designed to support learners' emotional awareness and social needs; however, little is known about whether and how DHH learners could benefit from these technologies. Our study explores how DHH learners perceive and use emotion data from two collection approaches, self-reported and automatic emotion recognition (AER), in video-based learning. By comparing the use of these technologies between DHH (N=20) and hearing learners (N=20), we identified key differences in their usage and perceptions: 1) DHH learners enhanced their emotional awareness by rewatching the video to self-report their emotions and called for alternative methods for self-reporting emotion, such as using sign language or expressive emoji designs; and 2) while the AER technology could be useful for detecting emotional patterns in learning experiences, DHH learners expressed more concerns about the accuracy and intrusiveness of the AER data. Our findings provide novel design implications for improving the inclusiveness of emotion technologies to support DHH learners, such as leveraging DHH peer learners' emotions to elicit reflections.

Paper Structure (37 sections, 5 figures)


Figures (5)

  • Figure 1: Three major steps of the study process. The self-reporting dashboard and reflections dashboard are illustrated in Figure 2 and Figure 3, respectively.
  • Figure 2: Interface for Step 2, self-reporting emotions in our prototype for video-based learning. This interface contains the following components: (a) A video player showing the same educational video that participants watched in the previous steps. Participants used the video player to choose the timestamp at which they wanted to self-report an emotion. (b) A button for self-reporting an emotion at the timestamp where the video is paused. (c) A pop-up window, shown after clicking (b), for entering the self-report. This pop-up window has three parts: the timestamp where the video is paused, a drop-down menu from which participants may choose one of nine emojis or no emoji, and a text box for optional comments. (d) An emotion legend [kucher2018state, hsu2013seeing, chen2022mirrormirrorus] that participants may refer to. The emotion legend groups similar categories of self-report emojis on the arousal-valence axes. (e) A button to continue to the next step, which uses the post-learning dashboard shown in Figure 3.
  • Figure 3: Interface of the post-learning dashboard in our prototype for video-based learning. Participants followed a think-aloud protocol while reflecting. The dashboard contains the following components: (a) A video player showing the same educational video that participants watched in the previous steps. The video player jumps to any timestamp the participant selects in the components below. (b) Peers' emotions, aggregated as high and low emotional spikes (above) and as positive and negative emotion segments (below). Participants may refer to the legend to the right of each type of peers' emotions. (c) A line chart of the participant's own emotion intensity over the duration of the video. The height of the curve represents high and low intensity (arousal), whereas warm and cool colors represent positive and negative emotions (valence). (d) Participants' self-reports. Participants may hover over a tick to review the emoji and text comment in that self-report. (e) Image captured by AER. The dashboard shows the facial expression image captured at a selected timestamp so that participants can review and understand the AER outputs. (f) Emotion legend, which is the same as in Figure 2 (d). The emotion legend shows the color-coding of valence used for the emotion intensity in (c). The sample line graph in (c) is from DP17.
  • Figure 4: Boxplots of the four types of meta-cognitive sentences that DHH and hearing participants produced during the think-aloud protocol. Mann-Whitney U-tests suggested that hearing participants expressed significantly more sentences on Recognizing Own Emotions, Recalling Content to Explain Own Emotions, and Building Shared Knowledge than DHH participants. ** denotes a significant difference with p<.01, and *** denotes a significant difference with p<.001.
  • Figure 5: Boxplots of valence and arousal values for talking-head and non-talking-head segments for DHH and hearing participants. The 18 talking-head segments and 17 non-talking-head segments alternated throughout the video. Two-sample t-tests showed significant differences in arousal values between DHH and hearing participants for both talking-head and non-talking-head segments. Specifically, DHH participants showed lower intensity during talking-head segments and higher intensity during non-talking-head segments. * denotes a significant difference with p<.05.
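
For readers unfamiliar with the group comparison named in the Figure 4 caption, the following is a minimal sketch of a Mann-Whitney U-test on per-participant sentence counts. It is not the authors' analysis code. The function names `average_ranks` and `mann_whitney_u` and the two lists of counts are hypothetical, and the numbers are made up purely for illustration. The p-value uses a plain normal approximation without tie or continuity correction, so a library routine such as `scipy.stats.mannwhitneyu` would be preferable in practice.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y, approximate two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    # Normal approximation (assumption: no tie or continuity correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, p_two_sided


if __name__ == "__main__":
    # Hypothetical sentence counts per participant, NOT data from the paper.
    dhh_counts = [2, 3, 1, 4, 2, 0, 3, 1]
    hearing_counts = [5, 6, 4, 7, 3, 8, 5, 6]
    u_dhh, u_hearing, p = mann_whitney_u(dhh_counts, hearing_counts)
    print(f"U(DHH)={u_dhh}, U(hearing)={u_hearing}, p~{p:.4f}")
```

A rank-based test of this kind suits count data like sentence tallies because it does not assume normally distributed values; the arousal comparison in Figure 5 instead uses two-sample t-tests on continuous values.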