Motivation
Human affects such as emotions, moods, and feelings are increasingly recognized as crucial factors in enriching interactions between humans and diverse machines and systems. Technologies capable of detecting and recognizing emotions will therefore contribute to advances across multiple domains, including human-machine interface (HMI) devices, robotics, marketing, healthcare, and education. Nonetheless, decoding and encoding emotional information remains a complex task because of the inherently abstract, complex, and personalized nature of emotions.
Conventional approaches to recognizing emotional information from humans typically rely on analyzing facial-expression images, verbal speech, text, and physiological signals. However, these methods are hindered not only by environmental factors such as lighting conditions, noise interference, and physical obstructions, but also by the need for bulky equipment, which limits their application in everyday communication scenarios.
Highlights of our research
In this work, we propose a human emotion recognition system that captures complex emotional states with our personalized skin-integrated facial interface (PSiFI), which offers simultaneous detection and integration of facial expressions and vocal speech. Human emotions are mostly expressed in a complex emotional context in which facial and verbal expressions are engaged simultaneously. For this reason, we measure and analyze this combinatorial emotional information through multimodal sensing of both facial and verbal expressions with the PSiFI system.
The PSiFI incorporates a personalized facial mask that is self-powered, easily applicable, stretchable, transparent, capable of wireless communication, and highly customized to conform to an individual's facial contour based on 3D face reconstruction. The sensing part of the PSiFI comprises strain- and vibration-sensing units based on triboelectrification, which detect facial strain for facial-expression recognition and vocal vibration for speech recognition, respectively. Incorporating a triboelectric nanogenerator (TENG) makes the sensor device self-powering while offering a broad range of design possibilities in terms of materials and architecture.
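To illustrate how the two sensing channels could be prepared for downstream classification, here is a minimal sketch that slices synchronized strain and vibration traces into normalized windows. The sampling rate, window length, hop size, and function name are illustrative assumptions, not the actual PSiFI signal pipeline.

```python
import numpy as np

def segment_and_normalize(strain, vibration, fs=1000, window_s=1.0, hop_s=0.5):
    """Slice two synchronized 1-D sensor traces into overlapping windows and
    z-score each window; returns an array shaped (n_windows, 2, win_samples)."""
    win = int(window_s * fs)
    hop = int(hop_s * fs)
    n = min(len(strain), len(vibration))
    windows = []
    for start in range(0, n - win + 1, hop):
        seg = np.stack([strain[start:start + win],
                        vibration[start:start + win]])          # (2, win)
        seg = (seg - seg.mean(axis=1, keepdims=True)) \
              / (seg.std(axis=1, keepdims=True) + 1e-8)         # per-window z-score
        windows.append(seg)
    return np.asarray(windows, dtype=np.float32)

# Synthetic traces standing in for raw triboelectric voltages.
strain = np.random.randn(5000)
vibration = np.random.randn(5000)
batch = segment_and_normalize(strain, vibration)
print(batch.shape)  # (9, 2, 1000)
```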
Moreover, to encode the combinatorial sensing signals into personalized feedback parameters, we employ a convolutional neural network (CNN)-based classification technique that rapidly adapts to an individual's context via transfer learning. With these machine learning techniques, we successfully identified complex emotional contexts as high-level emotional information and demonstrated a digital concierge application in a virtual reality (VR) environment via HMIs with our PSiFI. The digital concierge recognizes a user's intention and interactively offers helpful services depending on the user's affective state.
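The following sketch shows one way such CNN-based classification with user adaptation via transfer learning could look: convolutional features learned on a base user are frozen, and only the classification head is fine-tuned on a small labelled set from a new user. The layer sizes, number of emotion classes, checkpoint name, and training loop are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """1-D CNN over stacked strain/vibration windows (2 input channels)."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, 2, win_samples)
        z = self.features(x).squeeze(-1)   # (batch, 64)
        return self.classifier(z)

# Transfer learning: reuse features trained on a base user and fine-tune
# only the classifier head on a few labelled windows from a new user.
model = EmotionCNN()
# model.load_state_dict(torch.load("base_user.pt"))  # hypothetical checkpoint
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x_new = torch.randn(32, 2, 1000)      # placeholder windows from the new user
y_new = torch.randint(0, 6, (32,))    # placeholder emotion labels
for _ in range(10):                   # a few quick adaptation steps
    optimizer.zero_grad()
    loss = loss_fn(model(x_new), y_new)
    loss.backward()
    optimizer.step()
```

Freezing the feature extractor keeps per-user adaptation fast and data-efficient, which matches the goal of rapidly adapting the classifier to an individual's context.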
Looking ahead
Our work presents a promising way to consistently collect emotional-speech data through barrier-free communication, and it can help accelerate digital transformation. The real-time human emotion recognition and digital concierge applications demonstrated in a VR environment illustrate how the next generation of wearable systems may be able to utilize a very complex form of human information.