An Advanced Method to Predict Personality Traits by Combining Different Types of Information

Psychologists developed the "Big Five" model to organise personality into five broad traits. New technology lets us combine different types of data (text, audio, visual) to make more accurate predictions. Our research uses deep learning to improve this process.

Read the paper

A deep multimodal fusion method for personality traits prediction - Multimedia Tools and Applications

Personality traits influence an individual’s behavior, preferences and decision-making processes, making automated personality recognition an important area of research. In this paper, we propose a novel deep multimodal fusion method for predicting personality traits from diverse data modalities, including text, audio, and visual inputs. Our proposed method extracts complex patterns and features from these multimodal data sources using advanced deep learning methods including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Vision Transformer (ViT). Specifically, we use pre-trained models ViT-B16 and VGG16 for visual feature extraction, VGGish for audio feature extraction, and GloVe for text analysis. Additionally, we investigate the potential benefits of using self-attention and cross-attention mechanisms to provide accurate predictions regarding an individual’s personality traits. The method we propose combines information from several modalities using various fusion techniques, improving the predictive capability of the model. Experimental results on the publicly available ChaLearn First Impressions-V2 dataset demonstrate that our technique is effective, achieving higher accuracy than existing methods in the literature. This work contributes to the advancement of multimodal deep learning techniques and provides valuable results in the field of personality recognition.

Our approach is designed to take personality prediction to the next level by using not just one but three different types of data: visual, audio, and text! Each of these "modalities" helps paint a more complete picture of a person's personality traits.

We use the latest computer vision techniques to capture visual information from videos. Two powerful models, Vision Transformer (ViT-B16) and VGG16, help us detect scenes and facial expressions. This lets us read important visual cues, like whether someone is smiling or looking serious.
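To make this concrete, here is a minimal sketch of the two visual feature extractors, assuming torchvision's pre-trained weights. Stripping the classifier heads to expose 768-d (ViT) and 4096-d (VGG16) features is our illustrative choice, not necessarily the exact configuration used in the paper:

```python
# Minimal sketch: pre-trained ViT-B/16 and VGG16 as frame-level feature extractors.
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pre-trained backbones: ViT-B/16 for scene-level cues, VGG16 for faces/frames.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vit.heads = torch.nn.Identity()        # drop the classifier -> 768-d features
vgg.classifier = vgg.classifier[:-1]   # keep up to fc7 -> 4096-d features
vit.eval(); vgg.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def visual_features(frame_pil):
    """Extract scene (ViT) and face/frame (VGG16) features from one video frame."""
    x = preprocess(frame_pil).unsqueeze(0)   # (1, 3, 224, 224)
    return vit(x), vgg(x)                    # (1, 768) and (1, 4096)
```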

We listen too! Audio carries personality cues: by analysing how someone speaks, we gain insight into their traits. And of course, words matter! Using natural language processing (NLP) techniques, we analyse the transcript of what someone says in the video to capture their linguistic patterns.

So, how does it all come together? Imagine we have a video of someone. From that single clip, we pull features from four sources: the entire scene (what’s going on around the person), the person’s face (the main focus), their voice, and the transcript of their speech. Each source is encoded by its own model, and the resulting features are then fused and trained jointly. A sketch of the audio and text branches follows below.
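Here is a hedged sketch of those two branches. The VGGish weights come from a community torch.hub port (harritaylor/torchvggish), and the GloVe vectors are read from the standard glove.6B.300d.txt file; both tooling choices, along with the LSTM hidden size, are our assumptions rather than the authors' exact pipeline:

```python
# Sketch of the audio (VGGish) and text (GloVe + LSTM) feature branches.
import numpy as np
import torch

# --- Audio: one 128-d VGGish embedding per ~0.96 s of speech ---
# Community torch.hub port of Google's VGGish (an assumption about tooling).
vggish = torch.hub.load('harritaylor/torchvggish', 'vggish')
vggish.eval()

def audio_features(wav_path):
    with torch.no_grad():
        return vggish.forward(wav_path)   # (num_frames, 128)

# --- Text: GloVe word vectors fed through an LSTM sequence encoder ---
def load_glove(path='glove.6B.300d.txt'):
    """Read GloVe vectors into a {word: 300-d array} dictionary."""
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            word, *vals = line.rstrip().split(' ')
            vectors[word] = np.asarray(vals, dtype=np.float32)
    return vectors

# Hidden size 128 is illustrative, not the paper's exact value.
lstm = torch.nn.LSTM(input_size=300, hidden_size=128, batch_first=True)

def text_features(transcript, glove):
    """Encode a transcript as the LSTM's final hidden state over GloVe vectors."""
    vecs = [glove[w] for w in transcript.lower().split() if w in glove]
    if not vecs:
        return torch.zeros(128)
    x = torch.from_numpy(np.stack(vecs)).unsqueeze(0)  # (1, seq_len, 300)
    _, (h, _) = lstm(x)
    return h[-1, 0]                                    # (128,)
```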


So in total, four models analyse the data: ViT-B16 and VGG16 for the visual stream, VGGish for audio, and an LSTM over GloVe vectors for text. Fusing their outputs, with self-attention and cross-attention mechanisms, is what lets the method capture what makes each person unique. The sketch below shows one way such a fusion layer can look.
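In this minimal sketch, the modality features become "tokens" that attend to one another through a multi-head attention layer before a small head predicts the five trait scores. The dimensions and the single attention layer are illustrative; the paper explores several self- and cross-attention fusion configurations:

```python
# Minimal sketch of attention-based fusion across three modality features.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        # Project each modality into a shared space (input dims match the
        # earlier sketches and are examples, not the paper's exact values).
        self.proj_visual = nn.Linear(768, dim)   # e.g. ViT features
        self.proj_audio = nn.Linear(128, dim)    # e.g. VGGish features
        self.proj_text = nn.Linear(128, dim)     # e.g. LSTM-over-GloVe features
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, 5)            # one score per Big Five trait

    def forward(self, visual, audio, text):
        # Stack the projected features as modality tokens: (batch, 3, dim).
        tokens = torch.stack([
            self.proj_visual(visual),
            self.proj_audio(audio),
            self.proj_text(text),
        ], dim=1)
        # Each modality attends to the others over the token set.
        fused, _ = self.attn(tokens, tokens, tokens)
        # Pool and map to trait scores in [0, 1], matching the
        # First Impressions-V2 label range.
        return torch.sigmoid(self.head(fused.mean(dim=1)))

# Example usage with random features for a batch of two clips.
model = CrossAttentionFusion()
scores = model(torch.randn(2, 768), torch.randn(2, 128), torch.randn(2, 128))
print(scores.shape)  # torch.Size([2, 5])
```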

