Can we use people's words to test for psychological differences?

Natural language processing offers rich possibilities for measuring people's psychology through everyday language. But do words really reflect psychological characteristics? This study outlines several methods for checking the validity of inferring people's psychology from the words they use.

In a recent post, I wrote about how the words people write on social media in China reveal cultural differences. People in rice-farming southern China used more words reflecting collectivism, prevention orientation, and conflict avoidance than people in wheat-farming northern China. But this raises the question of whether analyzing word use is a valid method of measuring people's psychology.

One good reason to be skeptical is that words have lots of different meanings across contexts. How can we be sure that words about social relationships tap into collectivism or that cognitive words like "cause" and "because" tap into analytic thought? In our study, we tackled this question in three ways. 

Method 1: Criterion Validity

One way is to test whether regions’ use of these word categories correlates with things that collectivistic cultures tend to have more or less of. For example, several studies have found that collectivistic cultures have more three-generation households, tighter social norms, and lower divorce rates. And because I've tested people all over China with psychological tasks measuring cultural differences, we can also compare word use to regional differences on holistic thought tasks in the lab.

These criterion validity correlations test whether word use on social media correlates with previously established markers of collectivism. Correlations in green are in the expected direction. Correlations in red are in the opposite direction.

Most categories passed these validity checks, but some failed. One surprise was “I” versus “we.”

Use of "I" versus "we" failed validity tests.

That’s surprising because several studies have used “we” to measure collectivism and “I” to measure individualism. These results cast doubt on whether "I" versus "we" pronouns should be used to measure collectivism.
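To make the logic of a criterion validity check concrete, here is a minimal sketch in Python. It is an illustration only, not the analysis code from our study: the regional numbers are invented, and the names `pearson_r` and `check_direction` are hypothetical helpers. The idea is simply to correlate a word category with an established marker and ask whether the sign matches what theory predicts.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def check_direction(word_rates, criterion, expected_sign):
    """Return (r, passed): passed is True when r has the predicted sign."""
    r = pearson_r(word_rates, criterion)
    return r, r * expected_sign > 0

# Invented example data, one value per region (NOT real results).
collectivism_word_rate = [2.1, 2.8, 3.5, 4.0, 4.6]  # uses per 1,000 words
divorce_rate = [3.9, 3.4, 3.0, 2.2, 2.0]            # divorces per 1,000 people

# Theory predicts a negative correlation: more collectivism words, less divorce.
r, passed = check_direction(collectivism_word_rate, divorce_rate, expected_sign=-1)
print(f"r = {r:.2f}, in predicted direction: {passed}")
```

A category that comes out in the wrong direction on several markers, as "I" versus "we" did in our data, is a warning sign that the words are not measuring what we assumed.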

Method 2: Considering Dialects

Another concern is dialects. Doesn’t Chinese have lots of dialects that are very different? A careful analysis should make sure the rice-wheat differences are not an artifact of dialect.

Distribution of dialects across China

One fortunate thing (for researchers!) about dialects in China is that the Mandarin dialect is broad enough to include both rice and wheat areas. So we can test rice-wheat differences after limiting the sample just to areas that speak Mandarin. 

We also tried excluding Cantonese-speaking provinces because Cantonese is arguably the dialect with the most developed written system. The fact that rice-wheat differences remained significant suggests that the differences are independent of dialect.
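The dialect checks above amount to re-running the same comparison on restricted samples. Here is a rough sketch in Python, with invented numbers and a hypothetical helper called `rice_wheat_gap`. It only compares group means. The actual study used significance tests, which this sketch leaves out.

```python
from statistics import mean

# Invented region records (NOT real data): crop history, main dialect,
# and rate of collectivism-related words per 1,000 words.
regions = [
    {"crop": "rice",  "dialect": "Mandarin",  "rate": 4.1},
    {"crop": "rice",  "dialect": "Mandarin",  "rate": 3.9},
    {"crop": "rice",  "dialect": "Cantonese", "rate": 4.5},
    {"crop": "rice",  "dialect": "Wu",        "rate": 4.3},
    {"crop": "wheat", "dialect": "Mandarin",  "rate": 3.0},
    {"crop": "wheat", "dialect": "Mandarin",  "rate": 3.2},
    {"crop": "wheat", "dialect": "Mandarin",  "rate": 2.9},
]

def rice_wheat_gap(rows):
    """Mean word rate in rice regions minus mean word rate in wheat regions."""
    rice = [row["rate"] for row in rows if row["crop"] == "rice"]
    wheat = [row["rate"] for row in rows if row["crop"] == "wheat"]
    return mean(rice) - mean(wheat)

# Same comparison on three samples: everyone, Mandarin areas only,
# and everyone except Cantonese-speaking areas.
full_gap = rice_wheat_gap(regions)
mandarin_gap = rice_wheat_gap([r for r in regions if r["dialect"] == "Mandarin"])
no_cantonese_gap = rice_wheat_gap([r for r in regions if r["dialect"] != "Cantonese"])

print(full_gap, mandarin_gap, no_cantonese_gap)
```

If the gap keeps the same sign and a similar size in each restricted sample, dialect is an unlikely explanation for the rice-wheat difference.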

Method 3: Internal Validity

One simple method psychologists often use is to test whether variables that are supposedly measuring the same idea actually correlate with each other. For example, we created a word category of "universalism" words. These are words about broad human relationships (such as "humanity" and "the people"), rather than narrow, close relationships. 

If our theory is correct, people should tend to use these words together. For example, people who use "humanity" should be more likely to also use words like "the public." And people who tend not to use the word "global" should be less likely to use words like "the people." 

Psychologists often test this using a metric called Cronbach's alpha, although analyses of word frequencies can use a more precise metric called KR-20 (Kuder-Richardson Formula 20), which is designed for yes/no data such as whether or not a person used a given word. Some researchers suggest the alpha should be above 0.60, although expectations should take into account the context and the difficulty of measurement.

These methods can help check whether people's language use is tapping into the psychology we think it is. Checks like these can help avoid mistakes like the idea that "I," "me," and "my" reflect individualism, whereas "we," "us," and "our" reflect collectivism. Although this idea is intuitively appealing, it failed validity checks.


Thomas Talhelm
about 2 months ago

Thanks to my co-authors, many of whom are at U Penn: 

Sharath Chandra Guntuku, Garrick Sherman, Angel Fan, Salvatore Giorgi, Lyle H. Ungar

And Liuqing Wei at Hubei University.  
