Behind the Paper – A measure of reliability convergence to select and optimize cognitive tasks for individual differences research

Ever felt like a genius, only to realize someone cracked your code a century ago? We did! Finding Spearman-Brown's 110-year-old formula turned our quest into a journey of rediscovery and resilience. Dive in, laugh, and explore how the past shapes our fresh take on cognitive research reliability!

Nearly everyone in the world has encountered the term ‘correlation’, and anyone who has gone through at least a basic statistics course will be familiar with terms such as the Spearman correlation coefficient. Charles Edward Spearman was a giant of statistics and early cognitive psychology. In addition to formulating the famous rank correlation coefficient, he pioneered factor analysis, a statistical method that describes how the variability shared among observed, correlated variables can be attributed to an unobserved latent variable. His seminal work also includes attempts to model human intelligence and the introduction of the (in)famous general intelligence factor. It was thus something of an honour to learn, a week before the planned submission of the first version of this paper, that we had been scooped by Spearman and Brown by about 110 years.

As often happens in science, our initial research question had nothing to do with reliability. We originally set out to use factor analysis and other statistical methods to analyse a vast battery of behavioural data, which yielded weaker-than-expected results. Mindful of the famous GIGO principle (garbage in, garbage out), we started to inspect our input data more closely. We began to ask ourselves whether we could trust the measures in our battery – that is, whether they are reliable – and how their reliability affects the correlations between them. Once we ran our split-halves reliability analysis and obtained rather discouraging results (see Fig. 6 in this paper, and [1,2]), we asked ourselves a different question: how much data do we need to make the measures reliable? We had already collected some data, but how much more would we need to collect to reach a given level of reliability? Looking across all the reliability curves, I started noticing a pattern – they all seemed to follow the same mathematical function. After discussing these findings with several colleagues, we were fairly certain that such a strikingly simple relationship had to have been described before, yet no one could point us to an actual citation for the phenomenon we were observing. At that point, we approached our colleagues in the Physics Department and asked whether they could derive the formula mathematically. It was a piece of cake for them, and after running extensive simulations we were ready to submit a manuscript reporting what seemed to us a ground-breaking discovery. It was then that we scoured the quantitative psychology literature once again, this time with the exact formula, in its various forms, in hand. Fortunately, we came across the so-called ‘Spearman-Brown prophecy’, described independently by Charles Spearman [3] and William Brown [4]: a formula used in psychometrics to predict the reliability of a test if its length is altered.
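For readers who have not yet met it, the prophecy takes a strikingly simple form. In the notation below (ours, not the article's), ρ is the reliability of the original test and k is the factor by which its length is changed:

```latex
% Spearman-Brown prophecy: predicted reliability of a test
% whose length is changed by a factor k, given original reliability rho
\rho_k = \frac{k\,\rho}{1 + (k - 1)\,\rho}
```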

This discovery gave us pause. The initial shock of having completely missed it during our first literature search was replaced by an even bigger one when we found that an entire field had formed around the topic – one buried so deep that even current statisticians barely know of it. Once we knew the keywords, it started snowballing: wonderful textbooks [5,6] written in the 1950s and 1960s entirely on this topic, with details we had not even thought of. It was a humbling experience to read through these ‘ancient’, tale-like articles and textbooks. It was also a time when the central message of our article seemingly crumbled; we needed to recombobulate and reassess the novelty of our contribution. However painful, this process gave us a new perspective on the problem of reliability and allowed us to appreciate its breadth fully.

In our Communications Psychology article, we introduce a new coefficient (C) that can be estimated from simple population statistics of a given task. This C coefficient can be used to predict the necessary number of trials and allows tasks to be compared directly in terms of their reliability convergence – and hence their suitability for individual differences studies. We then validate the approach on a large dataset containing over a dozen behavioural tasks spanning several cognitive domains. The data provide a springboard for using the C coefficient in individual differences studies to select optimal cognitive tasks and their length.
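The exact definition of C is given in the article itself; as a back-of-the-envelope illustration of the same logic, the sketch below (in Python, with a function name and example numbers of our own invention) inverts the Spearman-Brown prophecy to estimate how many trials a task would need to reach a target reliability, given the split-halves reliability observed with a fixed number of trials:

```python
def trials_needed(observed_reliability: float, observed_trials: int,
                  target_reliability: float) -> float:
    """Invert the Spearman-Brown prophecy to estimate how many trials are
    needed to reach a target reliability, given the reliability observed
    with a fixed number of trials. Illustrative only; the article's C
    coefficient is defined in the paper itself."""
    r, r_star = observed_reliability, target_reliability
    # Length factor k such that k*r / (1 + (k - 1)*r) = r_star
    k = (r_star * (1 - r)) / (r * (1 - r_star))
    return k * observed_trials


if __name__ == "__main__":
    # Hypothetical example: reliability 0.55 observed with 60 trials,
    # target reliability 0.80 -> roughly 196 trials needed.
    print(round(trials_needed(0.55, 60, 0.80), 1))
```

With these hypothetical numbers, a task showing a split-halves reliability of 0.55 at 60 trials would need roughly 196 trials to reach 0.80 – the kind of estimate the web-based tool linked at the end of this post is designed to provide.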

The ping-pong of the peer review process highlighted an essential omission in our original draft, which led to an important addition to the article. Until then, we had been concerned mainly with split-halves reliability and its convergence within a single session: the predictive formulas in our article allow researchers to estimate the number of trials needed to achieve a certain reliability in one session. During the review process, we added test-retest reliability convergence and the effect of time. This question concerns how stable a given test is over time – how many sessions, rather than trials, are needed to reliably assess a given cognitive trait, and how the time between sessions affects reliability (Fig. 7).

In the end, we aim to promote the concept of reliability to a broad audience of neuroscientists, psychologists, psychophysicists, statisticians, and other researchers. Our goal is to highlight the importance of reliability, make its calculation accessible, and encourage researchers investigating individual differences to think more deeply and proactively about the reliability of their cognitive task measures. This is especially critical in neuroscience, where brain-behaviour relationships are often investigated with little regard for how reliably either the neural or the behavioural measures are estimated. Even though our article is neither a comprehensive review nor a formal tutorial, we tried to walk the line between being exact and being approachable to anyone. We hope you find the article interesting and valuable, and we wish you a pleasant read. Please also check out our freely available web-based tool (https://jankawis.github.io/reliability-web-app/) for estimating how many trials are needed to achieve a given level of reliability.

References

  1. Hedge, C., Powell, G. & Sumner, P. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav. Res. Methods 50, 1166–1186 (2018).
  2. Rey-Mermet, A., Gade, M. & Oberauer, K. Should we stop thinking about inhibition? Searching for individual and age differences in inhibition ability. J. Exp. Psychol. Learn. Mem. Cogn. 44, 501–526 (2018).
  3. Spearman, C. Correlation Calculated from Faulty Data. Br. J. Psychol. 1904-1920 3, 271–295 (1910).
  4. Brown, W. Some Experimental Results in the Correlation of Mental Abilities. Br. J. Psychol. 1904-1920 3, 296–322 (1910).
  5. Gulliksen, H. Theory of Mental Tests. (Routledge, New York, 1987, first published by Wiley & Sons in 1950). doi:10.4324/9780203052150.
  6. Lord, F. M. & Novick, M. R. Statistical Theories of Mental Test Scores. (IAP, 2008, first published by Addison-Wesley in 1968).
