A Simple Way to Reduce Racial Discrimination in Online Ratings

In today’s gig economy, customers’ ratings often reflect subtle racial biases that disproportionately affect non-White workers. Our study proposes a simple solution: replacing the common five-star rating system with a binary (thumbs-up/thumbs-down) scale.

The Problem with Five-Star Scales

Traditional five-star scales seem to allow for nuanced feedback, but they also leave room for subtle biases to be expressed. Many individuals hold such biases, leading them to rate non-White workers who perform well slightly lower than their White counterparts, likely often without conscious awareness. For example, we find that customers are less likely to give a 5-star rating to a non-White worker than to a White worker. Furthermore, online ratings are often skewed and inflated, meaning that a 4-star rating is effectively “low,” even if the customer believes they are providing a positive rating. Over time and at scale, such small differences have big consequences. On the labor platform we analyzed, non-White workers earned 91 cents for every dollar White workers made, because worker ratings were used to calculate worker compensation.

Changing from a Five-Star Scale to a Thumbs Up/Thumbs Down Scale

We analyzed the impact of a sudden shift from a five-star to a thumbs-up/thumbs-down rating system on a home-services platform. This change, implemented without any announcement, provided an exogenous opportunity to study how this scale change relates to customer ratings. After the rating scale change, we no longer observed racial disparities in evaluations and income.

The dichotomous system, we theorize, forced customers to focus on whether a worker performed well or not, leaving less room for subtle biases to shape evaluations than a scale that allows more fine-grained differentiation. That is, a thumbs up/down rating system clearly communicates what constitutes a “good” versus a “bad” evaluation, leaving customers with less ambiguity.

Extending the Findings

We conducted a series of pre-registered online experiments to extend our main findings from the field. They showed that customers with modern racist beliefs (a reluctance to acknowledge racial inequality) were more likely to give low ratings to non-White workers under the old five-star system. However, participants using a thumbs up/down scale were less likely to let such personal biases affect their evaluations. Instructions also helped: when told to focus solely on whether a worker’s performance was “good” or “bad,” even participants with stronger modern racist beliefs who used a five-star scale rated non-White workers more favorably than their counterparts who used a five-star scale without the additional instructions.

Why This Matters and Looking Ahead

The implications of this study go beyond online ratings. Racial inequality in evaluations is pervasive across many sectors, from hiring decisions to student evaluations to startup investment. The simplicity of the binary system makes it an attractive tool for reducing discrimination without costly interventions like bias training. One potential concern with a binary system is that removing granularity may make it harder to discern quality differences at the margin. In practice, however, most rating systems already lack an agreed-upon scale and cannot accurately distinguish subtle differences in quality. In situations where precise evaluations are not feasible or standardized, a dichotomous scale provides a clear distinction between “good” and “bad” outcomes, which is sufficient in many contexts. Additionally, where more detailed evaluations are necessary, granularity could be introduced through multiple well-defined criteria, each evaluated on a dichotomous scale, rather than through a single omnibus rating such as a five-star system.
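As a rough illustration of that last idea, the sketch below combines several thumbs-up/thumbs-down answers into one score. It is a minimal sketch and not something the study or the platform implemented: the criterion names, the `score` function, and the equal weighting of criteria are all assumptions made for the example.

```python
# Hypothetical criteria, chosen only for illustration; the study names none.
CRITERIA = ("arrived_on_time", "completed_job", "communicated_clearly")


def score(answers: dict[str, bool]) -> float:
    """Return the share of criteria that received a thumbs-up (0.0 to 1.0).

    Each criterion is answered with True (thumbs-up) or False (thumbs-down).
    Equal weighting is an assumption of this sketch.
    """
    missing = [c for c in CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return sum(answers[c] for c in CRITERIA) / len(CRITERIA)


# Example: two thumbs-up and one thumbs-down.
example = {
    "arrived_on_time": True,
    "completed_job": True,
    "communicated_clearly": False,
}
print(score(example))
```

Each question stays a clear good-or-bad judgment, while the combined score still separates workers who met every criterion from those who met only some.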

While promising, our study leaves room for further exploration, replication, and validation. For instance, we focused on male workers, so the applicability of our findings to women and other intersectional identities needs examination. Additionally, we encourage further study into how similar interventions might reduce discrimination in other settings, like workplaces or educational institutions.

Our research also benefitted from access to large-scale, objective backend data from a labor platform organization, a kind of data not often available for research. Data at this scale (nearly 70,000 observations) make it possible to study small effects that matter in a practical sense. Such effects may not always be detected in smaller datasets, particularly at the sample sizes that are feasible to collect from online panels or laboratory studies. Further, these effects may not always be observable in online panels, where social desirability tempers participants’ responses to highly sensitive issues like racism. That is, in many online panels, participants are aware of being observed, can discern the purpose of surveys and experiments, and alter their behavior in ways that they would not in real life. In our pre-registered experiments, we did our best to circumvent these issues through methodological choices and replication. However, these issues are increasingly pressing given the heightened salience of race and the economy in public discourse, even in the short time since these data were collected (in 2022). We look forward to additional research on this topic, and to research that can help scholars navigate these methodological difficulties when studying discrimination and the promise of interventions for the real world.

Conclusion

This study highlights the power and importance of the design and structure of evaluation processes in shaping key labor outcomes. By simplifying how evaluations are made, platforms can attenuate differences based on characteristics that are frequently unrelated to quality. Most importantly, changing to a dichotomous scale is easy to implement and in line with the goal of many evaluation processes, which is to determine whether quality is good or bad. Such data-driven solutions could make a profound difference in creating more accurate and equitable labor platforms.

 
