Behind the Paper

Analyzing Diversity: Citation Diversity Reports made easy

In this behind-the-paper blog post, I provide background and context on our recent publication in Nature Machine Intelligence, which reports on the ability of large language models to generate publication-ready Citation Diversity Reports (aka Citation Diversity Statements). Give it a try, it's easy!

It is well documented that women and minority authors are cited less frequently than male and majority authors due to systemic biases, especially when they appear in the first and last author positions. To address this, some scientific journals have added an optional “Citation Diversity Statement” that authors can include at the end of their publications to (1) acknowledge this trend; (2) quantitatively analyze the demographics of their bibliography; and (3) summarize what steps, if any, they have taken to diversify their cited references. Authors are typically invited to publish either a short version that addresses only point (1), or longer versions that also include points (2) and/or (3). I first became aware of citation bias and the concept of the Citation Diversity Statement (CDS) from a paper by Zurn et al.[1]

In 2021, my fellow Editors-in-Chief of the Biomedical Engineering Society (BMES) journals and I, along with the BMES Publications Board and Springer editors, introduced an optional Citation Diversity Statement across the BMES family of journals: Annals of Biomedical Engineering, Biomedical Engineering Education, Cardiovascular Engineering and Technology, and Cellular and Molecular Bioengineering[2]. Unfortunately, the practice has not been widely adopted, and only a small fraction of authors have opted to include a CDS in their published BMES articles to date. While most academics support diversity, equity, and inclusion in science, perhaps more so than the lay public, part of the slow pace of CDS adoption is likely due to the lack of effective tools for quantifying the diversity of cited authors. When we published our 2021 editorial, the tools available to estimate race, ethnicity, and gender from author lists tended to be inaccurate, incomplete, and not user-friendly.

At the end of 2022, ChatGPT and other large language models (LLMs) became widely available and quickly took the world by storm. I was swept up in this storm, becoming a heavy user of these tools, and was especially fascinated by the intersection of LLMs with higher education and scientific publishing. In one of my earliest publications on the topic, which received a surprising amount of attention, I prompted ChatGPT to "Create a list of references on chatbots, AI, and plagiarism, while trying to cite more women authors and people of color to make up for historical biases in scientific citation."[3] Interestingly, though likely to no one's surprise at this current moment in time, ChatGPT generated a list of 5 very plausible-looking references that I determined were fake, or "hallucinated", and I reported this observation in the paper. At the time of publication (January 2, 2023), this was no doubt one of the first reports of hallucinated scientific references published in a scientific journal. Once LLM tools linked to internet search became widely available, such as the Microsoft Bing chatbot and the now defunct Google Bard, I thought this might finally equip LLMs with the ability to search the internet for indications of author demographics and to locate real (not hallucinated) scientific references. I reported my attempts to employ Google Bard to produce reasonable citation diversity statements in a follow-up publication, but alas, Bard was still not up to the challenge[4].

Since those heady days of 2023, we have witnessed a remarkable advance in the power and capabilities of large language models, including the emergence of Deep Research, Google Gemini, DeepSeek, and GPT-5, among other breakthroughs. These latest-generation LLMs can "think through" complex logical problems, act as agents that repeatedly query online resources, and have gained respect as legitimate tools for scientific research (albeit not as credited coauthors). My coauthor, Melissa Cantú, and I decided it was time to give the latest generation of LLMs another try at generating accurate, publication-ready Citation Diversity Reports (CDR; aka, CDS).

As we report in our recent publication in Nature Machine Intelligence, we tested 27 different LLMs from 7 companies for their ability to analyze and estimate the gender, race, and ethnicity of a collection of cited authors, compile those statistics in a useful form, and then generate a concise report suitable for publication in journals that allow the inclusion of a CDR/CDS[5]. For our tests, we assembled a list of 195 coauthors from the 36 journal papers (excluding editorials and commentary articles) that comprise my lab's publications over the past 5 years, for which the demographic information is conveniently known as "ground truth". Remarkably, many of the LLMs tested could indeed complete the task with greater than 95% accuracy, and most of the highest-performing LLMs are available free of charge. This is a marked improvement over existing tools for citation diversity analysis that require a paid monthly license, such as the Gender API service used by cleanBib. The highest-performing LLM of all, for the generation of CDRs, was Claude 3.7 Sonnet.
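For readers curious what "greater than 95% accuracy" means in practice, the scoring is straightforward: each LLM-assigned demographic label is compared against the known ground-truth label, author by author, and the fraction of matches is reported. Below is a minimal Python sketch of that kind of tally; the file names and column labels are hypothetical placeholders, not the actual evaluation code from our paper.

```python
# Minimal sketch of per-author accuracy scoring (hypothetical file names and
# column labels; not the evaluation code from the paper itself).
import csv

def load_labels(path, key="author", fields=("gender", "race_ethnicity")):
    """Map each author name to their demographic labels from a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: {fld: row[fld].strip().lower() for fld in fields}
                for row in csv.DictReader(f)}

truth = load_labels("ground_truth.csv")        # known demographics of the coauthors
predicted = load_labels("llm_predictions.csv") # labels extracted from an LLM's report

for field in ("gender", "race_ethnicity"):
    # Count authors whose predicted label matches ground truth; missing
    # predictions simply count as mismatches.
    matches = sum(truth[name][field] == predicted.get(name, {}).get(field)
                  for name in truth)
    print(f"{field} accuracy: {matches / len(truth):.1%}")
```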

Many interesting observations are also reported in our latest publication, including some LLM actions that veer into the territory of accessing information that could be viewed as an invasion of privacy, for instance consulting online university directories, faculty webpages, and even social media posts in an attempt to deduce the demographic characteristics of an author. However, we note that all of this information is already readily available on the internet, and so could also be easily obtained by a human researcher if she so wished, and our preferred tools did not engage in such problematic actions. To summarize, the age of easy, free, and accurate publication-ready Citation Diversity Reports has arrived, thanks to the latest generation of LLMs such as Claude 3.7 Sonnet and Grok 3 + DeeperSearch. So what are you waiting for? I encourage you to explore the diversity of your cited authors using the simple prompts we provide in our Nature Machine Intelligence paper. It's interesting, it represents a first step toward correcting citation bias, and once you know... knowing is half the battle!
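If you would like to try this on your own bibliography, the workflow amounts to pasting your reference list into a capable LLM together with a suitable instruction. As a purely illustrative sketch (the exact prompts we validated are given in the paper, and the wording below is my own placeholder), here is one way to assemble such a prompt from a plain-text reference list:

```python
# Illustrative helper for assembling a CDR prompt from a plain-text bibliography.
# The file name and prompt wording are hypothetical placeholders; the validated
# prompts are provided in the Nature Machine Intelligence paper itself.
from pathlib import Path

references = Path("references.txt").read_text(encoding="utf-8")

prompt = (
    "Analyze the likely gender and race/ethnicity of the first and last "
    "authors of each reference below, compile summary statistics, and draft "
    "a concise Citation Diversity Statement suitable for publication.\n\n"
    + references
)
print(prompt)  # paste the result into your chatbot of choice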

References:

[1] Zurn, P., Bassett, D.S. & Rust, N.C. The citation diversity statement: a practice of transparency, a way of life. Trends Cogn. Sci. 24, 669–672 (2020).

[2] Rowson, B., Duma, S.M., King, M.R. et al. Citation Diversity Statement in BMES Journals. Ann. Biomed. Eng. 49, 947–949 (2021).

[3] King, M.R. & chatGPT. A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education. Cell. Mol. Bioeng. 16, 1–2 (2023).

[4] King, M.R. Can Bard, Google’s Experimental Chatbot Based on the LaMDA Large Language Model, Help to Analyze the Gender and Racial Diversity of Authors in Your Cited Scientific References? Cell. Mol. Bioeng. 16, 175–179 (2023).

[5] Cantú, M.S. & King, M.R. LLMs as all-in-one tools to easily generate publication-ready citation diversity reports. Nat. Mach. Intell. (2025). https://doi.org/10.1038/s42256-025-01101-y