Results and Implications for Generative AI in a Large Introductory Biomedical and Health Informatics Course

We compared student homework and final exam scores with six large language model (LLM) systems in a large online introductory course in biomedical and health informatics. All of the LLMs scored between the 50th and 75th percentiles of students, raising questions about student assessment in higher education.

Generative artificial intelligence (AI), driven by large language models (LLMs), has had a profound impact on all scientific disciplines, including biomedicine. Accomplishments in the latter include passing medical board exams, solving clinical cases, drafting empathetic notes to patients, and proposing new drugs for the treatment of disease. Those of us who are educators face new challenges from generative AI, based on its ability to perform as well as students on a variety of learning assessments. One researcher, business professor Ethan Mollick of the University of Pennsylvania, has called this the “homework apocalypse.”

I decided to put this notion to the test in a large online introductory course that I teach in my field of biomedical and health informatics. The course is offered at the graduate, continuing education, and medical student levels. The curriculum for these offerings is essentially identical, with the course updated annually. Teaching occurs mainly via voice-over-PowerPoint lectures and threaded discussion forums, with assessment taking place via multiple-choice questions (MCQs; 10 per unit for each of the 10 units of the course) and a 33-question, short-answer final exam. Some instances of the course require a term paper, and a few make use of flipped classrooms, virtual or in-person, for faculty-student discussion.

As in many fields, AI has become an increasing focus of the course. This naturally led to the question of how LLMs would perform on the course's assessments. I put this to the test by submitting the assessments from last year's (2023) version of the course to six high-profile commercial LLMs: ChatGPT, Bing Copilot, Google Gemini, Meta Llama, Claude, and Mistral. I prompted each with the MCQs and the final exam, using the LLMs as students would likely use them, i.e., through their Web interfaces.

Sure enough, the LLMs outscored 50-75% of the students. Of the 139 students who completed the course in 2023, the best LLMs scored around the 75th percentile of all students. All of the LLMs performed comparably, with Google Gemini scoring the best and ChatGPT, Meta's Llama, and the others not far behind. Another interesting, though not really surprising, finding was that the LLMs completed each assessment, from the MCQs to the final exam, in about a minute.

The implication of these results is that generative AI systems challenge our ability to assess student learning, and they will require us to modify how we evaluate students. This does not mean we should ban such tools, but that we need to find ways to ensure enough learning that students can think critically from a core of fundamental knowledge.

While the first thought that comes to mind when LLMs perform capably on student assignments and assessments is “cheating,” there are larger issues of concern. In any academic discipline, is there a core of knowledge about which students should be able to answer questions without digital assistance? Does this core of knowledge facilitate higher-order thinking about a discipline? Does it enable thoughtful searching, via classic search or LLMs, for information beyond the human's memory store?

The paper can be found at https://doi.org/10.1038/s41746-024-01251-0.

Hersh W, Fultz Hollis K. Results and implications for generative AI in a large introductory biomedical and health informatics course. NPJ Digit Med. 2024 Sep 13;7(1):247. doi: 10.1038/s41746-024-01251-0. PMID: 39271955.
