Citizen science and traditional research are allies
Published in Ecology & Evolution, Protocols & Methods, and Computational Sciences
Citizen science and traditional university research are often perceived and portrayed as opposites. In reality, however, they have much more in common than they have differences. Ultimately, both citizen science projects and traditional professional research are subject to the same quality criteria of good scientific practice, such as objectivity, correctness or accuracy, transparency, verifiability, integrity, and relevance.
In fact, citizen science projects and traditional research are in fact increasingly supporting and enriching each other. Current research results from the Viel-Falter Monitoring project illustrate this clearly. A neural network was trained to recognise butterfly species using over 500,000 butterfly photos collected by over 25,000 volunteers as part of a broad-based citizen science project, a neural network was trained to recognize butterfly species and the results were critically evaluated. This was done using one of the most powerful computers in Europe. The dataset, scripts, and models used are now available to the interested public.
Automated identification of species using machine learning models (so-called artificial intelligence) is familiar to users of apps such as iNaturalist, Flora Inkognita, and many others. However, few people consider what is required for the successful development and subsequent application of such identification tools. Many are also unaware of the amount of training data and computing power required to develop effective identification models.

Friederike Barkmann, a doctoral student in the Austrian Butterfly Monitoring Viel-Falter, addressed these questions in her second Master's thesis in the field of data science. She also critically evaluated the accuracy of identification for each species. To achieve this, she used a very large data set compiled by volunteers have compiled over the last ten years as part of the Butterflies of Austria project run by the Billa Foundation Blühendes Österreich. Based on over 500,000 images, a neural network was trained to recognise butterfly species. Training such models requires not only a lot of data but also a great deal of computing power, so access to powerful computers was essential. The high-performance computer at the University of Innsbruck, LEO5, initially served this purpose well. However, as the model runs took several hours even on this supercomputer, the process was first optimized with the help of supercomputing expert Andreas Lindner from EuroCC Austria through parallelization. This means that several processors (GPUs) are connected to each other to solve a computing task.
Ultimately, the EuroCC Austria project ultimately provided access to the LEONARDO supercomputer - one of the most powerful in Europe - and supported the implementation with expertise in the field of high-performance computing. This enabled the first models to be trained that could correctly identify 97% of all images. This high level of identification accuracy demonstrates that such models are well suited to providing app users with feedback on their observations. Accuracy can also be increased by removing images with uncertain identifications. These images could then be re-identified by experts, for example. This approach could save considerable time in re-identification and quality-controlling of citizen science data. At the same time, it ensures high data quality. It has also been documented that some species are easier to identify than others. Species groups that can be challenging even for experts, such as the family of skippers and the genus Erebia, are also more difficult for the computer model to identify.
The dataset, which includes butterfly photographs, computer scripts, and models, was published as part of a data paper. In the spirit of open (citizen) science, the dataset has been made available to the general public. It is significantly larger than those datasets used in similar studies to date. It is a valuable resource for further research and can contribute to further improving identification algorithms such as those used in iNaturalist among others.
This closes the circle: citizen science initiatives support and expand scientific research, which in turn develops methods and techniques that further expand the possibilities of citizen science. However, this cross-fertilization is only possible if the various stakeholders collaborate and solve problems together. Given the global biodiversity and climate crises, there are plenty of problems to solve.
Barkmann, F., Lindner, A., Würflinger, R., Höttinger, H., Rüdisser, J. (2025) Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels. Sci Data 12 (1), 1369. https://doi.org/10.1038/s41597-025-05708-z
Follow the Topic
-
Scientific Data
A peer-reviewed, open-access journal for descriptions of datasets, and research that advances the sharing and reuse of scientific data.
Related Collections
With Collections, you can get published faster and increase your visibility.
Data for crop management
Publishing Model: Open Access
Deadline: Jan 17, 2026
Computed Tomography (CT) Datasets
Publishing Model: Open Access
Deadline: Feb 21, 2026
Please sign in or register for FREE
If you are a registered user on Research Communities by Springer Nature, please sign in