As humans continue to find their way across Earth's landscapes, there has been an increased effort to document life on Earth before species go extinct without ever being discovered. Biodiversity occurrence data is commonly gathered either through primary voucher specimens, which are backed by physical evidence in a natural history collection, or through direct field observations or sightings, which are not traceable to tangible material and are usually represented only by pictures, videos, or sounds. Both voucher and observation data are increasingly used, separately or in combination, to understand human impacts on global biodiversity, but each has coverage gaps and biases that may make it ineffective at representing patterns of biodiversity. To date, however, the differences in coverage gaps and biases between the two types of data have not been quantified. In this paper, we disentangled and quantified these biases and assessed gaps in coverage of expected biodiversity patterns using 1.9 billion voucher and observation occurrence records of terrestrial plants, butterflies, amphibians, birds, reptiles, and mammals.
That's a lot of data, and an ambitious attempt at analyzing it all. Luckily, science is about reaching for new heights in the pursuit of novel insights, something Dr. Barnabas Daru taught me well when I was an eager undergraduate researcher at Texas A&M University-Corpus Christi. After gaining inspiration at the 2019 Evolution conference in Providence, RI, Daru shared with me the idea of assessing sampling biases. At the time, I was interested in an ongoing specimen digitization effort across regional herbaria, which included making preserved plant specimens available online. So we started there, with the plants of North America. Eventually, after we presented the preliminary research in South Africa in 2020, and as our conversations about biases and gaps in biodiversity data grew, our curiosity led us first to expand the project to a global scale and later to include butterflies, amphibians, birds, reptiles, and mammals.
While using biodiversity records to identify coverage of biodiversity patterns has been a priority in recent years, we were motivated by the idea of a global assessment of how these patterns are captured by voucher versus observation records across taxa. To this end, we focused on quantifying coverage gaps and biases that can manifest 1) geographically, in the disproportionate coverage of a species in some regions of its range relative to others; 2) taxonomically, in the tendency of some species or lineages to be covered more or less than others; 3) temporally, in unbalanced collecting in some years or some parts of the year; and 4) functionally, in the disproportionate coverage of species on the basis of life history and functional traits, including life cycle, size, growth form, and rarity. An example of functional trait bias would be the tendency of an amateur collector, often contributing observational occurrence data through citizen science efforts, to sample a plant with a showy flower instead of the grass growing next to it. When many collectors make decisions like this, the resulting data are biased towards plants with showy flowers. Even trained professionals, who often collect voucher records, make decisions such as sampling near roadsides or an herbarium building, which yields records from only some regions of a species' range and therefore a geographic bias for that species.
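To make the geographic and temporal dimensions above concrete, here is a minimal sketch of how coverage could be summarized for one species under each record type. It is an illustration only, not the method used in the paper: the records, the grid cell labels, the expected range, the decade binning, and the function name `coverage` are all hypothetical and invented for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (species, record_type, year, grid_cell).
# These values are invented for illustration and are not real data.
RECORDS = [
    ("Quercus alba", "voucher", 1950, "A1"),
    ("Quercus alba", "voucher", 1975, "B2"),
    ("Quercus alba", "voucher", 2001, "C3"),
    ("Quercus alba", "observation", 2018, "A1"),
    ("Quercus alba", "observation", 2019, "A1"),
    ("Quercus alba", "observation", 2019, "A1"),
    ("Quercus alba", "observation", 2020, "B2"),
]

# Hypothetical expected range of each species, as a set of grid cells.
EXPECTED_RANGE = {"Quercus alba": {"A1", "B2", "C3", "D4"}}


def coverage(records, expected_range, first_decade=1950, last_decade=2020):
    """Summarize records per (species, record_type).

    geographic: fraction of the expected range cells with at least one record.
    temporal:   fraction of decades in the window with at least one record.
    """
    decades = set(range(first_decade, last_decade + 1, 10))
    cells = defaultdict(set)
    seen_decades = defaultdict(set)
    counts = defaultdict(int)

    for species, record_type, year, cell in records:
        key = (species, record_type)
        counts[key] += 1
        # Only cells inside the expected range count towards coverage.
        if cell in expected_range[species]:
            cells[key].add(cell)
        decade = year - year % 10
        if decade in decades:
            seen_decades[key].add(decade)

    return {
        key: {
            "n_records": counts[key],
            "geographic": len(cells[key]) / len(expected_range[key[0]]),
            "temporal": len(seen_decades[key]) / len(decades),
        }
        for key in counts
    }


if __name__ == "__main__":
    for key, summary in sorted(coverage(RECORDS, EXPECTED_RANGE).items()):
        print(key, summary)
```

The point of the sketch is that the number of records and the evenness of coverage are separate quantities: a record type can contribute more records for a species while reaching fewer parts of its range or fewer time periods.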
Our analyses showed that both voucher and observation records of most taxonomic groups had massive gaps in taxonomic coverage. Species richness of lineages derived from observation records tended to be more taxonomically biased, while vouchers tended to be more taxonomically random and represented expected family richness across taxonomic groups. Additionally, in well-sampled regions such as North America, Western Europe, and Australia, we saw high levels of taxonomic coverage under observations, but vouchers additionally captured expected species richness in areas known to be "biodiversity hotspots", such as South America, South Africa, and the Himalaya-Hengduan region of Asia.
When we assessed geographic coverage of species, we found that available records of plants and butterflies showed significant biases under observation records, while amphibians, reptiles, and mammals showed geographic bias towards vouchers. Birds were the exception: they showed similar biases for voucher and observation records, along with a higher collection density of observation records, which was anticipated given the charismatic nature of birds and the long history of bird observation. Bird-watching is a beloved pastime across many societies and cultures, bringing together people of all ages and backgrounds who share a fascination with documenting birds.
Across all groups, voucher records had significantly higher temporal coverage of species than observation records. However, the high temporal coverage of voucher records was biased towards areas of high collection density, suggesting that it reflects a long history of recording biodiversity in these regions rather than their true diversity.
While observation records had more records per species, voucher records had a more even distribution of functional trait coverage. We also found that threatened species were represented by fewer collections on average than non-threatened species in both voucher and observation records. This may be due to their limited abundance and the justifiable restrictions on collecting rare or threatened species. Still, this bias may reduce opportunities to use historical populations to inform present-day conservation and restoration efforts.
Our research reveals that coverage by voucher records is relatively more even and more reflective of expected biodiversity patterns than coverage by observations, which tended to be clustered in a few regions that are easily accessible, secure, and relatively influenced by human activities. Vouchers also provide documentary evidence for species identification and reexamination, as well as supporting material for the conclusions reached in a study, but the rate at which they are made available in museums and herbaria has slowed. Meanwhile, the mass production of observation records has overwhelmed vouchers. Yet the two data types complement each other: observation records increasingly capture a wide variety of derived information about the existence of an organism, while voucher records strengthen the foundations of current research and support future studies.
As we continue to explore the complexities of life on Earth, voucher and observation data, together, will help us better understand the anthropogenic drivers of biodiversity change. Future efforts should address the biases and gaps in coverage to ensure that species occurrence records remain vital for ecological and evolutionary research for years to come.