A global patent dataset of bioeconomy-related inventions

The bioeconomy promises solutions to global challenges like climate change and resource depletion. Tracking its growth is complex, spanning sectors and technologies. Using AI, we identified 5.6 million bioeconomy patents, revealing trends and opportunities for sustainable innovation.

The challenge of identifying bioeconomy-related patents

The bioeconomy offers transformative solutions to global challenges such as climate change, resource depletion, and environmental degradation. The transition to a bio-based economy, however, requires the development and diffusion of (technological) innovations. Tracking innovation in the bioeconomy is challenging because the field is multidimensional and cuts across sectors. Patents are a common indicator of knowledge development and innovation, but traditional methods for identifying bioeconomy-related patents have significant limitations. Static technology classifications and keyword searches in patent abstracts, while widely used, are prone to inaccuracies. For instance, bio-based innovations may be filed under unrelated categories or overlooked entirely because terminology varies across languages and disciplines. These shortcomings call for a more adaptable approach that reflects the evolving nature of the bioeconomy. To address this problem, we developed a comprehensive dataset of bioeconomy-related patents using artificial intelligence (AI).
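To make the weakness of keyword searches concrete, here is a minimal sketch in Python. The keyword list, the four toy abstracts, and their labels are all invented for illustration and are not taken from the dataset. The sketch only shows the two failure modes described above: a relevant abstract that uses different terminology is missed, and an irrelevant abstract that happens to contain a keyword is picked up.

```python
import re

# Hypothetical keyword list, invented for this example.
KEYWORDS = ["bio-based", "biomass", "biodegradable", "fermentation"]

# Invented toy abstracts with hand-assigned labels (True = bioeconomy-related).
ABSTRACTS = [
    ("A packaging film made from polylactic acid derived from corn starch.", True),
    ("A fermentation process for producing ethanol from biomass.", True),
    ("A biometric sensor for unlocking a mobile phone.", False),
    ("A video game in which players manage a virtual fermentation brewery.", False),
]


def keyword_match(text, keywords=KEYWORDS):
    """Return True if any keyword occurs as a whole word in the text."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        for keyword in keywords
    )


def confusion(abstracts):
    """Count true/false positives and negatives of the keyword filter."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, is_bio in abstracts:
        predicted = keyword_match(text)
        if predicted and is_bio:
            counts["tp"] += 1
        elif predicted and not is_bio:
            counts["fp"] += 1
        elif not predicted and is_bio:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


result = confusion(ABSTRACTS)
print(result)
```

On this toy input the filter misses the polylactic acid film (no keyword appears) and wrongly flags the video game (the word "fermentation" appears), which is the kind of error a context-aware model is meant to reduce.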

Leveraging AI for tracking the bioeconomy in patent data

To overcome these challenges, we fine-tuned a pre-trained large language model (LLM) on manually annotated patent abstracts. The model was designed to identify bio-based products, services, and processes with greater accuracy and comprehensiveness. We applied it to a dataset of 67 million patents and identified 5.6 million of them, roughly 8%, as bioeconomy-related. Because the model reads each abstract in context, it accommodates linguistic and contextual variations in patent descriptions that defeat static classifications and keyword lists.
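The general shape of such a screening step can be sketched as follows. This is not our actual pipeline: `classify_corpus`, the threshold of 0.5, the batch size, and the three toy patents are assumptions made for illustration, and `toy_score_batch` is a trivial stand-in for a fine-tuned model, which in practice would return a probability per abstract. The last lines simply recompute the share implied by the two figures reported above.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Patent = Tuple[str, str]  # (patent_id, abstract)


def batched(items: Iterable[Patent], size: int) -> Iterator[List[Patent]]:
    """Yield consecutive batches of at most `size` items."""
    batch: List[Patent] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def classify_corpus(
    patents: Iterable[Patent],
    score_batch: Callable[[List[str]], List[float]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the study
    batch_size: int = 2,
) -> List[str]:
    """Return the ids of patents whose score reaches the threshold."""
    flagged: List[str] = []
    for batch in batched(patents, batch_size):
        scores = score_batch([abstract for _, abstract in batch])
        flagged.extend(
            pid for (pid, _), score in zip(batch, scores) if score >= threshold
        )
    return flagged


def toy_score_batch(texts: List[str]) -> List[float]:
    """Stand-in for a fine-tuned model: NOT a real classifier."""
    return [
        0.9 if ("ferment" in t.lower() or "biodegradable" in t.lower()) else 0.1
        for t in texts
    ]


# Invented toy patents.
PATENTS = [
    ("P1", "A fermentation method for producing lactic acid."),
    ("P2", "A hinge mechanism for a laptop lid."),
    ("P3", "A biodegradable mulch film for agriculture."),
]

flagged = classify_corpus(PATENTS, toy_score_batch)
print(flagged)

# Share implied by the reported figures: 5.6 million out of 67 million.
share_percent = round(5.6e6 / 67e6 * 100, 1)
print(share_percent)
```

Keeping the scoring function as a parameter means the same loop works whether the scores come from a placeholder rule, as here, or from a real model.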

Mapping bioeconomy-related inventions

To map innovation within the bioeconomy, we applied topic modeling, a technique that groups text data into thematic clusters. This analysis revealed key areas of innovation, including organic farming, water purification techniques, fermentation methods, biodegradable materials, and sustainable feed solutions. We visualized these themes in a detailed map of bioeconomy advancements that shows the thematic clusters and their interconnections. The map enables researchers and policymakers to explore inventions in the bioeconomy and identify underexplored opportunities.
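The idea of grouping texts into themes can be illustrated with a deliberately simple sketch. It is not the topic model we used: it groups documents greedily by vocabulary overlap (Jaccard similarity), which is a much cruder technique than a real topic model. The six toy documents, the stopword list, and the threshold of 0.3 are assumptions chosen for the example.

```python
import re
from collections import Counter

# Small hypothetical stopword list for the toy documents below.
STOPWORDS = {"a", "of", "for", "using", "by", "the", "and", "method", "process"}

# Invented toy documents.
DOCS = [
    "organic farming method using compost fertilizer",
    "compost fertilizer for organic farming of vegetables",
    "water purification filter using activated carbon",
    "membrane filter for water purification",
    "fermentation of sugar by yeast",
    "yeast fermentation process for sugar conversion",
]


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_overlap(docs, threshold=0.3):
    """Assign each document to the most similar existing group, else open a new one."""
    clusters = []  # each cluster: {"members": [doc indices], "vocab": set of tokens}
    for idx, text in enumerate(docs):
        tokens = tokenize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(tokens, cluster["vocab"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [idx], "vocab": set(tokens)})
        else:
            best["members"].append(idx)
            best["vocab"] |= tokens
    return clusters


def top_terms(docs, members, n=3):
    """Most frequent terms in a group, ties broken alphabetically."""
    counts = Counter(t for i in members for t in tokenize(docs[i]))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


groups = [c["members"] for c in group_by_overlap(DOCS)]
for members in groups:
    print(members, top_terms(DOCS, members))
```

On this toy input the documents fall into three groups (farming, water purification, fermentation), each summarized by its most frequent terms. A real topic model would additionally handle synonyms, overlapping themes, and millions of documents, none of which this sketch attempts.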

Lessons learned: Harnessing AI for innovation research

This project showcased the potential of AI models, particularly LLMs, for extracting meaningful information from unstructured data such as patent abstracts. By overcoming the constraints of static classifications and keyword-based methods, fine-tuned LLMs provided a more nuanced and comprehensive view of innovation patterns in the bioeconomy. We encourage fellow researchers, particularly in economics and the social sciences, to explore these new techniques.

Implications for policy and research

Our dataset opens new possibilities for policymakers and researchers. Policymakers can use these insights to design targeted strategies that foster bio-based innovation. For example, identifying regions or industries with high innovation potential can inform funding priorities and regulatory support. For researchers, the dataset provides a foundation for exploring trends in bioeconomy innovations, assessing the impact of policies, and understanding the evolution of innovation systems.
