Rummagene brings to the surface a massive collection of published biomedical datasets that are currently buried and inaccessible

Rummagene is a biomedical research search engine providing access to hundreds of thousands of mammalian gene sets mined from the supporting materials of research publications listed on PubMed Central. Rummagene is freely and openly available at:

Although basic biomedical research is advancing rapidly, the way results from research studies are communicated has remained almost the same. Research results are still reported in manuscripts that follow the structure of physically printed research papers, with sections that describe the background and motivation, methods, results, discussion, and conclusions. Published research manuscripts also contain citations to prior work, and static figures and tables that display the results, accompanied by figure and table legends. This approach to communicating results from biomedical research projects is effective. It enables the biomedical research community to build new layers of understanding upon the body of human knowledge. Every new research article is a building block, a "brick" in the house of collective human knowledge. Some bricks cost more to produce, and some are more valuable to the body of knowledge. Specifically, the publications that introduce groundbreaking discoveries, present a new idea or a theory that can explain many observations, or offer a path to an innovative technology are the most valuable.

At the same time, search engines such as PubMed, PubMed Central, Web of Science, and Google Scholar are critical in facilitating rapid identification of, and immediate access to, the growing corpus of published biomedical research manuscripts. These search engines enable the community of biomedical researchers to sift through the body of knowledge, ensuring that new work builds upon the most relevant existing knowledge. To make this possible, the databases behind the search engines use bots to index the text of published research papers. However, most research publications also include supplementary data tables, and those are currently skipped by the indexing bots. These tables are not easily indexable for search because the standards journals require authors to follow when annotating, storing, and sharing these important additional datasets are non-uniform or nonexistent.

In the past 30 years, pharmacological, genetic, molecular, and cellular biomedical research has mainly focused on studying the function of single genes and proteins. However, as omics technologies increase in availability, accessibility, diversity, and accuracy, biomedical research is shifting toward studying gene and protein modules. These modules are often not well defined. They can represent different things, for example, pathways, macromolecular complexes, targets of transcription factors, differentially expressed genes, sets of genes identified as related to a phenotype such as a human disease or a knockout mouse phenotype, and more. These gene-set modules differ in size; small modules can be part of larger modules; and genes can be members of multiple modules. Despite this complexity, abstracting biomedical knowledge into gene sets provides a unifying strategy to mine, index, and reuse information at the data level. For many published biomedical research studies, such gene-set knowledge does not fit well within the confines of a standard research article, and as a result, the discovered gene sets are reported in the supplementary materials of such publications.
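To make the gene-set abstraction concrete, here is a minimal Python sketch of how overlapping and nested modules could be represented and indexed. It is an illustration only, not Rummagene's implementation: the set labels, the gene memberships, and the helper names `build_gene_index` and `nested_pairs` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy gene sets keyed by illustrative labels (not real Rummagene data).
gene_sets = {
    "pathway_A": {"TP53", "MDM2", "CDKN1A", "ATM"},
    "complex_B": {"TP53", "MDM2"},
    "phenotype_C": {"ATM", "BRCA1", "CHEK2"},
}


def build_gene_index(sets_by_name):
    """Map each gene to the names of every set that contains it."""
    index = defaultdict(set)
    for name, genes in sets_by_name.items():
        for gene in genes:
            index[gene].add(name)
    return dict(index)


def nested_pairs(sets_by_name):
    """Return sorted (smaller, larger) name pairs where one set is a proper subset of the other."""
    return sorted(
        (a, b)
        for a, genes_a in sets_by_name.items()
        for b, genes_b in sets_by_name.items()
        if a != b and genes_a < genes_b  # '<' on Python sets tests for a proper subset
    )


gene_index = build_gene_index(gene_sets)
nested = nested_pairs(gene_sets)
```

In this toy data, one gene belongs to two modules and one small module sits entirely inside a larger one, which is the kind of structure the paragraph above describes. An inverted index from gene to sets is one simple way such a collection could be made searchable.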

Rummagene is a new biomedical research search engine that provides access to hundreds of thousands of such mammalian gene sets. Updated weekly, the Rummagene indexing bot and search engine make it possible to search a massive collection of published biomedical datasets that are currently buried and inaccessible. In addition, the collection of gene sets mined by Rummagene is a valuable dataset in its own right. By examining gene-gene co-occurrence in Rummagene-mined gene sets, gene function can be predicted, and novel gene modules can be discovered. Rummagene, which is freely and openly available, can help researchers form novel hypotheses and explain the collective function of discovered gene sets, adding another dimension to standard gene set enrichment analysis.
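The idea of examining gene-gene co-occurrence can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by Rummagene: it only counts how many sets contain each pair of genes, the toy gene sets are invented, and the function names `cooccurrence_counts` and `top_partners` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(gene_sets):
    """Count how many gene sets contain each unordered pair of genes."""
    counts = Counter()
    for genes in gene_sets:
        # Sort so each pair has one canonical (alphabetical) ordering.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return counts


def top_partners(counts, gene, n=3):
    """Return the n genes that most often share a set with `gene`."""
    partners = Counter()
    for (a, b), count in counts.items():
        if a == gene:
            partners[b] += count
        elif b == gene:
            partners[a] += count
    # Break ties alphabetically so the result is deterministic.
    return sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented toy gene sets, for illustration only.
toy_sets = [
    ["TP53", "MDM2", "CDKN1A"],
    ["TP53", "MDM2"],
    ["ATM", "TP53", "CHEK2"],
    ["ATM", "CHEK2"],
]

counts = cooccurrence_counts(toy_sets)
partners = top_partners(counts, "TP53", n=2)
```

Genes that repeatedly appear in the same sets are candidates for shared function. A real analysis would need to normalize for set size and for how often each gene appears overall, which this sketch deliberately omits.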
