Massively parallel data processing for new discoveries

Published in Chemistry, Materials, and Physics

When dealing with large datasets, a favorite approach has been to automate data analysis and interpretation using Artificial Intelligence (AI) or Machine Learning (ML). These methods, however, are only as good as their training sets, which are assembled from previously known, established data. Thus, they can never make new discoveries, nor can they conclude that previous knowledge is insufficient and that a new discovery is needed to process and interpret the data.

The best and most creative way to circumvent this problem and accelerate the pace of discovery is to entrust decision-making to diverse people. People are much better suited than AI to making decisions and assessing the quality of results. The more diverse the group, the better.

For a recent publication in Nature Communications (1), I hired a group of 15 undergraduate students at the University of Wisconsin-Madison to perform massively parallel processing and re-processing of data acquired over 8 years. These students, called “the Cnidarians” after the phylum that includes corals, anemones, and jellyfish, are mostly members of minorities underrepresented in science and come from the Mercile J. Lee Scholars Program at UW-Madison (2). I assembled the Cnidarians during the pandemic and have continued to employ them and benefit from their careful, highly skilled work ever since; they produced all the new data presented in the paper (1). The Cnidarians start processing as soon as the spectromicroscopy data are acquired during beamtime at the Advanced Light Source (ALS) (PEEM-3 microscope, ALS beamline) (3-11).

The key aspect and main scientific advantage of the Cnidarians' data processing is that it is massively parallel: multiple Cnidarians process precisely the same data independently. Their data processing involves 11 distinct decisions. Ten decisions generate a parameter space too large to explore exhaustively in a reasonable computing time; thus, here people are better than computers. Because diverse people make decisions differently, the parameter space explored by a diverse group is much larger than what a group of white, male, affluent students would probe.

Then, at weekly Zoom meetings with all the Cnidarians, we look at the Myriad Maps (MMs) resulting from each Cnidarian's work and compare the results obtained by different people on the same data. If multiple Cnidarians have converged to the same solution, that solution must be the most robust. If not, we identify which decisions were made differently, discuss the reasoning behind them, and examine why they led to different final MMs. We (the Cnidarians, really!) discovered many ways to improve data processing. Diversity is our strength. The accuracy, precision, and overall quality of the resulting MMs are outstanding and unprecedented.

Here we used our custom data processing software, which we release free of charge (12). The idea of massively parallel data processing by a diverse group, however, is generalizable: it can be exported to any other method or software, with any degree of complexity in data processing, and involving any kind of decision-making.
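The bookkeeping behind this kind of parallel processing can be sketched in Python. This is a minimal illustration under loudly stated assumptions: the number of options per decision, the toy phase-label maps, and the 0.9 agreement threshold are all hypothetical, and the code is not the Cnidarians' actual workflow or the custom software (12). It shows how a handful of decisions multiplies into a huge number of possible pipelines, and how maps produced independently from the same data could be scored pixel by pixel, with disagreeing pairs flagged for discussion and a majority-vote consensus map.

```python
"""Sketch of bookkeeping for massively parallel data processing.

ASSUMPTIONS (hypothetical): option counts, label maps, and threshold are
illustrative only; this is not the software (12) used in the paper.
"""
from collections import Counter
from itertools import combinations
from math import prod

# Hypothetical number of options available at each of the 11 decisions.
OPTIONS_PER_DECISION = [5, 4, 6, 3, 10, 8, 4, 5, 7, 3, 2]


def parameter_space_size(options):
    """Distinct processing pipelines = product of the options per decision."""
    return prod(options)


def pixel_agreement(map_a, map_b):
    """Fraction of pixels given the same phase label in two maps."""
    flat_a = [p for row in map_a for p in row]
    flat_b = [p for row in map_b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("maps must have the same number of pixels")
    return sum(a == b for a, b in zip(flat_a, flat_b)) / len(flat_a)


def flag_disagreements(maps, threshold=0.9):
    """Pairs of processors whose maps agree on fewer than `threshold` of pixels."""
    return [
        (i, j)
        for i, j in combinations(sorted(maps), 2)
        if pixel_agreement(maps[i], maps[j]) < threshold
    ]


def consensus_map(maps):
    """Per-pixel majority label and the fraction of processors that chose it."""
    grids = list(maps.values())
    result = []
    for r, row in enumerate(grids[0]):
        out_row = []
        for c in range(len(row)):
            label, votes = Counter(g[r][c] for g in grids).most_common(1)[0]
            out_row.append((label, votes / len(grids)))
        result.append(out_row)
    return result


if __name__ == "__main__":
    # Three hypothetical processors mapping the same 2x2-pixel region.
    maps = {
        "A": [["ACC", "CCHH"], ["MHC", "ACC"]],
        "B": [["ACC", "CCHH"], ["MHC", "ACC"]],
        "C": [["ACC-H2O", "CCHH"], ["ACC", "ACC"]],
    }
    print(parameter_space_size(OPTIONS_PER_DECISION))  # 24192000
    print(flag_disagreements(maps))  # [('A', 'C'), ('B', 'C')]
    print(consensus_map(maps)[0][0])  # ('ACC', 0.6666666666666666)
```

With these hypothetical option counts there are 24,192,000 possible pipelines; at one minute per run, exhaustive exploration would take roughly 46 years of computing. The flagged pairs are the ones whose differing decisions would be worth discussing at a weekly meeting.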

New discovery

The Cnidarians’ massively parallel data processing revealed two previously unknown mineral phases on the forming surface of coral skeletons and abalone nacre, and this led to the discovery of a new natural mineral, calcium carbonate hemihydrate (CCHH). This means that aragonite biominerals form not only from liquid and amorphous precursors but also from crystalline, albeit metastable, precursors. See the related blog post "Myriad Mapping (MM) from atomic to intergalactic scales".

  1. Schmidt CA, et al. (2024) Myriad Mapping of nanoscale minerals reveals calcium carbonate hemihydrate in forming nacre and coral biominerals. Nat Commun 15(1):1812.
  2. Mercile J. Lee Scholars Program, UW-Madison (2024).
  3. Gong YUT, et al. (2012) Phase transitions in biogenic amorphous calcium carbonate. Proc Natl Acad Sci USA 109:6088-6093.
  4. DeVol RT, et al. (2015) Nanoscale transforming mineral phases in fresh nacre. J Am Chem Soc 137(41):13325-13333.
  5. Mass T, et al. (2017) Amorphous calcium carbonate particles form coral skeletons. Proc Natl Acad Sci USA 114(37):E7670-E7678.
  6. Sun C-Y, et al. (2020) From particle attachment to space-filling coral skeletons. Proc Natl Acad Sci USA 117(48):30159-30170.
  7. Najman M, Kasrai M, Bancroft G, Frazer B, & De Stasio G (2004) The correlation of microchemical properties to antiwear (AW) performance in ashless thiophosphate oil additives. Tribol Lett 17:811-822.

Metastable precursors to coral skeleton and nacre formation. These myriad maps (MMs) show that amorphous and crystalline precursors are both present, but only on the forming surface, not in the bulk, of fresh aragonite (CaCO3) biominerals: coral skeleton (A1, A2, A3) and abalone nacre (B1, B2, B3). Legend: red is hydrated amorphous calcium carbonate (ACC-H2O), green is dehydrated amorphous calcium carbonate (ACC), cyan is calcium carbonate hemihydrate (CCHH), and magenta is monohydrocalcite (MHC).
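As a rough illustration of how a color-coded phase map like the one described in this legend could be built from per-pixel spectra, here is a Python sketch. It is a deliberate simplification, and every detail is an assumption: the four-point reference "spectra" are toy numbers, not real data, and each pixel is assigned to the single reference phase with the smallest least-squares residual after non-negative scaling. The actual Myriad Mapping procedure and software (12) may fit spectra quite differently. Only the phase names and colors come from the legend.

```python
"""Sketch: assign each pixel to its best-matching reference phase and color
it with the figure legend.

ASSUMPTIONS: toy 4-point "spectra" stand in for real spectra, and each pixel
is matched to ONE reference by scaled least squares. The real Myriad Mapping
procedure and software (12) may differ.
"""

# Colors taken from the figure legend.
LEGEND_RGB = {
    "ACC-H2O": (255, 0, 0),   # red
    "ACC": (0, 255, 0),       # green
    "CCHH": (0, 255, 255),    # cyan
    "MHC": (255, 0, 255),     # magenta
}

# Hypothetical reference spectra (intensity at 4 energies); NOT real data.
REFERENCES = {
    "ACC-H2O": [1.0, 0.2, 0.1, 0.0],
    "ACC": [0.1, 1.0, 0.2, 0.0],
    "CCHH": [0.0, 0.2, 1.0, 0.1],
    "MHC": [0.0, 0.1, 0.2, 1.0],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scaled_residual(spectrum, reference):
    """Squared residual after the best non-negative scaling of `reference`."""
    scale = max(0.0, dot(spectrum, reference) / dot(reference, reference))
    return sum((s - scale * r) ** 2 for s, r in zip(spectrum, reference))


def best_phase(spectrum, references=REFERENCES):
    """Name of the reference phase that fits this pixel's spectrum best."""
    return min(references, key=lambda name: scaled_residual(spectrum, references[name]))


def phase_map(stack, references=REFERENCES):
    """stack[row][col] is one pixel's spectrum; returns a grid of phase names."""
    return [[best_phase(px, references) for px in row] for row in stack]


def to_rgb(labels, legend=LEGEND_RGB):
    """Convert a grid of phase names to RGB triples using the legend."""
    return [[legend[name] for name in row] for row in labels]


if __name__ == "__main__":
    # A 1x2-pixel toy stack: one ACC-H2O-like pixel, one CCHH-like pixel.
    stack = [[[2.0, 0.4, 0.2, 0.0], [0.0, 0.3, 1.1, 0.1]]]
    labels = phase_map(stack)
    print(labels)          # [['ACC-H2O', 'CCHH']]
    print(to_rgb(labels))  # [[(255, 0, 0), (0, 255, 255)]]
```

Plain Python lists keep the sketch dependency-free; a real spectromicroscopy stack with many pixels and energies would more likely use NumPy arrays and a multi-component non-negative fit per pixel.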
