MDS-Onto: A Community-Driven Effort to Standardize Terminologies in Materials and Data Sciences
Published in Research Data, Mathematics, and Statistics

Scientific data is often messy. Poor documentation practices, lack of data management policies, non-machine-readable datasets, and isolated user-created variables create reproducibility, transparency, and efficiency challenges in research. When data is not properly documented, lacks essential metadata, or doesn’t follow community standards, exchanging and sharing information becomes hard, and progress is slow and unreliable.
In this work, we introduce the Materials Data Science Ontology (MDS-Onto) framework, a community-driven initiative composed of an ensemble of user-friendly tools for developing the MDS-Onto ontology (Figure 1) for data FAIRification and semantic reasoning.
‘Ontologies’ are standards that share vocabulary and relationships across domains, thereby facilitating and enhancing data exchange and interoperability of datasets, data analysis, and models trained on data. This definition also raises a question: if data is usually domain specific, how do we create interoperable terms? By building on ontologies established by the International Organization for Standardization (ISO), terms are aligned to existing standards and labeled consistently, which makes data science workflows streamlined and efficient.
MDS-Onto is a community effort with collaborations and partnerships with industry, academia, and national laboratories. Our core development team is located at the SDLE Research Center in the Department of Materials Science and Engineering at Case Western Reserve University and our network of domain experts represents a breadth of institutions.
Materials Data Science Ontology (MDS-Onto)
MDS-Onto is a low-level, modular ontology for the domains of Materials and Data Science. We modularized MDS-Onto to simplify term alignment, which can be challenging depending on the alignment level and the user’s experience in ontology development. Modularity here simply means that we map our terms to MDS-Onto Concepts that were previously mapped to mid-level ontologies, such as PMDco. For instance, to map an instrument model variable, one can map it to mds-tool (concept layer), which is a subclass of pmd:ProcessingNode from PMDco.
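The layered mapping described above can be sketched as a simple subclass lookup. This is a minimal illustration, not part of any MDS-Onto tool: the identifiers `ex:instrumentModel` and `mds:tool` are assumed spellings for the domain variable and the mds-tool concept, and the real MDS-Onto and PMDco hierarchies may contain more levels.

```python
# Hypothetical subclass links; identifiers are illustrative, not the published IRIs.
SUBCLASS_OF = {
    "ex:instrumentModel": "mds:tool",      # domain term -> MDS-Onto concept layer
    "mds:tool": "pmd:ProcessingNode",      # concept -> mid-level PMDco class (from the text)
}


def ancestors(term, subclass_of):
    """Return the superclasses reachable from `term`, nearest first."""
    chain = []
    seen = {term}
    while term in subclass_of:
        term = subclass_of[term]
        if term in seen:  # guard against accidental cycles in the mapping
            break
        seen.add(term)
        chain.append(term)
    return chain


chain = ancestors("ex:instrumentModel", SUBCLASS_OF)
print(chain)
```

Because the domain term only needs one link to a concept, the alignment to the mid-level ontology comes along without the user having to declare it.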
We recommend that variables at the sub-domain level be created following Research Data Alliance (RDA) recommendations for the domain or application field. When domains do not fit into an existing MDS-Onto Concept category, additional MDS-Onto Concepts can be created and domain or sub-domain ontologies incorporated into MDS-Onto. The MDS-Onto core development team then maps the new ontology to existing interoperable mid/top-level ontologies.
MDS-Onto Tools: MDS-Onto FindTheDocs, FAIRmaterials, and FAIRLinked
Our MDS-Onto Framework has three main components in addition to the MDS-Onto Ontology. The first is FAIRmaterials, a bilingual (R/Python) software package for ontology creation, visualization, and documentation through a simple interface based on a .csv template. Users populate the .csv file with domain/subdomain terms, map these directly to mds: (or to mid-level ontologies), and run FAIRmaterials, which generates ontology files (.ttl, .owl), an image for visualization, and an .html webpage of ontology documentation. The second component is FAIRlinked, a Python package that uses MDS-Onto to translate .csv data into FAIRified .jsonld linked data. The third component, MDS-Onto FindTheDocs, is a website that offers ontology visualization with the WebVOWL graph exploration tool, the JSON-LD Playground for .jsonld validation, and the full MDS-Onto documentation. MDS-Onto FindTheDocs is also where users can download the up-to-date MDS-Onto Ontology files. A snapshot of MDS-Onto FindTheDocs can be seen in Figure 2. Figure 3 illustrates how FAIRlinked uses the MDS-Onto Ontology and raw data to create .jsonld linked data.

We created ontologies, so what?
Now we have several domain and sub-domain ontologies that describe unified knowledge and vocabulary in particular domains as terms and relationships, as illustrated in Figure 1. How can we make use of those ontologies beyond being tools for terminology guidance? How can we integrate ontologies to guide FAIR data creation and automated scientific analysis workflows?
The answer is FAIRlinked, our most recent package, which was briefly introduced in the previous section. FAIRlinked was designed to fill the gap between ontology development and the implementation of FAIR principles. Its basic approach is to take ontology files with interoperable terms and relationships from MDS-Onto and create templates to be populated with raw data. These are then serialized in a second step to create JSON-LD files. JSON-LD is a standard data format and a W3C recommendation for linked data. By using the RDF Data Cube vocabulary with the measure dimension approach, users can choose between creating JSON-LD for an entire dataframe as a single instance or creating one JSON-LD file per row. The choice depends on how the study object and domain are organized and what makes the most sense for that particular domain.
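The choice between one document for the whole table and one document per row can be illustrated with plain Python. This is a sketch under stated assumptions, not FAIRlinked's actual API or output: the `mds` namespace IRI and the `mds:Voltage` and `mds:Current` terms are placeholders, and only the basic `qb:DataSet` and `qb:Observation` classes of the RDF Data Cube vocabulary are used, not the full measure dimension pattern.

```python
import json

# The "qb" IRI is the RDF Data Cube namespace; the "mds" IRI is a placeholder.
CONTEXT = {
    "qb": "http://purl.org/linked-data/cube#",
    "mds": "https://example.org/mds/",
}

# Hypothetical rows of a populated template.
rows = [
    {"sampleID": "PV-001", "mds:Voltage": 36.2, "mds:Current": 8.1},
    {"sampleID": "PV-002", "mds:Voltage": 35.9, "mds:Current": 8.3},
]


def observation(row):
    """Turn one table row into a qb:Observation node."""
    node = {"@id": f"_:obs-{row['sampleID']}", "@type": "qb:Observation"}
    node.update({k: v for k, v in row.items() if k != "sampleID"})
    return node


def one_document_per_row(rows):
    """Option A: each row becomes its own JSON-LD document."""
    return [{"@context": CONTEXT, **observation(r)} for r in rows]


def single_document(rows, dataset_id="_:dataset"):
    """Option B: the whole table becomes one JSON-LD graph."""
    graph = [{"@id": dataset_id, "@type": "qb:DataSet"}]
    for r in rows:
        graph.append({**observation(r), "qb:dataSet": {"@id": dataset_id}})
    return {"@context": CONTEXT, "@graph": graph}


per_row = one_document_per_row(rows)
whole = single_document(rows)
print(json.dumps(whole, indent=2))
```

Per-row documents suit domains where each row is a separate study object, while the single graph keeps all observations tied to one dataset node.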

FAIRlinked writes the JSON-LD files with parseable file names that are globally unique. The naming convention and the order of its parts depend on community preference, standards, and relevance for that domain. All metadata is stored under keys inside the .jsonld files, so in theory we do not need metadata in the file name. However, to meet the unique identifier requirement of the Findable principle, the file name could use hashes or Universally Unique Identifiers (UUIDs). Such file names would resemble 24d470987fda1278c63c3j78jb30869b8218c64f.jsonld – not very user friendly or easily interpretable by a human reader.
An alternative way to meet the “Findable” principle of FAIR is to design more human-friendly parseable file names. This is the approach we adopt: our parseable file names start with the researcher's Open Researcher and Contributor ID (ORCID). For photovoltaic modules, for example, where the study object is the module ID, the file name convention is orcid-sampleID-timestamp.json.
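A naming convention of this kind can be built and parsed with a few lines of Python. The sketch below is illustrative only and is not FAIRlinked code: the timestamp layout (`YYYYMMDDTHHMMSS`), the helper names, and the sample ID are assumptions, and the ORCID shown is the widely used example identifier. Because an ORCID itself contains hyphens, the parser matches its fixed four-block shape rather than splitting on every hyphen.

```python
import re
from datetime import datetime

# Assumed layout: <ORCID>-<sampleID>-<YYYYMMDDTHHMMSS>.json
STAMP_FORMAT = "%Y%m%dT%H%M%S"
PATTERN = re.compile(
    r"^(?P<orcid>\d{4}-\d{4}-\d{4}-\d{3}[\dX])-"  # four blocks, last char may be X
    r"(?P<sample>.+)-"                            # sample ID may itself contain hyphens
    r"(?P<stamp>\d{8}T\d{6})\.json$"
)


def build_name(orcid, sample_id, when):
    """Compose a parseable file name from its three parts."""
    return f"{orcid}-{sample_id}-{when.strftime(STAMP_FORMAT)}.json"


def parse_name(name):
    """Split a file name back into ORCID, sample ID, and timestamp."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a parseable file name: {name!r}")
    return {
        "orcid": match["orcid"],
        "sample_id": match["sample"],
        "timestamp": datetime.strptime(match["stamp"], STAMP_FORMAT),
    }


when = datetime(2024, 5, 17, 9, 30, 0)
name = build_name("0000-0002-1825-0097", "PV-001", when)
parts = parse_name(name)
print(name)
print(parts)
```

Uniqueness here rests on the combination of researcher, sample, and time, so two files written by the same researcher for the same sample within the same second would collide under this assumed timestamp resolution.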
Once data and metadata are all stored in .jsonld linked data files that are consistent throughout a domain, it becomes easier, quicker, and more efficient to write reusable scripts and workflows to extract, analyse, and model information.
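As a small illustration of such a reusable script, the sketch below reads every .jsonld file in a folder and collects the value stored under one key. The key `mds:Voltage`, the file names, and the helper `collect` are hypothetical, and the sample files are written to a temporary folder only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def collect(directory, key):
    """Return {file name: value} for every .jsonld file that contains `key`."""
    values = {}
    for path in sorted(Path(directory).glob("*.jsonld")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        if key in doc:
            values[path.name] = doc[key]
    return values


with tempfile.TemporaryDirectory() as tmp:
    # Write two tiny stand-in files; real files would come from the FAIRification step.
    for file_name, voltage in [("a.jsonld", 36.2), ("b.jsonld", 35.9)]:
        Path(tmp, file_name).write_text(
            json.dumps({"mds:Voltage": voltage}), encoding="utf-8"
        )
    result = collect(tmp, "mds:Voltage")

print(result)
```

The same function works unchanged on any folder of files from the same domain, because every file uses the same keys.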