Creating a large database of simulated Raman spectra with an optimized computational workflow

Raman spectroscopy is a widely used materials characterization technique based on the vibrational properties of materials. Raman spectra provide information about the vibrational modes, atomic structure, and chemical composition of a material, but interpreting a measured spectrum relies on comparison with known reference spectra. Experimental databases of spectra have therefore been collected, but they are limited to well-known materials, and the measured samples may contain significant amounts of impurities of unknown identity.

The spectra can also be simulated with atomistic first-principles methods to complement experimental databases. However, current methods for simulating Raman spectra are computationally demanding, so the existing databases of computed Raman spectra contain only a fairly small number of entries.

In our paper, we present an optimized workflow for calculating Raman spectra that reduces the computational cost and takes full advantage of the phonon properties already available in existing materials databases. Using this workflow, we performed high-throughput calculations for a large set of materials (5099) belonging to many different material classes and collected the results in the Computational Raman Database (CRD), which can be browsed online [1].

The computational procedure normally involves two steps: (i) calculating the force constants to obtain the vibrational modes and (ii) calculating the Raman tensor for each mode. Both steps can be computationally demanding for systems with many atoms in the unit cell, which has hindered previous efforts to build such databases. The key advances that distinguish our workflow from previous ones are the following.
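Step (ii) is commonly carried out by finite differences: the atoms are displaced by ±δ along a mode's displacement pattern, the high-frequency dielectric tensor is computed for both structures, and the Raman tensor is estimated as the derivative ∂ε/∂Q with respect to the normal-mode coordinate Q. The sketch below shows only that finite-difference logic. It assumes a hypothetical callable `dielectric` standing in for the first-principles calculation; the function name, the step size, and the omission of any volume or unit prefactor are illustrative choices, not necessarily the exact scheme used in the paper.

```python
import numpy as np

def raman_tensor(dielectric, positions, eigvec, masses, step=0.01):
    """Central-difference estimate of d(epsilon)/dQ for a single phonon mode.

    dielectric: hypothetical callable mapping (N, 3) Cartesian positions to a
        (3, 3) dielectric tensor; in practice this is a DFT calculation.
    positions: (N, 3) equilibrium positions.
    eigvec: (N, 3) mass-weighted eigenvector of the mode.
    masses: (N,) atomic masses.
    step: amplitude of the normal-mode coordinate Q used for the displacements.
    """
    # Cartesian displacement pattern of the mode: u_j = e_j / sqrt(m_j).
    u = eigvec / np.sqrt(masses)[:, None]
    eps_plus = dielectric(positions + step * u)
    eps_minus = dielectric(positions - step * u)
    # Symmetric finite difference, accurate to second order in the step.
    return (eps_plus - eps_minus) / (2.0 * step)
```

In this scheme each mode costs two dielectric-tensor calculations, so the total cost of step (ii) grows with the number of modes that are actually evaluated.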

First, we built our database on top of Atsushi Togo’s Phonon database [2,3], which contains the calculated full force-constant matrices. This lets us skip step (i) entirely, so our work focuses only on calculating the Raman tensors. We use the same computational parameters, so our database is fully consistent with the Phonon database, which in turn is linked to the Materials Project database [4] via material-specific IDs.
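Once the force constants are available, the modes relevant for first-order Raman scattering (those at the Γ point) follow from diagonalizing the mass-weighted dynamical matrix. This minimal sketch assumes the force constants have already been folded onto the primitive cell at Γ (summed over periodic images) and stored as an (N, N, 3, 3) array in eV/Å², with masses in amu. The function name is hypothetical; the factor 15.633302 converts sqrt(eV/(Å² amu)) to THz.

```python
import numpy as np

# sqrt(eV / (Angstrom^2 * amu)) expressed in THz.
TO_THZ = 15.633302

def gamma_modes(force_constants, masses):
    """Frequencies (THz) and eigenvectors of the Gamma-point modes.

    force_constants: (N, N, 3, 3) array, assumed already folded to the
        primitive cell at Gamma, in eV/Angstrom^2.
    masses: (N,) atomic masses in amu.
    Returns frequencies of shape (3N,), with imaginary modes reported as
    negative numbers, and eigenvectors of shape (3N, N, 3).
    """
    n = len(masses)
    # (i, j, alpha, beta) -> (3i + alpha, 3j + beta)
    phi = force_constants.transpose(0, 2, 1, 3).reshape(3 * n, 3 * n)
    m = np.repeat(masses, 3)
    dyn = phi / np.sqrt(np.outer(m, m))
    dyn = 0.5 * (dyn + dyn.T)  # remove small numerical asymmetry
    w2, vecs = np.linalg.eigh(dyn)  # eigenvalues in ascending order
    freqs = np.sign(w2) * np.sqrt(np.abs(w2)) * TO_THZ
    return freqs, vecs.T.reshape(3 * n, n, 3)
```

For a structure with N atoms this yields 3N modes, three of which are the zero-frequency acoustic translations.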

Second, to reduce the calculation time in step (ii) and make the workflow more efficient than existing methods, we identify the Raman-active modes using group theory and calculate the Raman tensors only for modes that are known to be active or whose activity could not be determined. Modes known to be inactive and the three zero-frequency acoustic modes are skipped. For this purpose, we implemented the symmetry information needed to determine the Raman activity of each mode.
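The mode-selection rule fits in a few lines. In this sketch, each Γ-point mode carries an irreducible-representation label (or `None` when the label could not be assigned), and `raman_active` is the set of Raman-active irreps of the crystal's point group, for example {"A1g", "Eg"} for D3d. The `Mode` class and the function name are hypothetical, and treating the three modes closest to zero frequency as the acoustic modes is one simple choice among several.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Mode:
    frequency_thz: float
    irrep: Optional[str]  # None when the symmetry label is undetermined

def modes_to_compute(modes: List[Mode], raman_active: Set[str]) -> List[int]:
    """Indices of modes whose Raman tensor should be calculated."""
    # The three modes closest to zero frequency are taken as the acoustic modes.
    by_abs_freq = sorted(range(len(modes)), key=lambda k: abs(modes[k].frequency_thz))
    acoustic = set(by_abs_freq[:3])
    selected = []
    for k, mode in enumerate(modes):
        if k in acoustic:
            continue
        # Keep modes known to be active, and modes whose activity is unknown.
        if mode.irrep is None or mode.irrep in raman_active:
            selected.append(k)
    return selected

# Example for point group D3d, where A1g and Eg are Raman active.
example = [Mode(0.0, "A2u"), Mode(0.001, "Eu"), Mode(-0.002, "Eu"),
           Mode(12.5, "A1g"), Mode(8.0, "A2u"), Mode(9.1, "Eg"), Mode(10.0, None)]
print(modes_to_compute(example, {"A1g", "Eg"}))  # [3, 5, 6]
```

Keeping modes with undetermined labels errs on the side of computing too much rather than silently missing an active mode.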

Moreover, before the Raman tensor calculations, we performed a prescreening that removed materials that have no Raman-active modes, are dynamically unstable, are thermodynamically unstable, or have too small a band gap. In the end, 8382 materials (84% of the roughly 10,000 materials in the Phonon database) satisfied these conditions and were flagged for calculation. As illustrated in Figure 1 below, the database encompasses a wide variety of materials from different compound classes (oxides, halides, etc.) and of different dimensionality.
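The prescreening amounts to a conjunction of four filters. The sketch below is illustrative only: the `Candidate` fields are hypothetical, and the threshold values (a tolerance for imaginary frequencies, a maximum energy above the convex hull, and a minimum band gap) are placeholders rather than the cutoffs used for the CRD.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    material_id: str
    n_raman_active: int       # Raman-active modes from the symmetry analysis
    min_frequency_thz: float  # lowest phonon frequency; negative = imaginary
    e_above_hull_ev: float    # energy above the convex hull, eV/atom
    band_gap_ev: float

# Placeholder thresholds for illustration; not the values used in the paper.
def passes_prescreening(c: Candidate, imag_tol: float = -0.1,
                        hull_tol: float = 0.1, min_gap: float = 0.5) -> bool:
    has_active_modes = c.n_raman_active > 0
    dynamically_stable = c.min_frequency_thz >= imag_tol
    thermodynamically_stable = c.e_above_hull_ev <= hull_tol
    large_enough_gap = c.band_gap_ev >= min_gap
    return (has_active_modes and dynamically_stable
            and thermodynamically_stable and large_enough_gap)
```

Running such a filter over every entry before any Raman tensor is computed keeps the expensive step (ii) calculations away from materials that would be discarded anyway.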

Fig. 1 Database statistics. (a,b) The number of materials in the Phonon database as a function of the number of atoms in the structure and of the band gap, respectively. (c) Comparison of the number of different types of compounds in the Materials Project (MP) and the Computational Raman Database (CRD). MP* and PhDB* show the number of structures in the Materials Project and the Phonon database, respectively, when the same selection conditions as in CRD are applied to them. (d) The number of materials in different space groups, grouped by crystal system.

To validate our method, we compared the calculated spectra with experimental spectra from the RRUFF database [5]. Figure 2 compares the calculated and experimental Raman spectra of a few selected minerals. Overall, we find good agreement between the computational and experimental results.

Fig. 2 Comparison of calculated Raman spectra and experimental spectra from the RRUFF database for selected minerals. Short green line segments mark the Raman-active modes predicted by the symmetry analysis.
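Comparing with measured spectra requires turning per-mode Raman tensors into a continuous spectrum. A common recipe, assumed here rather than taken from the paper, is to use the orientation-averaged (powder) activity 45a² + 7γ², where a and γ² are the isotropic and anisotropic invariants of the Raman tensor, and to broaden each peak with a Lorentzian of fixed width. The function names and the 10 cm⁻¹ default width are hypothetical, and the frequency-, temperature-, and laser-dependent prefactors are omitted.

```python
import numpy as np

def powder_activity(R):
    """Orientation-averaged Raman activity 45 a^2 + 7 gamma^2 of a 3x3 tensor."""
    R = 0.5 * (R + R.T)  # keep the symmetric part
    a = np.trace(R) / 3.0
    diag = ((R[0, 0] - R[1, 1]) ** 2 + (R[1, 1] - R[2, 2]) ** 2
            + (R[2, 2] - R[0, 0]) ** 2)
    off = R[0, 1] ** 2 + R[1, 2] ** 2 + R[0, 2] ** 2
    gamma2 = 0.5 * diag + 3.0 * off
    return 45.0 * a ** 2 + 7.0 * gamma2

def broadened_spectrum(freqs_cm, intensities, grid_cm, fwhm_cm=10.0):
    """Sum of area-normalized Lorentzians, one per mode, evaluated on grid_cm."""
    hw = 0.5 * fwhm_cm
    x = np.asarray(grid_cm)[:, None] - np.asarray(freqs_cm)[None, :]
    lorentz = (hw / np.pi) / (x ** 2 + hw ** 2)
    return lorentz @ np.asarray(intensities, dtype=float)
```

Because each Lorentzian integrates to one, the area under each peak equals its activity, so changing the width alters peak shapes but not their relative weights.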

The final database contains the Raman tensors, vibrational information (such as phonon eigenmodes, Born effective charges, and symmetry information), and other relevant data such as the atomic structure, phonon dispersion, and infrared spectrum.

We hope that the vibrational properties and Raman spectra of materials in the database will prove useful for computational and experimental researchers alike. 

More details can be found in our paper "High-throughput computation of Raman spectra from first principles" published in Scientific Data. 

Link to article: 


  1. Bagheri, M. & Komsa, H.-P. Computational Raman Database (2023).
  2. Togo, A. Phonon database (2018).
  3. Togo, A. & Tanaka, I. First principles phonon calculations in materials science. Scr. Mater. 108, 1–5 (2015).
  4. Jain, A. et al. The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
  5. Lafuente, B., Downs, R. T., Yang, H. & Stone, N. The power of databases: The RRUFF project (De Gruyter, 2016).
