Both engineering and bioinformatics draw on computer science, data science, and machine learning (ML). Data science deals with the ethics surrounding data, including the reproducibility of experiments, data consistency, coherence of hypotheses, and scientific relevance. The criteria used to judge ML techniques (usability, transferability, generality, interpretability, applicability, and adaptability, among others) are considered as well. A complete definition of selection criteria for characterizing ML techniques can be found in [1].
In bioinformatics, data typically represent the expression of biomolecules, such as quantities of DNA, genes, and proteins in contrasting samples. The usual bioinformatics question is which biomolecules are significantly differentially expressed between groups of samples. To answer it, statistical tests such as Student's t-test or dedicated differential expression tests are first applied to compare values between two groups. Unsupervised ML techniques (principal component analysis, clustering methods such as K-means, etc.) are then used to investigate the data distribution and to group molecules by expression pattern. In this step, the groups of biomolecules are annotated using published databases. Finally, supervised ML techniques are used to select groups of biomarkers that assign each sample to one of the compared groups. A variety of supervised methods can be applied to select biomarker signatures (k-nearest neighbors, decision trees, regression trees, Bayesian networks, linear regression, random forests, artificial neural networks, etc.).
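The first step of this workflow, testing each biomolecule for a difference between two groups, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular study: the expression matrix is synthetic, the group sizes and the planted shift are invented, and the Benjamini-Hochberg correction is an addition that the text does not mention, included here because many genes are tested at once. The helper name `benjamini_hochberg` is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic expression matrix (assumption: rows are genes, columns are samples).
n_genes, n_per_group, n_shifted = 500, 10, 20
control = rng.normal(loc=8.0, scale=1.0, size=(n_genes, n_per_group))
treated = rng.normal(loc=8.0, scale=1.0, size=(n_genes, n_per_group))
treated[:n_shifted] += 4.0  # planted differential expression in the first 20 genes


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Welch's t-test for every gene (row-wise), without assuming equal variances.
result = stats.ttest_ind(treated, control, axis=1, equal_var=False)
adjusted = benjamini_hochberg(result.pvalue)
significant = np.flatnonzero(adjusted < 0.05)

print(f"{significant.size} of {n_genes} genes pass an adjusted p-value of 0.05")
```

The indices in `significant` would then feed the unsupervised and supervised steps described above. With real data, the choice of test depends on the measurement technology and on the distribution of the values.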
In engineering, common problems include fault diagnosis and classification, as well as equipment performance prediction. Performance variables are used to assess equipment health. The variables usually monitored are those that can be measured directly on the equipment, such as flow rate, pressure, temperature, density, viscosity, and voltage. Equations for calculating performance variables from these measurements have already been proposed [2]. In addition to equations, ML techniques have been applied to predict performance variables, which is useful when some of the variables cannot be measured [3]. ML is mainly used for fault diagnosis, with both supervised and unsupervised methods [4].
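As an illustration of predicting a performance variable from measured variables, the sketch below fits an ordinary least-squares regression with NumPy. Everything in it is assumed for the example: the data are synthetic, the variable names and units are hypothetical, and the relationship used to generate the `head` values is invented rather than taken from [2] or [3]. Least squares stands in for whichever regression technique would be chosen in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical measured variables (arbitrary units).
flow = rng.uniform(10.0, 50.0, n)
viscosity = rng.uniform(1.0, 100.0, n)

# Invented ground truth for the performance variable, plus measurement noise.
head = 60.0 - 0.015 * flow**2 - 0.08 * viscosity + rng.normal(0.0, 0.5, n)


def design_matrix(flow, viscosity):
    """Columns: intercept, flow, flow squared, viscosity."""
    return np.column_stack([np.ones_like(flow), flow, flow**2, viscosity])


# Hold out the last 50 observations to check the fit on unseen data.
train, test = np.arange(0, 150), np.arange(150, n)
coef, *_ = np.linalg.lstsq(
    design_matrix(flow[train], viscosity[train]), head[train], rcond=None
)

predicted = design_matrix(flow[test], viscosity[test]) @ coef
ss_res = np.sum((head[test] - predicted) ** 2)
ss_tot = np.sum((head[test] - head[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"held-out R^2: {r2:.3f}")
```

The fit is good here only because the model form matches the invented generating equation. With real equipment data the functional form is unknown, which is one reason more flexible ML regressors are considered.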
Diagnosis tasks are common to both areas, but the type of data differs. In bioinformatics, the data are quantities of biomolecules, while in machine fault prediction they are machine measurements (flow rate, temperature, pressure, torque, voltage, etc.). The same techniques can be used in both areas, since both aim to classify and predict. However, the number of variables in bioinformatics problems can be much greater: in experiments with humans, the number of genes can reach 30,000. Biological data also tend to be more heterogeneous, since no two samples are identical. In engineering, heterogeneity must be considered when data are collected from equipment in real operation.
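The point that one technique can serve both areas despite very different data shapes can be illustrated with a small sketch. It applies the same nearest-centroid classifier to two synthetic datasets: one shaped like an expression study (20 samples, 30,000 variables) and one shaped like sensor monitoring (200 samples, 5 variables). The sample counts, the number of informative variables, the class shift, and both function names are assumptions made for the example.

```python
import numpy as np


def make_two_class(n_per_class, n_features, n_informative, shift, rng):
    """Synthetic two-class data: class 1 is shifted on the first n_informative features."""
    X = rng.normal(size=(2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :n_informative] += shift
    return X, y


def nearest_centroid_accuracy(X, y):
    """Train on even-indexed samples, test on odd-indexed ones, return test accuracy."""
    train = np.arange(0, len(y), 2)
    test = np.arange(1, len(y), 2)
    labels = np.unique(y[train])
    centroids = np.vstack([X[train][y[train] == k].mean(axis=0) for k in labels])
    # Squared Euclidean distance from every test sample to every class centroid.
    dist = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = labels[dist.argmin(axis=1)]
    return float((predicted == y[test]).mean())


rng = np.random.default_rng(0)

# Expression-like data: few samples, many variables.
X_omics, y_omics = make_two_class(10, 30_000, 500, 2.0, rng)
# Sensor-like data: many samples, few variables.
X_sensor, y_sensor = make_two_class(100, 5, 5, 2.0, rng)

acc_omics = nearest_centroid_accuracy(X_omics, y_omics)
acc_sensor = nearest_centroid_accuracy(X_sensor, y_sensor)

print(f"expression-like data {X_omics.shape}: accuracy {acc_omics:.2f}")
print(f"sensor-like data {X_sensor.shape}: accuracy {acc_sensor:.2f}")
```

The classifier code is identical in both cases; only the shape of the input changes. Real data in either area would be more heterogeneous than these idealized Gaussian samples.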
References
[1] Miguel A. De C. Michalski, Carlos A. Murad, Fabio N. Kashiwagi, Gilberto F. M. De Souza, Halley J. B. Da Silva, and Hyghor M. Côrtes. A Multi-Criteria Framework for Selecting Machine Learning Techniques for Industrial Fault Prognosis. 2025.
[2] W. Monte Verde, E. Kindermann, J. L. Biazussi, V. Estevam, B. P. Foresti, and A. C. Bannwart. Experimental Investigation of the Effects of Fluid Viscosity on Electrical Submersible Pumps Performance. SPE Prod & Oper 38 (01): 1–19. 2023.
[3] Natan Augusto Vieira Bulgarelli, Jorge Luiz Biazussi, William Monte Verde, Carlos Eduardo Perles, Marcelo Souza de Castro, and Antonio Carlos Bannwart. Experimental investigation on the performance of Electrical Submersible Pump (ESP) operating with unstable water/oil emulsions. Journal of Petroleum Science and Engineering. Volume 197, February 2021, 107900.
[4] Junqian Zhang, Shuaishuai Dong, Shengyu Zhang, Heng Zhang, Hongli Li, Qingfeng Dong, Pin Wu, and Chun Feng. Review on Fault Diagnosis of Electric Submersible Pump using Machine Learning. Journal of Physics: Conference Series. 2025.