Machine learning model matters its accuracy

A comparative study of ensemble learning and AutoML using heart disease prediction

Published in Bioengineering & Biotechnology

Apr 04, 2025

PhD Scholar, University Teacher, Pokhara University

Ensemble machine learning combines multiple models to improve overall performance, turning weaker individual learners into a stronger predictive system. In recent years, researchers have focused on improving model accuracy for classification and prediction tasks, emphasizing the importance of robust model selection. This study compares the accuracy, precision, and F1 scores of eight individual machine learning models: Decision Tree, Logistic Regression, Support Vector Machine (SVM), Random Forest, Artificial Neural Network (ANN), Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP). Majority voting was used as the ensemble aggregation technique to combine the models' predictions and help identify the most reliable model for deployment. Additionally, AutoML was used to automate model selection and tuning; it supports both binary classification and regression tasks without requiring manual feature engineering. In total, the study compares 18 models: eight default models and ten generated via AutoML. Performance was assessed on a publicly available heart disease dataset using metrics including accuracy, mean squared error (MSE), and R² score. Among the individual models, SVM, Logistic Regression, and the Artificial Neural Network reached up to 80% accuracy, while Gaussian Naïve Bayes, KNN, and MLP performed slightly lower at 76%. AutoML produced better results: the Generalized Linear Model (88%), Gradient Boosting (87%), Distributed Random Forest (87%), and Extra Trees (82%) achieved the highest predictive accuracy. These results highlight AutoML's efficiency in selecting and optimizing models for heart disease classification.
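To make the majority-voting and metric steps concrete, here is a minimal, dependency-free Python sketch of hard voting plus the metrics named above. It is not the study's code: the function names `majority_vote`, `classification_metrics`, and `regression_metrics` are hypothetical, the prediction lists are toy values rather than results from the heart disease dataset, and breaking ties in favour of the smallest label is an assumption (with eight voters, ties can occur on binary labels). In practice a library such as scikit-learn would normally be used instead.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted.

    Ties are broken in favour of the smallest label (an assumption).
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(preds[i] for preds in predictions_per_model)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Mean squared error and coefficient of determination (R^2)."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"mse": ss_res / n, "r2": 1 - ss_res / ss_tot if ss_tot else 0.0}


# Toy example: true labels (1 = heart disease) and three hypothetical models.
y_true = [1, 0, 1, 1, 0, 1]
model_preds = {
    "svm": [1, 0, 1, 0, 0, 1],
    "logreg": [1, 1, 1, 1, 0, 0],
    "knn": [0, 0, 1, 1, 1, 1],
}

ensemble = majority_vote(list(model_preds.values()))
print(ensemble)  # → [1, 0, 1, 1, 0, 1]
for name, preds in model_preds.items():
    print(name, classification_metrics(y_true, preds)["accuracy"])
print("vote", classification_metrics(y_true, ensemble)["accuracy"])  # → vote 1.0
print(regression_metrics(y_true, model_preds["svm"]))
```

On these toy predictions each single model misclassifies one or two samples, while the hard vote recovers all six labels, which illustrates why combining weaker learners can help. MSE and R² are computed here on 0/1 labels, where MSE equals the error rate.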

Methods

To determine the most effective model for heart disease classification, this study implemented a range of individual and ensemble machine learning techniques and optimized them before final selection. The AutoML framework was used to automate the search for the best-performing model for this task. All models were tested on the same dataset to validate their predictive power and ensure consistent comparisons. In total, 18 models (eight traditional algorithms and ten AutoML-generated models) were evaluated using ensemble methods such as boosting, bagging, and majority voting.

The study used a popular open-source heart disease dataset (available at https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv) containing 303 patient records. The target variable indicates the presence or absence of heart disease. Because this label cannot be determined by clinical observation alone, the models were trained on a wide range of independent features. These included demographic and clinical attributes such as sex (female = 0, male = 1), age (29–79 years), chest pain type, resting blood pressure, cholesterol level, fasting blood sugar, family history of coronary artery disease, and ECG results. Additional attributes included maximum heart rate, exercise-induced angina, ST depression during exercise, the number of vessels observed via fluoroscopy, test duration, and ischemic heart disease indicators.

A flowchart (Fig. 1) illustrates the data preparation steps: retrieving the dataset from the internet, preprocessing it (including handling missing values through mean imputation), renaming columns, and normalizing only the independent variables to prepare the dataset for model training.
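The data preparation steps in Fig. 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the function names are hypothetical, the short column names (`age`, `sex`, `chol`, `target`) and the new name `cholesterol` are assumed rather than taken from the post, treating empty cells or "?" as missing is an assumption, and min-max scaling is only one possible choice because the post does not say which normalization was used. Toy rows stand in for the 303 real records so the example runs offline; `load_rows` shows the download step but is not called.

```python
import csv
import io
import urllib.request

# URL as given in the post; the file's exact format is not verified here.
DATASET_URL = (
    "https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv"
)


def load_rows(url=DATASET_URL):
    """Step 1: download the CSV and parse each cell as a float.

    Empty cells and "?" are treated as missing (None); this marker is an assumption.
    """
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return [
        {k: (None if v.strip() in ("", "?") else float(v)) for k, v in rec.items()}
        for rec in csv.DictReader(io.StringIO(text))
    ]


def mean_impute(rows, columns):
    """Step 2: replace missing values in each listed column with the column mean."""
    means = {}
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        means[col] = sum(present) / len(present)
    return [
        {k: (means[k] if k in means and v is None else v) for k, v in r.items()}
        for r in rows
    ]


def rename_columns(rows, mapping):
    """Step 3: rename columns; names not in `mapping` are kept."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def min_max_normalize(rows, feature_columns):
    """Step 4: scale only the independent variables to [0, 1]; the target is untouched."""
    bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in feature_columns}
    out = []
    for r in rows:
        new = dict(r)
        for c, (lo, hi) in bounds.items():
            new[c] = (r[c] - lo) / (hi - lo) if hi > lo else 0.0
        out.append(new)
    return out


# Toy rows standing in for load_rows(); values are illustrative only.
raw = [
    {"age": 60, "sex": 1, "chol": 240, "target": 1},
    {"age": 40, "sex": 1, "chol": None, "target": 1},
    {"age": 50, "sex": 0, "chol": 200, "target": 0},
    {"age": 80, "sex": 1, "chol": 220, "target": 0},
]

clean = mean_impute(raw, ["age", "chol"])
clean = rename_columns(clean, {"chol": "cholesterol"})
clean = min_max_normalize(clean, ["age", "sex", "cholesterol"])
for row in clean:
    print(row)
```

Here the imputation means and scaling bounds are computed on the whole table, which is the simplest reading of the described steps; when a train/test split is used, fitting them on the training rows only avoids leaking test information into the model.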

Yagyanath Rimal

I'm Yagyanath Rimal, a university teacher specializing in computer science. In addition to teaching C, C++, Java, and web applications at the bachelor's level, I also teach Research Methodology, Data Analytics, and Machine Learning to master's degree students. Currently, I am focusing on research in data analytics and machine learning. I've recently embarked on my journey in academia and am also interested in international collaborations and postdoctoral opportunities.

Multimedia Tools and Applications

This journal publishes original research articles on multimedia development and system support tools, and case studies of multimedia applications.
