Why a machine learning model's accuracy matters

A comparative study of ensemble learning and AutoML using heart disease prediction

Ensemble machine learning combines multiple models, often individually weak learners, into a stronger predictive system. In recent years, researchers have focused on improving model accuracy for classification and prediction tasks, which makes robust model selection important. This study evaluates eight individual machine learning models (Decision Tree, Logistic Regression, Support Vector Machine (SVM), Random Forest, Artificial Neural Network (ANN), Gaussian Naïve Bayes, K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP)) by comparing their accuracy, precision, and F1 scores. Majority voting was used as the ensemble aggregation technique to identify the most reliable model for deployment. In addition, AutoML was used to automate model selection and tuning; it supports both binary classification and regression tasks without manual feature engineering. In total, the research compares 18 models: eight default models and ten generated via AutoML. Performance metrics, including accuracy, mean squared error (MSE), and R² score, were assessed on a publicly available heart disease dataset. Among the individual models, SVM, Logistic Regression, and the ANN achieved up to 80% accuracy, while Gaussian Naïve Bayes, KNN, and MLP performed slightly lower at 76%. AutoML produced better results, with the Generalized Linear Model (88%), Gradient Boosting (87%), Distributed Random Forest (87%), and Extra Trees (82%) showing the highest predictive accuracy. This evaluation highlights AutoML’s efficiency in selecting and optimizing models for heart disease classification.
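To make the majority-voting idea concrete, here is a minimal, library-free Python sketch. It is not the study's code: the function names (`majority_vote`, `accuracy`, `precision_f1`), the tie-breaking rule, and the toy labels are all assumptions made for illustration, and the numbers it produces are not results from the heart disease dataset.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: list of equal-length lists, one list of labels per model.
    Ties go to the smallest label (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for votes in zip(*predictions):  # one tuple of votes per sample
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_f1(y_true, y_pred, positive=1):
    """Return (precision, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1


# Toy labels invented for illustration (1 = disease present, 0 = absent).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 1]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [0, 0, 1, 1, 0, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(y_true, model_a))   # 0.75
print(accuracy(y_true, ensemble))  # 1.0
```

In this constructed example each model misclassifies two different samples, so the vote corrects every individual error. Real models tend to make correlated errors, so the gain from voting is usually smaller.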


Methods

To determine the most effective model for heart disease classification, this study implemented a range of individual and ensemble machine learning techniques and optimized them before final selection. The AutoML framework was used to automate the search for the best-performing model for the given requirements. All models were tested on the same dataset so that their predictive power could be compared consistently. A total of 18 models (eight traditional algorithms and ten AutoML-generated models) were evaluated using ensemble methods such as boosting, bagging, and majority voting.

The study used a popular open-source heart disease dataset (available at https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv), which contains 303 patient records. The target variable classifies patients by the presence or absence of heart disease. Since these labels could not be determined by clinical observation alone, machine learning models were trained on a wide range of independent features. These included demographic and clinical attributes such as sex (female = 0, male = 1), age (29–79 years), type of chest pain, resting blood pressure, cholesterol level, fasting blood sugar, family history of coronary artery disease, and ECG results. Additional attributes included maximum heart rate, exercise-induced angina, ST depression during exercise, the number of vessels observed via fluoroscopy, test duration, and ischemic heart disease indicators.

A flowchart (Fig. 1) illustrates the data preparation steps: retrieving the dataset from the internet, preprocessing (including handling missing values via mean imputation), renaming columns, and normalizing only the independent variables to prepare the dataset for model training.
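The data preparation steps can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the study's pipeline: the text does not say which normalization was used, so min-max scaling to [0, 1] is assumed here, and the helper names (`impute_mean`, `min_max_scale`, `prepare`), the column names, and the three toy rows are hypothetical. Downloading the CSV is omitted so the example runs offline.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to all zeros.

    Min-max scaling is an assumption; the source only says "normalizing".
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def prepare(rows, rename, target):
    """Rename columns, impute and normalize features, leave the target as is.

    rows:   list of dicts, one per patient record
    rename: mapping from original to new column names
    target: name of the label column (after renaming)
    """
    renamed = [{rename.get(k, k): v for k, v in row.items()} for row in rows]
    columns = {name: [row[name] for row in renamed] for name in renamed[0]}
    for name in columns:
        if name == target:
            continue  # only independent variables are normalized
        columns[name] = min_max_scale(impute_mean(columns[name]))
    return columns


# Three made-up records with one missing cholesterol value.
rows = [
    {"age": 29, "chol": 200, "target": 0},
    {"age": 54, "chol": None, "target": 1},
    {"age": 79, "chol": 300, "target": 1},
]
prepared = prepare(rows, rename={"chol": "cholesterol"}, target="target")
print(prepared["cholesterol"])  # [0.0, 0.5, 1.0]
```

Leaving the target column unscaled keeps the class labels as 0 and 1, which is what a binary classifier expects.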
