The accuracy of machine learning models depends heavily on hyperparameter tuning
Published in Computational Sciences
Hyperparameter Tuning and Model Optimization
Hyperparameters play a pivotal role in determining the predictive performance of machine learning models. They help balance overfitting and underfitting by controlling the influence of the independent features, thereby preventing extreme model behavior. Both manual tuning and automated techniques are employed to identify the optimal combination of hyperparameters and enhance model accuracy.
This study investigates hyperparameter optimization of machine learning models (Logistic Regression and Random Forest) using several tuning methods: Randomized Search, Grid Search, a Genetic Algorithm, Bayesian Optimization, and Optuna. The primary goal was to identify the configuration with the highest predictive accuracy for student grade classification.
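For illustration, a minimal Optuna sketch of such a search is given below. The synthetic data, search ranges, and scoring choice are assumptions for the example, not the study's actual configuration.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the student dataset.
X, y = make_classification(n_samples=711, n_features=10, random_state=42)

def objective(trial):
    # Search ranges are illustrative, not the study's configuration.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "criterion": trial.suggest_categorical("criterion", ["gini", "entropy"]),
        "max_features": trial.suggest_categorical("max_features", ["sqrt", "log2"]),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 20),
    }
    model = RandomForestClassifier(random_state=42, **params)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```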
Model performance was evaluated using confusion matrices and the area under the Receiver Operating Characteristic curve (ROC-AUC). Among all tuning methods, the Genetic Algorithm achieved the highest classification accuracy (82.5%) and ROC-AUC score (90%). Manual tuning, using 300 estimators, entropy as the split criterion, the square root of the feature count for max features, and a minimum of 10 samples per leaf, yielded 81.1% accuracy, closely matching the performance of the Randomized Search cross-validation algorithm. The default Random Forest model recorded the lowest accuracy at 78%.
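Interpreted as scikit-learn parameters, the manual configuration above might look like the following sketch; the synthetic data and the train/test split are stand-ins for the study's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 711-record student dataset.
X, y = make_classification(n_samples=711, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The manually tuned configuration reported above.
rf = RandomForestClassifier(
    n_estimators=300,
    criterion="entropy",
    max_features="sqrt",
    min_samples_leaf=10,
    random_state=42,
)
rf.fit(X_train, y_train)

# Evaluation mirrors the study: confusion matrix and ROC-AUC.
print(confusion_matrix(y_test, rf.predict(X_test)))
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
```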
Although Grid Search achieved high accuracy, it required significantly longer execution time (941.5 seconds) compared to manual tuning (3.66 seconds). These findings highlight the importance of selecting efficient hyperparameter tuning techniques for optimizing machine learning models in student grade prediction tasks.
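Continuing the sketch above, the timing contrast can be reproduced in outline with an illustrative grid (the study's actual grid is not specified here):

```python
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the study's actual grid is not specified.
param_grid = {
    "n_estimators": [100, 300, 500],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 5, 10],
}

start = time.perf_counter()
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)  # X_train, y_train from the previous sketch
print(f"grid search: {time.perf_counter() - start:.1f} s")
print("best parameters:", search.best_params_)
```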
Methods and Data Preparation
The central aim of this research is to validate and compare the accuracy of machine learning models, beginning by separating the target (dependent) variable from the independent features. After receiving ethical approval from the Faculty of Science and Technology at IIS University, sample data were collected from Pokhara University, Nepal.
Sample size estimation was performed using Cochran’s formula (Cochran, 1977), which is suitable for large populations:
n = (Z² × p × q) / e²

Where:
- n is the required sample size,
- p is the estimated population proportion (0.5),
- q = 1 − p,
- Z is the z-score corresponding to the 95% confidence level (1.96),
- e is the desired level of precision (0.05).
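Plugging in the stated values gives a quick check of the formula; the unadjusted result is roughly 384, so the reported figure of 376 presumably reflects an additional adjustment, such as a finite population correction, not detailed here.

```python
# Cochran's formula with the stated values.
Z, p, e = 1.96, 0.5, 0.05
q = 1 - p
n = (Z**2 * p * q) / e**2
print(n)  # 384.16, i.e. about 384 before any adjustment
```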
Applying the formula, the calculated sample size was 376 for passed students. An equal number of failed student records were added, resulting in a total of 752 students from 14 academic programs, including health sciences, engineering, and management, using data from the fiscal year 2022. After removing missing and incomplete records, the final dataset consisted of 711 student records.
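A minimal pandas sketch of the record cleaning and the target/feature separation mentioned earlier follows; the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("student_records.csv")

# Remove missing and incomplete records (752 -> 711 in the study).
df = df.dropna()

# Separate the dependent (target) variable from the independent features.
y = df["result"]                  # pass/fail outcome (hypothetical name)
X = df.drop(columns=["result"])
```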
As a preliminary step, logistic regression analysis was performed to examine relationships between dependent and independent variables. The regression model provided confidence intervals, standard errors, t-statistics, and p-values for each feature, allowing interpretation of statistical significance.
The coefficient of determination (R²) was 0.3, indicating that 30% of the variability in student outcomes could be explained by the independent variables. The F-statistic was significant (p = 1.51e-47), suggesting that the overall model was statistically reliable. Coefficients measured the change in the dependent variable resulting from a one-unit change in each independent variable, assuming other variables remain constant. The t-statistics tested the null hypothesis that the coefficients are zero, and the p-values indicated the probability of observing such t-statistics under the null hypothesis. Smaller p-values pointed to stronger evidence against the null hypothesis.
The omnibus test further assessed the skewness and kurtosis of the model residuals to ensure the validity of the regression assumptions.
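The statistics reported here (coefficients with standard errors, t-values, p-values and confidence intervals, R², the F-statistic, and an omnibus test of residual skewness and kurtosis) match the summary output of an ordinary least squares fit in statsmodels; the sketch below works under that assumption, with synthetic data and hypothetical feature names.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in with hypothetical feature names.
rng = np.random.default_rng(42)
X = pd.DataFrame(
    rng.normal(size=(711, 3)),
    columns=["attendance", "internal_marks", "study_hours"],
)
y = (0.5 * X["internal_marks"] + rng.normal(size=711) > 0).astype(int)

# The summary reports coefficients, standard errors, t-values,
# p-values, confidence intervals, R², the F-statistic, and the
# omnibus test of residual skewness and kurtosis described above.
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
```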