Efficient Prediction of Water Quality Index (WQI) Using Machine Learning Algorithms

Md. Mehedi Hassan, Jan 28, 2025

Every impactful research project has a story, and for our team, this story began with the question: how can technology improve water quality monitoring to ensure better health and environmental outcomes? Our paper, "Efficient Prediction of Water Quality Index (WQI) Using Machine Learning Algorithms," addressed this question and earned the 2022 Best Paper Award from Human-Centric Intelligent Systems.

The Process and Methodology

Our research is built on an analysis of water quality data from water bodies across India. The dataset included essential parameters such as dissolved oxygen (DO), biological oxygen demand (BOD), pH, and total coliform (TC). To keep the process reliable and replicable, we designed a fixed workflow for data preparation and modeling, as depicted in Figure 1.

Figure 1: Research Workflow
This figure illustrates the sequence of steps followed in the study:

Data Collection: Acquiring datasets from Kaggle, focusing on key water quality parameters.
Data Preprocessing: Addressing missing data using Random Forest imputation and applying Min-Max normalization for scaling.
Feature Selection: Identifying critical variables using a correlation matrix.
Machine Learning Models: Training and testing five algorithms (Neural Network, Random Forest, Multinomial Logistic Regression, Support Vector Machine, and Bagged Tree Model).
Performance Evaluation: Comparing model accuracies and identifying the best performer.

This structured approach not only streamlined our study but also ensured replicability, a cornerstone of rigorous research.
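
Two of the preprocessing steps listed above, Min-Max normalization and correlation-based feature screening, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (min_max_scale, pearson, correlation_matrix) are hypothetical, the sample readings are made up for demonstration, Pearson correlation is assumed as the correlation measure, and the Random Forest imputation step is not shown.

```python
from math import sqrt


def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; report 0.0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative readings, NOT values from the study's dataset.
samples = {
    "DO": [6.7, 5.9, 7.2, 4.8, 6.1],
    "BOD": [2.1, 3.4, 1.8, 5.0, 2.9],
    "pH": [7.4, 7.1, 7.8, 6.9, 7.3],
}

scaled = {name: min_max_scale(vals) for name, vals in samples.items()}
corr = correlation_matrix(scaled)

for name, row in corr.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Because Min-Max scaling is a positive linear transformation, the correlation matrix is the same whether it is computed before or after scaling. The order shown here simply follows the workflow listed above.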

Key Findings and Insights

The performance of the machine learning algorithms was assessed using metrics such as accuracy and kappa values.

The Multinomial Logistic Regression (MLR) model achieved the highest accuracy of 99.83%, setting a benchmark for water quality prediction systems.
Random Forest (RF) followed closely with an accuracy of 98.99%, demonstrating its strength in handling complex datasets.
The remaining models also performed well, though with slightly lower accuracy than MLR: Bagged Tree Model (98.99%, matching RF), Neural Network (98.65%), and Support Vector Machine (96.98%).

These results underscore the reliability of MLR for WQI prediction on this dataset, making it a strong candidate for real-world applications.
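
The accuracy and kappa metrics used in this comparison can be computed in a few lines. The sketch below assumes the reported kappa is Cohen's kappa, which measures agreement between predicted and true classes after correcting for agreement expected by chance. The function names and the WQI class labels ("good", "moderate", "poor") are hypothetical, and the label lists are made-up examples, not results from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class appears in both lists; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for illustration only.
y_true = ["good", "good", "poor", "poor", "moderate", "moderate", "good", "poor"]
y_pred = ["good", "good", "poor", "moderate", "moderate", "moderate", "good", "poor"]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", round(cohen_kappa(y_true, y_pred), 4))
```

Kappa is useful alongside accuracy because a model can score high accuracy on imbalanced classes simply by favoring the majority class, while kappa discounts that chance-level agreement.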

Practical Implications

Our study's results provide a roadmap for developing efficient, data-driven systems for water quality monitoring. The insights gained can support policymakers, environmental agencies, and researchers in implementing proactive measures to ensure safe water access.

Looking forward, we aim to build a software application using our proposed model, enabling real-time water quality predictions. Such a tool could revolutionize water resource management, particularly in regions facing acute water quality challenges.

Final Thoughts

Winning the Best Paper Award has been a tremendous honor, motivating us to continue exploring the potential of machine learning in solving critical environmental problems. We extend our heartfelt thanks to the editorial board of Human-Centric Intelligent Systems for this recognition and to our research team at VRD Research Lab for their dedication and collaboration.