Predict the quality of water using Intel oneAPI AI Analytics Toolkit - sandy inspires
Santhosh Kumar Dhanasekaran
Bengaluru, Karnataka
- 0 Collaborators
Deploying a binary classification model on Azure Functions to make on-demand predictions at scale, powered by the Intel oneAPI AI Analytics Toolkit. We have multiple versions of the model with varying F1 scores, and the latest model has an F1 score of 94. Integrated GitHub Actions with the GitHub repo to make se ...
Project status: Published/In Market
Intel Technologies
oneAPI
Overview / Usage
Freshwater is one of our most vital and scarce natural resources, making up just 3% of the earth’s total water volume. It touches nearly every aspect of our daily lives, from drinking, swimming, and bathing to generating food, electricity, and the products we use every day. Access to a safe and sanitary water supply is essential not only to human life, but also to the survival of surrounding ecosystems that are experiencing the effects of droughts, pollution, and rising temperatures.
As always, I built a binary classification model to solve this, and the most important thing is to get it out. What do I mean by that? Say we build a good classifier: if it isn't published or deployed as an endpoint for users to consume, it is of no use, or it only serves research and study.
This project involves not just building a model but also deploying it to Azure Functions, with MLOps through GitHub Actions to keep the model updated. The deployment exposes an endpoint to which the end user sends a POST request, either via Postman or programmatically, and gets the predictions back. I didn't stop at returning a binary class: the response also includes the probability, feature importance, and response time. I also record the feature values the user passed, for further analysis and improvement of the model itself.
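A minimal sketch of the response-building logic described above, assuming a scikit-learn style model that exposes `predict_proba` and `feature_importances_` (as a fitted `RandomForestClassifier` does). The function `build_prediction_response`, the `request_log` list, the feature names `pH` and `Iron`, and the `StubModel` class are hypothetical illustrations, not the project's actual code. It returns the predicted class, its probability, the model's global feature importances, and the time taken, and it records the raw feature values the caller sent.

```python
import json
import time
from typing import Any, Dict, List, Sequence


def build_prediction_response(
    model: Any,
    feature_names: Sequence[str],
    payload: Dict[str, float],
    request_log: List[Dict[str, float]],
) -> str:
    """Turn one JSON request payload into a JSON response body (hypothetical helper).

    `model` is assumed to expose scikit-learn style `predict_proba`
    and `feature_importances_`.
    """
    start = time.perf_counter()

    # Record the raw feature values the user sent, for later analysis/retraining.
    request_log.append(dict(payload))

    # Order the features exactly as the model was trained on them.
    row = [[float(payload[name]) for name in feature_names]]
    prob_positive = float(model.predict_proba(row)[0][1])

    # Global (model-level) importances, not per-request explanations.
    importances = {
        name: round(float(imp), 4)
        for name, imp in zip(feature_names, model.feature_importances_)
    }

    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return json.dumps(
        {
            "prediction": int(prob_positive >= 0.5),
            "probability": round(prob_positive, 4),
            "feature_importance": importances,
            "response_time_ms": round(elapsed_ms, 3),
        }
    )


class StubModel:
    """Hypothetical stand-in for a trained classifier, for illustration only."""

    feature_importances_ = [0.7, 0.3]

    def predict_proba(self, rows):
        # Toy rule: probability grows with the first feature, clipped to [0, 1].
        p = min(max(rows[0][0] / 10.0, 0.0), 1.0)
        return [[1.0 - p, p]]


log: List[Dict[str, float]] = []
body = build_prediction_response(StubModel(), ["pH", "Iron"], {"pH": 8.0, "Iron": 0.02}, log)
print(body)
```

In an HTTP-triggered Azure Function written in Python, one could pass it the dict from `req.get_json()` and wrap the result in `func.HttpResponse(body, mimetype="application/json")`; in practice a persistent store would likely replace the in-memory log list.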
Methodology / Approach
Just by looking at the dataset, you can conclude a lot of things, one of which is that the classes are imbalanced.
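One quick way to quantify the imbalance is to look at the class shares of the label column. The sketch below uses a toy DataFrame and assumes the label column is called `Target`; both are illustrative, not the project's data. It also computes "balanced" class weights, n_samples / (n_classes * count_per_class), the heuristic scikit-learn uses for `class_weight="balanced"`.

```python
import pandas as pd

# Toy labels for illustration only; the "Target" column name is an assumption.
df = pd.DataFrame({"Target": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

# Absolute counts and relative shares per class.
counts = df["Target"].value_counts().sort_index()
shares = df["Target"].value_counts(normalize=True).sort_index()
print(shares)

# Balanced class weights: rarer classes get proportionally larger weights.
n_samples, n_classes = len(df), counts.size
class_weights = (n_samples / (n_classes * counts)).to_dict()
print(class_weights)
```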
I also wanted to use a decision-tree-based classifier, so I went with Random Forest.
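A minimal Random Forest sketch using the Intel Extension for Scikit-learn (`sklearnex`), which ships with the AI Kit and can patch stock scikit-learn estimators. Whether the project used `patch_sklearn` is an assumption; the data is synthetic and imbalanced (roughly 80/20) to mirror the issue noted above, and the hyperparameters are illustrative. The patch is applied before the scikit-learn imports, and the code falls back to plain scikit-learn if `sklearnex` is not installed.

```python
# Optional: patch scikit-learn with Intel-optimized implementations, if available.
# Patching must happen before the estimators are imported.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()
except ImportError:
    pass

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 80/20); not the project's dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6, weights=[0.8, 0.2], random_state=42
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" up-weights the minority class during training.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", n_jobs=-1, random_state=42
)
clf.fit(X_train, y_train)

score = f1_score(y_test, clf.predict(X_test))
print(f"F1 on held-out split: {score:.3f}")
```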
I've tried SVM, Logistic Regression, XGBoost, LightGBM, and Decision Tree classification models.
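A sketch of how such a comparison could be run with stratified 5-fold cross-validation, scored by F1 to match the metric quoted in the summary. It uses synthetic data, and scikit-learn's `SVC` stands in for the support vector model; XGBoost and LightGBM are left out only to keep dependencies minimal. The model settings are illustrative assumptions, not the project's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; not the project's dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, weights=[0.8, 0.2], random_state=0
)

# Scale-sensitive models (SVM, logistic regression) get a StandardScaler in front.
candidates = {
    "SVM": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}
# XGBoost's XGBClassifier and LightGBM's LGBMClassifier follow the scikit-learn
# estimator interface and could be added here if those packages are installed.

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    results[name] = scores.mean()
    print(f"{name:20s} mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("Best by mean F1:", best)
```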
Technologies Used
Python, Pandas, and libraries from Intel® AI Analytics Toolkit (AI Kit)