Anomaly Detection

Chirav Dave

Kirkland, Washington

Architected a hybrid model that combines five machine learning models (Logistic Regression, SVM, Optimized KNN, Random Forest and a Gaussian-based model) through weighted polling to detect component failures in the Air Pressure System of heavy Scania trucks. Applied various feature engineering methods to deal with inconsistent data.

Project status: Concept

Artificial Intelligence

Groups
Student Developers for AI

Intel Technologies
Intel CPU

Overview / Usage

In this paper, I model a prediction system for detecting component failures in the Air Pressure System (APS) of heavy Scania trucks. The prediction indicates whether a failure is imminent. The data were collected from the APS of trucks in everyday use. The APS generates pressurized air that is used by various truck functions such as braking, gear changes and suspension. The positive class corresponds to component failures that belong to the APS, and the negative class to failures of components unrelated to the APS. Besides predicting component failure, I also try to minimize the cost of misclassification. A cost of 10 is assigned to a false positive, i.e. predicting an APS component failure that has not occurred, and a cost of 500 to a false negative, i.e. a failure that the model did not predict. Minimizing this penalty is therefore one of the main goals.

The problem is treated as a classification problem. I used five models (Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Random Forest and a Gaussian model), along with a random model as a baseline. Each model uses its own data cleaning techniques and feature selection. Finally, each model is assigned a weight according to the cost it incurs, and the final decision is made by weighted polling over the models. The experiments show that the best classifier is the Random Forest, with a cost of around 10,000. The accuracy of the other models was approximately 95%.
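The cost metric and the weighted polling step described above can be illustrated with a minimal sketch. It is not the code from the paper. It assumes that the cost of 10 applies to a false alarm and the cost of 500 to a missed failure, that labels are encoded as 1 (APS failure) and 0 (anything else), and that weights are inversely proportional to each model's cost. The names `total_cost`, `cost_based_weights` and `weighted_poll`, the inverse-cost weighting and the 0.5 threshold are all hypothetical choices made for illustration.

```python
# Assumed penalties: 10 per false alarm, 500 per missed APS failure.
COST_FALSE_POSITIVE = 10
COST_FALSE_NEGATIVE = 500


def total_cost(y_true, y_pred):
    """Sum the misclassification penalties (1 = APS failure, 0 = other)."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return COST_FALSE_POSITIVE * false_pos + COST_FALSE_NEGATIVE * false_neg


def cost_based_weights(costs):
    """Hypothetical weighting: inverse of each model's cost, normalised to sum to 1."""
    inverse = {name: 1.0 / (cost + 1.0) for name, cost in costs.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_poll(predictions, weights, threshold=0.5):
    """Combine per-model 0/1 predictions into one 0/1 decision per sample."""
    n_samples = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_samples):
        # Weighted share of models that voted "failure" for sample i.
        score = sum(weights[name] * preds[i] for name, preds in predictions.items())
        combined.append(1 if score >= threshold else 0)
    return combined


if __name__ == "__main__":
    # Toy validation labels and predictions, made up for illustration only.
    y_val = [1, 0, 0, 1]
    val_preds = {"model_a": [1, 0, 0, 1], "model_b": [0, 1, 0, 0]}
    costs = {name: total_cost(y_val, preds) for name, preds in val_preds.items()}
    weights = cost_based_weights(costs)
    print(costs, weights, weighted_poll(val_preds, weights))
```

With this weighting, a model that incurs a lower cost on validation data has more influence on the final vote. Lowering `threshold` would favour recall of failures, which may be preferable given how much more a missed failure costs than a false alarm.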

Methodology / Approach

This problem was treated as a classification problem, and several well-known classification algorithms, including Logistic Regression, k-nearest neighbours, Support Vector Machine (SVM), Decision Trees and Random Forests, were applied to it. The experimental results showed that, in terms of cost, the best classifier was 92.56% better than a straightforward baseline that classifies at random.
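A cost-wise comparison against a random classifier could be computed roughly as follows. This is a sketch under assumptions, not the evaluation code from the paper: `random_baseline`, its `failure_rate` parameter and `cost_improvement` are hypothetical names, and the numbers in the usage example are illustrative rather than results from the experiments.

```python
import random


def random_baseline(n_samples, failure_rate=0.5, seed=0):
    """Predict 1 (failure) at random with probability failure_rate (assumed 0.5)."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    return [1 if rng.random() < failure_rate else 0 for _ in range(n_samples)]


def cost_improvement(baseline_cost, model_cost):
    """Percentage by which model_cost is lower than baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 100.0 * (baseline_cost - model_cost) / baseline_cost


if __name__ == "__main__":
    # Illustrative costs only: a model costing 250 against a baseline costing 1000.
    print(cost_improvement(1000, 250))  # 75.0
```

The baseline predictions would be scored with the same cost metric as the trained models, so that the percentage reflects the penalty saved rather than a difference in accuracy.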

Technologies Used

Technology Stack: Python, Pandas, Scikit-learn, Matplotlib

Repository

https://github.com/chiravdave/Projects-Papers/blob/master/Anomaly_Detection.pdf
