Multivariate Time-series Forecasting in Software Clones
Devesh Manjhi
Varanasi, Uttar Pradesh
- 0 Collaborators
Predicting the number of exact-match and near-miss clone sets in the upcoming versions of an open-source software application
Project status: Under Development
Intel Technologies
Other
Overview / Usage
As software evolves, code fragments tend to be copied verbatim or with slight modifications, both within a version and across subsequent versions, giving rise to exact-match and near-miss clones. Cloned code fragments degrade software quality and make maintenance harder. Detecting and modeling these recurring fragments can therefore support software maintenance activities. In this project, we explored machine learning strategies for the temporal analysis of software clone evolution using software metrics. Detecting clones in a large software system is challenging because it depends on the internal design of software modules and methods. Object-oriented metrics such as DIT (depth of inheritance tree), NOC (number of children), WMC (weighted methods per class), LCOM (lack of cohesion in methods), and cyclomatic complexity can serve as indicators of clone content.
Methodology / Approach
In the first phase of the project, we extracted the exact-match and near-miss clones from an open-source software application, along with the object-oriented metrics for each version of the software. We modeled the number of clone sets across versions as a time series. We then used machine learning methods for multivariate time-series modeling to forecast the number of exact-match clone sets (EMCS) and near-miss clone sets (NMCS) in upcoming versions of the software.
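The project description does not say how the per-version metrics and clone counts were arranged for training. One common arrangement, shown here only as a minimal sketch, is a sliding window: the metrics and clone count of the previous few versions become the input row, and the clone count of the next version becomes the target. The function name make_lagged_dataset, the parameter n_lags, and all numbers below are hypothetical and do not come from the project.

```python
from typing import List, Sequence, Tuple


def make_lagged_dataset(
    metrics: Sequence[Sequence[float]],
    clone_counts: Sequence[float],
    n_lags: int,
) -> Tuple[List[List[float]], List[float]]:
    """Turn a per-version multivariate series into supervised (X, y) pairs.

    metrics[t] holds the software metrics of version t (e.g. WMC, DIT),
    clone_counts[t] holds the number of clone sets in version t.
    Each row of X concatenates the metrics and clone count of the
    n_lags versions preceding the target version.
    """
    if len(metrics) != len(clone_counts):
        raise ValueError("metrics and clone_counts must cover the same versions")
    if n_lags < 1 or n_lags >= len(clone_counts):
        raise ValueError("n_lags must be at least 1 and smaller than the series length")

    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(clone_counts)):
        row: List[float] = []
        for k in range(t - n_lags, t):
            row.extend(metrics[k])       # metrics of an earlier version
            row.append(clone_counts[k])  # clone count of that same version
        X.append(row)
        y.append(clone_counts[t])        # value to forecast
    return X, y


# Made-up values for four versions: [WMC, DIT] per version, plus clone-set counts.
example_metrics = [[10.0, 2.0], [11.0, 2.0], [12.5, 3.0], [13.0, 3.0]]
example_counts = [40, 44, 47, 52]
X, y = make_lagged_dataset(example_metrics, example_counts, n_lags=2)
```

The resulting X and y can be passed to any regression model. With very few versions available, a small n_lags keeps the number of training rows from shrinking too far.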
In the second phase, we applied more advanced machine learning algorithms to the time-series modeling of the clone datasets. We used a multi-objective genetic algorithm to train a feedforward neural network that outputs prediction intervals, optimizing two objectives: the mean prediction interval width and the prediction interval coverage probability. This model yielded better accuracy than the conventional ARIMA and backpropagation methods.
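The two objectives named above have standard definitions: the prediction interval coverage probability (PICP) is the fraction of observed values that fall inside their interval, and the mean prediction interval width (MPIW) is the average of upper bound minus lower bound. The sketch below computes both and adds a Pareto-dominance check of the kind a multi-objective search relies on. It is an illustration under assumptions, not the project's code: the function names and all numbers are hypothetical, and the width is left unnormalized even though some formulations divide it by the range of the target.

```python
from typing import Sequence, Tuple


def picp(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations lying inside their [lower, upper] interval."""
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / len(y_true)


def mpiw(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average interval width (unnormalized)."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for (PICP, MPIW) pairs.

    a dominates b when it is no worse on both objectives (coverage at
    least as high, width at least as narrow) and strictly better on one.
    """
    a_picp, a_mpiw = a
    b_picp, b_mpiw = b
    no_worse = a_picp >= b_picp and a_mpiw <= b_mpiw
    strictly_better = a_picp > b_picp or a_mpiw < b_mpiw
    return no_worse and strictly_better


# Made-up clone-set counts and intervals for four versions.
observed = [120, 125, 131, 140]
lower = [110, 118, 132, 130]
upper = [130, 130, 145, 150]

coverage = picp(observed, lower, upper)  # 3 of 4 observations are covered
width = mpiw(lower, upper)               # (20 + 12 + 13 + 20) / 4
```

The objectives pull against each other: widening every interval raises coverage but also raises width. That conflict is why a multi-objective search returns a set of non-dominated trade-offs rather than a single best network.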
Technologies Used
CloneDR, Eclipse (Metrics plugin), R (time-series forecasting), MATLAB, Weka