Algorithms: text processing, natural language processing, deep learning, LSTM networks, RNNs, SOM, K-fold cross-validation.
Languages: Java, R, Python
In project management, the planning phase plays a key role. In agile methodologies, the tasks and the scope of the sprint are defined during planning. Committing to too much functionality may result in missing the deadline and, consequently, in a loss of customer confidence. On the other hand, taking on too few tasks generates costs for the project. The ideal option is to find the "golden mean."
The aim of our project is to implement an application that estimates the time needed to complete a User Story. The project will use methods based on artificial intelligence. The decision algorithm will be based on techniques for finding similarities in text, and the input data set will be retrieved from the JIRA system.
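Which similarity measure the decision algorithm will use is not fixed here. As one common illustration of finding similarities in text, the minimal sketch below computes the cosine similarity between the word-count (bag-of-words) vectors of two User Story texts. It is not the project's LSTM-based method, and `tokenize` and `cosine_similarity` are hypothetical names introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of the word-count vectors of two texts, between 0.0 and 1.0."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Identical texts score 1.0 and texts sharing no words score 0.0; note that this measure ignores word order, which is one motivation for the sequence model described next.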
The title and description of each User Story will be combined into one string. The number of words in this string differs from one User Story to another. Then every word will be mapped to a numerical vector. The resulting sequence of word vectors will be fed into a Long Short-Term Memory (LSTM) recurrent neural network, which is particularly useful for analysing natural language. The big advantage of a recurrent neural network is its ability to accumulate information: the state computed at each step is carried over to the next one. Therefore, the order of words in the User Story description matters when the feature vector is constructed. The output vectors produced at every step will be summed, so that the length of the resulting vector is independent of the number of words in the User Story.

The third step is to apply a supervised ("with a teacher") learning algorithm: a neural network. Before the feature vector is fed into this algorithm, it must be normalised and processed by PCA (Principal Component Analysis). PCA finds uncorrelated components ordered by the amount of variance they explain, so keeping the leading components retains the ones that play the greatest role in the learning process.

At the training stage, the data set will consist of pairs made of a feature vector (input) and a numeric value, the time needed (output). The data set will be divided into two groups: part of the data will be used to train the model and part to test it. To make better use of the limited number of observations when dividing the data into training and test sets, K-fold cross-validation will be performed.
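The feature-extraction steps described above (one string per User Story, words mapped to vectors, an LSTM run over the sequence, per-step outputs summed) can be sketched in plain Python. This is a minimal sketch under loudly stated assumptions: the LSTM weights are random and untrained, the word vectors come from a hypothetical lookup table that assigns a random vector to each unseen word (a real system would use trained embeddings and a deep-learning library), and the sizes `EMBED_DIM` and `HIDDEN_DIM` are arbitrary. `TinyLSTM`, `embed` and `story_features` are hypothetical names.

```python
import math
import random

EMBED_DIM = 4   # assumed size of each word vector
HIDDEN_DIM = 3  # assumed size of the LSTM hidden state


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o),
        # candidate (g). Each matrix acts on the concatenation [x_t, h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def run(self, inputs):
        """Return the hidden state h_t produced at every time step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in inputs:
            z = list(x) + h
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            # The cell state c carries (accumulates) information across steps.
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def embed(word, table, rng):
    """Hypothetical word-to-vector lookup: unseen words get a random vector."""
    if word not in table:
        table[word] = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    return table[word]


def story_features(title, description, lstm, table, rng):
    text = title + " " + description                          # one string per story
    vectors = [embed(w, table, rng) for w in text.lower().split()]  # words -> vectors
    outputs = lstm.run(vectors)                               # LSTM output per step
    # Summing the per-step outputs gives HIDDEN_DIM values regardless of word count.
    return [sum(step[k] for step in outputs) for k in range(HIDDEN_DIM)]


if __name__ == "__main__":
    lstm = TinyLSTM(EMBED_DIM, HIDDEN_DIM)
    table, rng = {}, random.Random(1)
    short = story_features("Login page", "Add a form", lstm, table, rng)
    longer = story_features("Export report",
                            "Allow the user to export the monthly report as PDF",
                            lstm, table, rng)
    print(len(short), len(longer))  # prints: 3 3
```

Both stories yield feature vectors of the same length even though they contain 5 and 11 words. Summation is used here because concatenating the per-step outputs would make the length depend on the word count.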
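The normalisation and PCA step can be illustrated without external libraries: each feature is z-scored, the covariance matrix of the standardized data is formed, and its leading eigenvectors (the principal components) are found by power iteration with deflation. This is an illustrative sketch, not the project's implementation; the function names are hypothetical and the number of retained components `k` is a free parameter. In practice, scikit-learn's `StandardScaler` and `PCA` classes perform the same steps.

```python
import math
import random


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; divide by 1.0 instead to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance matrix of column-centred data (here: the standardized features)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix, by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # remaining matrix is zero: no further variance to capture
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance (eigenvalue) along direction v.
        lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflation: remove the found direction so the next pass finds the next component.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps


def pca_transform(rows, k):
    """Standardize the rows and project them onto the first k principal components."""
    z = standardize(rows)
    comps = top_components(covariance(z), k)
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in z]
```

The components returned are mutually orthogonal, so the projected features are uncorrelated, and the first projected feature carries the most variance.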
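K-fold cross-validation shuffles the data and splits it into k folds of nearly equal size; each fold serves once as the test set while the remaining k-1 folds form the training set, and the k test errors are averaged. The sketch below assumes mean absolute error as the metric and uses a hypothetical baseline (`fit_mean`/`predict_mean`, which always predicts the mean training time) as a stand-in for the neural network. scikit-learn's `KFold` class provides equivalent splitting.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit, predict, k=5):
    """Mean absolute error, averaged over the k test folds."""
    errors = []
    for train_idx, test_idx in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_err = sum(abs(predict(model, X[i]) - y[i]) for i in test_idx) / len(test_idx)
        errors.append(fold_err)
    return sum(errors) / len(errors)


# Hypothetical baseline standing in for the neural network.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model  # ignores the features and returns the mean training time
```

Because every observation is used for testing exactly once and for training k-1 times, the averaged error is a less noisy estimate than a single train/test split on a small data set.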