
- Projects 1
- Followers 1
This project aims to improve the classification ability of time series shapelets by learning a unique distance metric for each shapelet that best separates samples. ...learn more
Project status: Under Development
Intel Technologies
AI DevCloud / Xeon
Shapelets [1] are small subsequences of time series that can be used to classify unlabeled time series. Shapelets can be used in stand-alone classifiers and are an essential part of the most accurate ensemble methods for time series classification. Previous shapelet research has focused on finding the most efficient approaches to discover shapelets and how to incorporate shapelets into different classification methods. However, a shapelet's ability to accurately classify depends on a single measurement: its distance from each sample. Euclidean distance, while robust, is the only distance metric that has been used with shapelets.
This project explores an alternative to Euclidean distance by creating a distance metric specific to each shapelet. If successful, this improvement could improve shapelet accuracy for time series classification across dozens of domains including finance, entertainment, medicine, biometrics, human motion, chemistry, astronomy, robotics, entomology, wildlife management, food spectroscopy, and PMU disturbance event classification.
First, given a shapelet, we use a unique mapping that uses the shapelet to project time series training samples into Euclidean point space. In this space, the current approach of using a fixed Euclidean distance threshold for shapelet classification is equivalent to creating a spherical decision boundary. We then use logistic regression in this Euclidean space to learn a decision boundary that can more accurately capture the structure of the data. The goal is to improve the classification power of the shapelet by providing better separation between samples of different classes.
The number of dimensions in this Euclidean space often exceed the number of training points, meaning that dimensionality reduction is required. We are currently exploring using Principal Component Analysis and Generalized Eigenvectors [2] to reduce dimensions prior to learning the new distance with logistic regression.
We are currently exploring the impact of using shapelet-specific distances instead of Euclidean distance on 4 shapelet classification methods: Fast Shapelets, Shapelet Transform, Learned Shapelets, and FLAG. A brief description of each shapelet method is below.
As a testbed, we're using the UCR/UEA repository [3,4] which contains 128 datasets from dozens of domains.
Shapelet methods
[1] Ye, Lexiang, and Eamonn Keogh. "Time series shapelets: a new primitive for data mining." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
[2] Karampatziakis, Nikos, and Paul Mineiro. "Discriminative features via generalized eigenvectors." Proceedings of the 31st International Conference on Machine Learning, 2014.
[3] Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi , Chotirat Ann Ratanamahatana, Yanping Chen, Bing Hu, Nurjahan Begum, Anthony Bagnall , Abdullah Mueen and Gustavo Batista (2018). The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018
[4] Anthony Bagnall, Jason Lines, William Vickers and Eamonn Keogh, The UEA & UCR Time Series Classification Repository, www.timeseriesclassification.com
[5] Rakthanmanon, Thanawin, and Eamonn Keogh. "Fast shapelets: A scalable algorithm for discovering time series shapelets." proceedings of the 2013 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2013.
[6] Hills, Jon, et al. "Classification of time series by shapelet transformation." Data Mining and Knowledge Discovery 28.4, 2014.
[7] Grabocka, Josif, et al. "Learning time-series shapelets." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
[8] Hou, Lu, James T. Kwok, and Jacek M. Zurada. "Efficient learning of timeseries shapelets." Thirtieth AAAI Conference on Artificial Intelligence, 2016.
Computing Resources:
Intel Colfax
Programming Languages:
MATLAB
C++
Java
Dimensionality Reduction Techniques:
Principal Component Analysis
Generalized Eigenvectors