2. Classification of raw time series

Some algorithms classify raw time series directly, without a separate feature-extraction step. The following sections describe the ones available in pyts. They can all be found in the pyts.classification module.

2.1. KNeighborsClassifier

The k-nearest neighbors algorithm is simple: KNeighborsClassifier finds the k training time series closest to the time series to classify and predicts the class by majority vote among them. The key parameter is the metric used to measure closeness. A popular choice for time series is Dynamic Time Warping (see Metrics for time series), which tolerates local shifts and stretches along the time axis. The one-nearest-neighbor classifier with this metric is widely regarded as a good baseline for time series classification:

>>> from pyts.classification import KNeighborsClassifier
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = KNeighborsClassifier(metric='dtw')
>>> clf.fit(X_train, y_train)
KNeighborsClassifier(...)
>>> clf.score(X_test, y_test)
0.91...
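
To make the role of the metric concrete, here is a minimal NumPy sketch of one-nearest-neighbor classification with Dynamic Time Warping. It is an illustration, not the pyts implementation: the helper names dtw_distance and predict_1nn and the toy data are hypothetical, and the squared pointwise cost followed by a square root is an assumed convention.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic DTW with a squared pointwise cost, O(len(x) * len(y))."""
    n, m = len(x), len(y)
    # acc[i, j] = cost of the best warping path aligning x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j],      # only x advances
                                   acc[i, j - 1],      # only y advances
                                   acc[i - 1, j - 1])  # both advance
    return np.sqrt(acc[n, m])


def predict_1nn(X_train, y_train, X_test):
    """Give each test series the label of its DTW-nearest training series."""
    preds = []
    for x in X_test:
        dists = [dtw_distance(x, x_ref) for x_ref in X_train]
        preds.append(y_train[int(np.argmin(dists))])
    return np.array(preds)


# Hypothetical toy data: class 0 is a bump, class 1 is a dip.
X_train = np.array([[0, 0, 1, 2, 1, 0, 0, 0],
                    [0, 0, -1, -2, -1, 0, 0, 0]], dtype=float)
y_train = np.array([0, 1])
# The test series are time-shifted copies of the training shapes.
X_test = np.array([[0, 0, 0, 0, 1, 2, 1, 0],
                   [0, -1, -2, -1, 0, 0, 0, 0]], dtype=float)
y_pred = predict_1nn(X_train, y_train, X_test)
```

On this toy data y_pred is [0, 1]. The warping path absorbs the time shift, so each test series is at DTW distance 0 from the training series of its own class, whereas a point-by-point Euclidean distance between them would be nonzero.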

2.2. SAX-VSM

SAX-VSM stands for Symbolic Aggregate approXimation in Vector Space Model. SAXVSM first transforms each time series of floats into a sequence of letters using the Symbolic Aggregate approXimation algorithm. A sliding window then turns each sequence of letters into a bag of words. Finally, a term-frequency inverse-document-frequency (tf-idf) vector is computed for each class, with each class treated as one document. To classify a time series, its term-frequency vector is compared with the tf-idf vector of each class using cosine similarity, and the class with the highest similarity is predicted.

>>> from pyts.classification import SAXVSM
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = SAXVSM(window_size=34, sublinear_tf=False, use_idf=False)
>>> clf.fit(X_train, y_train)
SAXVSM(...)
>>> clf.score(X_test, y_test)
0.76
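
The steps described above can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not the pyts implementation: the function names, the fixed four-letter alphabet with standard normal quartile breakpoints, the smoothed idf formula, and the toy ramps are all choices made for this example.

```python
from collections import Counter

import numpy as np

BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])  # N(0, 1) quartiles -> 4 letters
ALPHABET = "abcd"


def sax_words(series, window_size):
    """z-normalize, map every point to a letter, then slide a window."""
    z = (series - series.mean()) / (series.std() + 1e-8)
    letters = "".join(ALPHABET[i] for i in np.digitize(z, BREAKPOINTS))
    return [letters[i:i + window_size]
            for i in range(len(letters) - window_size + 1)]


def fit_class_tfidf(X, y, window_size):
    """Return (classes, vocabulary, tf-idf matrix with one row per class)."""
    classes = sorted(set(y))
    counts = {c: Counter() for c in classes}
    for series, label in zip(X, y):
        counts[label].update(sax_words(series, window_size))
    vocab = sorted(set().union(*counts.values()))
    tf = np.array([[counts[c][w] for w in vocab] for c in classes], dtype=float)
    df = (tf > 0).sum(axis=0)  # number of classes that contain each word
    # Smoothed idf (an assumption): words shared by all classes keep weight 1.
    idf = np.log((1 + len(classes)) / (1 + df)) + 1
    return classes, vocab, tf * idf


def predict(series, classes, vocab, tfidf, window_size):
    """Pick the class whose tf-idf vector is most cosine-similar to the series."""
    counts = Counter(sax_words(series, window_size))
    vec = np.array([counts[w] for w in vocab], dtype=float)
    norms = np.linalg.norm(tfidf, axis=1) * np.linalg.norm(vec)
    sims = tfidf @ vec / np.maximum(norms, 1e-12)
    return classes[int(np.argmax(sims))]


# Hypothetical toy data: class 0 ramps up, class 1 ramps down.
X_toy = [np.linspace(0, 1, 12), np.linspace(3, 9, 12),
         np.linspace(1, 0, 12), np.linspace(8, 2, 12)]
y_toy = [0, 0, 1, 1]
model = fit_class_tfidf(X_toy, y_toy, window_size=3)
pred_up = predict(np.linspace(-1, 4, 12), *model, window_size=3)
pred_down = predict(np.linspace(4, -1, 12), *model, window_size=3)
```

Here pred_up is 0 and pred_down is 1. Both classes share the words "aaa" and "ddd", but words such as "aab" occur only in the rising ramps and "ddc" only in the falling ones, and the idf factor gives those class-specific words the larger weight.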

References

  • P. Senin and S. Malinchik, “SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model”. International Conference on Data Mining, 13, 1175-1180 (2013).

2.3. BOSSVS

BOSSVS stands for Bag-of-SFA Symbols in Vector Space. It is another bag-of-words approach to time series classification. BOSSVS is similar to SAX-VSM in that it builds a term-frequency inverse-document-frequency (tf-idf) vector for each class, but the symbols that make up the words are generated with the Symbolic Fourier Approximation (SFA) algorithm rather than SAX.

>>> from pyts.classification import BOSSVS
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = BOSSVS(window_size=28)
>>> clf.fit(X_train, y_train)
BOSSVS(...)
>>> clf.score(X_test, y_test)
0.98
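
The part that differs from SAX-VSM is how the words are formed. The following NumPy sketch shows one simplified way to derive Fourier-based words from sliding windows: take the discrete Fourier transform of each window, keep the first few real and imaginary values, and discretize each value with its own quantile bins. The function names, the four-letter alphabet, the choice to drop the mean term, and the random toy data are assumptions for illustration. The pyts implementation offers options that this sketch ignores.

```python
import numpy as np

ALPHABET = "abcd"


def window_fourier_coefs(X, window_size, n_coefs):
    """Slide a window over every series and keep its first Fourier values."""
    windows = np.lib.stride_tricks.sliding_window_view(X, window_size, axis=1)
    # Drop the first coefficient, which only encodes the window mean.
    spectrum = np.fft.rfft(windows, axis=-1)[..., 1:]
    # Interleave real and imaginary parts: re_1, im_1, re_2, im_2, ...
    coefs = np.stack([spectrum.real, spectrum.imag], axis=-1)
    coefs = coefs.reshape(*spectrum.shape[:-1], -1)
    return coefs[..., :n_coefs]  # shape (n_series, n_windows, n_coefs)


def fit_bin_edges(coefs, n_bins=4):
    """Learn one set of quantile bin edges per Fourier value."""
    flat = coefs.reshape(-1, coefs.shape[-1])
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(flat, quantiles, axis=0).T  # shape (n_coefs, n_bins - 1)


def sfa_words(coefs, edges):
    """Turn each window's Fourier values into a word, one letter per value."""
    words = []
    for series_coefs in coefs:
        words.append([
            "".join(ALPHABET[np.searchsorted(edges[k], v, side="right")]
                    for k, v in enumerate(window))
            for window in series_coefs
        ])
    return words


# Hypothetical toy data: 6 random series of length 32.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(6, 32))
coefs = window_fourier_coefs(X_toy, window_size=8, n_coefs=4)
edges = fit_bin_edges(coefs)  # learned on the training windows
words = sfa_words(coefs, edges)  # one list of 4-letter words per series
```

Each series becomes a list of 25 four-letter words, one per window position. From there the per-class tf-idf vectors and the cosine-similarity prediction work as in SAX-VSM.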

References

  • P. Schäfer, “Scalable Time Series Classification”. Data Mining and Knowledge Discovery, 30(5), 1273-1298 (2016).