3. Extracting features from time series

Standard machine learning algorithms are not always well suited for raw time series because they cannot capture the high correlation between consecutive time points: treating time points as independent features may not be optimal. Therefore, algorithms that extract features from time series have been developed. These algorithms transform a dataset of time series with shape (n_samples, n_timestamps) into a dataset of features with shape (n_samples, n_extracted_features) that can be used to fit a standard classifier. They can be found in the pyts.transformation module. The following sections describe the available algorithms.
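
For instance, using the BOSS transformer described below with a deliberately tiny vocabulary (a minimal sketch; the number of extracted features depends on the estimator and its parameters):

>>> from pyts.datasets import load_gunpoint
>>> from pyts.transformation import BOSS
>>> X_train, _, _, _ = load_gunpoint(return_X_y=True)
>>> X_train.shape  # (n_samples, n_timestamps)
(50, 150)
>>> BOSS(word_size=2, n_bins=2, sparse=False).fit_transform(X_train).shape
(50, 4)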

3.1. ShapeletTransform

ShapeletTransform is a shapelet-based approach to feature extraction. A shapelet is defined as a contiguous subsequence of a time series. The distance between a shapelet and a time series is defined as the minimum of the distances between this shapelet and all the subsequences of identical length extracted from this time series. When fit is called, ShapeletTransform extracts the n_shapelets most discriminative shapelets from a dataset of time series, given a criterion (mutual information or F-scores). The indices of the selected shapelets are made available via the indices_ attribute.
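
For illustration, this distance can be computed by brute force with a sliding window. The following is a minimal sketch of the definition, not the implementation used by ShapeletTransform:

>>> import numpy as np
>>> def shapelet_distance(shapelet, ts):
...     """Minimum distance between a shapelet and all subsequences of ts."""
...     n = len(shapelet)
...     return float(min(np.linalg.norm(ts[i:i + n] - shapelet)
...                      for i in range(len(ts) - n + 1)))
>>> shapelet_distance(np.array([4., 3., 2.]), np.array([0., 2., 3., 4., 3., 2., 1.]))
0.0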

[Figure: ShapeletTransform example (sphx_glr_plot_shapelet_transform_001.png)]

When transform is called, ShapeletTransform computes the distances between the selected shapelets and each time series in the dataset. fit_transform is an optimized version of fit followed by transform, since the distances between the shapelets and the time series are already computed when fit is called:

>>> from pyts.transformation import ShapeletTransform
>>> X = [[0, 2, 3, 4, 3, 2, 1],
...      [0, 1, 3, 4, 3, 4, 5],
...      [2, 1, 0, 2, 1, 5, 4],
...      [1, 2, 2, 1, 0, 3, 5]]
>>> y = [0, 0, 1, 1]
>>> st = ShapeletTransform(n_shapelets=2, window_sizes=[3])
>>> X_new = st.fit_transform(X, y)
>>> X_new.shape
(4, 2)
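
The indices_ attribute can then be inspected. Assuming, as in the pyts documentation, that it is an array of shape (n_shapelets, 3) storing for each shapelet the index of its time series and its start and end points:

>>> st.indices_.shape
(2, 3)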

Classification can be performed with any standard classifier. In the example below, we use a Support Vector Machine with a linear kernel:

>>> import numpy as np
>>> from pyts.transformation import ShapeletTransform
>>> from pyts.datasets import load_gunpoint
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.svm import LinearSVC
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> shapelet = ShapeletTransform(window_sizes=np.arange(10, 130, 3), random_state=42)
>>> svc = LinearSVC()
>>> clf = make_pipeline(shapelet, svc)
>>> clf.fit(X_train, y_train)
Pipeline(...)
>>> clf.score(X_test, y_test)
0.966...

References

  • J. Lines, L. M. Davis, J. Hills and A. Bagnall, “A Shapelet Transform for Time Series Classification”. International Conference on Knowledge Discovery and Data Mining, 289-297 (2012).

3.2. BagOfPatterns

BagOfPatterns is built on top of the bag-of-words transformation: first it transforms each time series into a bag of words, then it computes the frequency of each word for this time series. Each time series is thus transformed into a histogram of word counts. The vocabulary_ attribute is a mapping from the feature indices to the corresponding words.
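
The counting step itself is straightforward. A minimal sketch on a made-up bag of words:

>>> from collections import Counter
>>> words = ['ab', 'bc', 'ab', 'aa', 'bc', 'ab']  # made-up bag of words
>>> Counter(words)  # histogram of word counts for this time series
Counter({'ab': 3, 'bc': 2, 'aa': 1})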

[Figure: BagOfPatterns example (sphx_glr_plot_bop_001.png)]

Classification can be performed with any standard classifier. In the example below, we use a k-nearest neighbors classifier with the Euclidean distance:

>>> from pyts.transformation import BagOfPatterns
>>> from pyts.datasets import load_gunpoint
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.pipeline import make_pipeline
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = make_pipeline(
...    BagOfPatterns(window_size=32, word_size=4, n_bins=4,
...                  strategy='normal', numerosity_reduction=False),
...    KNeighborsClassifier(n_neighbors=1)
... )
>>> clf.fit(X_train, y_train)
Pipeline(...)
>>> clf.score(X_test, y_test)
0.98

References

  • J. Lin, R. Khade and Y. Li, “Rotation-invariant similarity in time series using bag-of-patterns representation”. Journal of Intelligent Information Systems, 39 (2), 287-315 (2012).

3.3. BOSS

BOSS stands for Bag Of Symbolic-Fourier-Approximation Symbols. BOSS extracts words from time series using the Symbolic Fourier Approximation algorithm and derives their frequencies for each time series.
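
Conceptually, BOSS slides a window over each time series, turns every window into a word, and counts the words. A minimal sketch of this pipeline, where to_word stands in for the Symbolic Fourier Approximation step (the lambda below is a made-up toy, not SFA):

>>> from collections import Counter
>>> def word_histogram(ts, window_size, to_word):
...     """Count the word associated with each sliding window."""
...     words = [to_word(ts[i:i + window_size])
...              for i in range(len(ts) - window_size + 1)]
...     return Counter(words)
>>> hist = word_histogram([0, 2, 3, 4, 3, 2, 1], 3, lambda w: 'a' if sum(w) < 8 else 'b')
>>> sorted(hist.items())
[('a', 2), ('b', 3)]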

[Figure: BOSS example (sphx_glr_plot_boss_001.png)]

The vocabulary_ attribute is a mapping from the feature indices to the corresponding words:

>>> from pyts.datasets import load_gunpoint
>>> from pyts.transformation import BOSS
>>> X_train, X_test, _, _ = load_gunpoint(return_X_y=True)
>>> boss = BOSS(word_size=2, n_bins=2, sparse=False)
>>> boss.fit(X_train)
BOSS(...)
>>> sorted(boss.vocabulary_.values())
['aa', 'ab', 'ba', 'bb']
>>> boss.transform(X_test)
array(...)

Classification can be performed with any standard classifier. In the example below, we use a k-nearest neighbors classifier with the pyts.metrics.boss() metric:

>>> from pyts.datasets import load_gunpoint
>>> from pyts.transformation import BOSS
>>> from pyts.classification import KNeighborsClassifier
>>> from sklearn.pipeline import make_pipeline
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> boss = BOSS(word_size=8, window_size=40, norm_mean=True, drop_sum=True, sparse=False)
>>> knn = KNeighborsClassifier(metric='boss')
>>> clf = make_pipeline(boss, knn)
>>> clf.fit(X_train, y_train)
Pipeline(...)
>>> clf.score(X_test, y_test)
1.0
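
For intuition, the BOSS distance between two histograms only takes into account the words that occur in the first histogram, which makes it asymmetric. A minimal sketch following the definition in the reference below (use pyts.metrics.boss in practice):

>>> import numpy as np
>>> def boss_dist(x, y):
...     """Squared differences restricted to the nonzero entries of x."""
...     mask = x != 0
...     return float(np.sum((x[mask] - y[mask]) ** 2))
>>> x, y = np.array([3., 0., 2.]), np.array([1., 5., 2.])
>>> boss_dist(x, y)  # (3 - 1) ** 2 + (2 - 2) ** 2; the zero entry of x is ignored
4.0
>>> boss_dist(y, x)  # not symmetric
29.0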

References

  • P. Schäfer, “The BOSS is concerned with time series classification in the presence of noise”. Data Mining and Knowledge Discovery, 29(6), 1505-1530 (2015).

3.4. WEASEL

WEASEL stands for Word ExtrAction for time SEries cLassification. While BOSS extracts words with a single sliding window, WEASEL extracts words with several sliding windows of different sizes, and selects the most discriminative words according to the chi-squared test. The vocabulary_ attribute is a mapping from the feature indices to the corresponding words.
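
The word selection step can be illustrated with scikit-learn's chi-squared test directly. A minimal sketch on made-up word counts (higher scores indicate words whose counts depend more on the class):

>>> import numpy as np
>>> from sklearn.feature_selection import chi2
>>> X_counts = np.array([[5, 0], [4, 1], [0, 6], [1, 5]])  # made-up counts of 2 words
>>> y = [0, 0, 1, 1]
>>> chi2_scores, _ = chi2(X_counts, y)
>>> chi2_scores
array([6.4..., 8.33...])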

[Figure: WEASEL example (sphx_glr_plot_weasel_001.png)]

For new input data, the frequencies of each selected word are derived:

>>> from pyts.datasets import load_gunpoint
>>> from pyts.transformation import WEASEL
>>> X_train, X_test, y_train, _ = load_gunpoint(return_X_y=True)
>>> weasel = WEASEL(sparse=False)
>>> weasel.fit(X_train, y_train)
WEASEL(...)
>>> len(weasel.vocabulary_)
73
>>> weasel.transform(X_test).shape
(150, 73)

Classification can be performed with any standard classifier. In the example below, we use a logistic regression:

>>> import numpy as np
>>> from pyts.transformation import WEASEL
>>> from pyts.datasets import load_gunpoint
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> weasel = WEASEL(word_size=4, window_sizes=np.arange(5, 149))
>>> logistic = LogisticRegression(solver='liblinear')
>>> clf = make_pipeline(weasel, logistic)
>>> clf.fit(X_train, y_train)
Pipeline(...)
>>> clf.score(X_test, y_test)
0.96

References

  • P. Schäfer and U. Leser, “Fast and Accurate Time Series Classification with WEASEL”. Conference on Information and Knowledge Management, 637-646 (2017).

3.5. ROCKET

ROCKET stands for RandOm Convolutional KErnel Transform. ROCKET generates a great variety of random convolutional kernels and extracts two features from each convolution: the maximum value and the proportion of positive values. The kernels are generated randomly and are not learned, which greatly speeds up the computation of this transformation.
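
The two features extracted from a single convolution are easy to compute by hand. A minimal sketch with one made-up kernel, ignoring the random lengths, weights, biases, dilations and paddings that ROCKET actually samples:

>>> import numpy as np
>>> ts = np.array([0., 2., 3., 4., 3., 2., 1.])
>>> kernel = np.array([-1., 0., 1.])  # made-up kernel
>>> conv = np.correlate(ts, kernel)  # sliding dot products ('valid' mode)
>>> conv
array([ 3.,  2.,  0., -2., -2.])
>>> float(conv.max()), float((conv > 0).mean())  # maximum and proportion of positives
(3.0, 0.4)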

[Figure: ROCKET example (sphx_glr_plot_rocket_001.png)]

Each time series is transformed into a feature vector of size 2 * n_kernels (20000 with the default n_kernels=10000):

>>> from pyts.datasets import load_gunpoint
>>> from pyts.transformation import ROCKET
>>> X_train, X_test, _, _ = load_gunpoint(return_X_y=True)
>>> rocket = ROCKET()
>>> rocket.fit(X_train)
ROCKET(...)
>>> rocket.transform(X_train).shape
(50, 20000)
>>> rocket.transform(X_test).shape
(150, 20000)

References

  • A. Dempster, F. Petitjean and G. I. Webb, “ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels”. Data Mining and Knowledge Discovery, 34(5), 1454-1495 (2020).