7. Approximating time series¶

Raw time series can be noisy or have a lot of time points. Approximating them can be useful so that the most important information is kept while reducing noise. The pyts.approximation module provides simple tools for approximating time series.

7.1. Piecewise Aggregate Approximation¶

PiecewiseAggregateApproximation (PAA) reduces the number of time points of a time series using a sliding window and taking the mean value in this window. It transforms a time series $X=(x_1,\ldots,x_n)$ into another time series $\tilde{X}=(\tilde{x}_1,\ldots,\tilde{x}_m)$ with $m<n$ such that, if $m$ divides $n$ , then

$\tilde{x}_i = \frac{m}{n} \sum_{j={(n/m)\cdot (i-1)+1}}^{(n/m)\cdot i} x_j$

If $m$ does not divides $n$ , two options are made available with the overlapping parameter:

If overlapping=True, the length of the sliding window is constant and some windows are overlapping.
If overlapping=False, the length of the sliding window may vary and the windows are non-overlapping.

>>> from pyts.approximation import PiecewiseAggregateApproximation
>>> X = [[0, 4, 2, 1, 7, 6, 3, 5],
...      [2, 5, 4, 5, 3, 4, 2, 3]]
>>> transformer = PiecewiseAggregateApproximation(window_size=2)
>>> transformer.transform(X)
array([[2. , 1.5, 6.5, 4. ],
       [3.5, 4.5, 3.5, 2.5]])

References

E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases”. Knowledge and information Systems, 3(3), 263-286 (2001).

7.2. Symbolic Aggregate approXimation¶

SymbolicAggregateApproximation (SAX) reduces the dimension of the feature space by discretizing each time series independently. Several strategies can be used to derive the edges of the bins with the strategy parameter:

strategy='uniform': all bins in each sample have identical widths,
strategy='quantile': all bins in each sample have the same number of points,
strategy='normal': bin edges are quantiles from a standard normal distribution.

>>> from pyts.approximation import SymbolicAggregateApproximation
>>> X = [[0, 4, 2, 1, 7, 6, 3, 5],
...      [2, 5, 4, 5, 3, 4, 2, 3]]
>>> transformer = SymbolicAggregateApproximation()
>>> print(transformer.transform(X))
[['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']
 ['a' 'd' 'c' 'd' 'b' 'c' 'a' 'b']]

References

J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series”. Data Mining and Knowledge Discovery, 15(2), 107-144 (2007).

7.3. Discrete Fourier Transform¶

DiscreteFourierTransform extracts Fourier coefficients from each time series. The n_coefs parameter controls the number of Fourier coefficients to keep.

>>> from pyts.approximation import DiscreteFourierTransform
>>> from pyts.datasets import load_gunpoint
>>> X, _, _, _ = load_gunpoint(return_X_y=True)
>>> transformer = DiscreteFourierTransform(n_coefs=4)
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(50, 4)

References

P. Schäfer, and M. Högqvist, “SFA: A Symbolic Fourier Approximation and Index for Similarity Search in High Dimensional Datasets”, International Conference on Extending Database Technology, 15, 516-527 (2012).

7.4. Multiple Coefficient Binning¶

MultipleCoefficientBinning is similar to Symbolic Aggregate approXimation, but it discretizes each time point independently. Therefore, it is very close to sklearn.prepreocessing.KBinsDiscretizer but the strategy parameter can take different values:

‘uniform’: all bins in each sample have identical widths,
‘quantile’: all bins in each sample have the same number of points,
‘normal’: bin edges are quantiles from a standard normal distribution,
‘entropy’: bin edges are computed using information gain.

>>> from pyts.approximation import MultipleCoefficientBinning
>>> X = [[0, 4],
...      [2, 7],
...      [1, 6],
...      [3, 5]]
>>> transformer = MultipleCoefficientBinning(n_bins=2)
>>> print(transformer.fit_transform(X))
[['a' 'a']
 ['b' 'b']
 ['a' 'b']
 ['b' 'a']]

References

P. Schäfer, and M. Högqvist, “SFA: A Symbolic Fourier Approximation and Index for Similarity Search in High Dimensional Datasets”, International Conference on Extending Database Technology, 15, 516-527 (2012).

7.5. Symbolic Fourier Approximation¶

SymbolicFourierApproximation consists in combining Discrete Fourier Transform and Multiple Coefficient Binning:

First Fourier coefficients are derived using DiscreteFourierTransform,
Then Fourier coefficients are discretized using MultipleCoefficientBinning.

References

P. Schäfer, and M. Högqvist, “SFA: A Symbolic Fourier Approximation and Index for Similarity Search in High Dimensional Datasets”, International Conference on Extending Database Technology, 15, 516-527 (2012)

7. Approximating time series¶

7.1. Piecewise Aggregate Approximation¶

7.2. Symbolic Aggregate approXimation¶

7.3. Discrete Fourier Transform¶

7.4. Multiple Coefficient Binning¶

7.5. Symbolic Fourier Approximation¶

Navigation

Related Topics