5. Preprocessing utilities

Preprocessing is an important step in most machine learning pipelines. Contrary to standard machine learning, where each feature is independently preprocessed, each time series is independently preprocessed. We modify most preprocessing tools from scikit-learn accordingly for time series. These tools can be found in the pyts.prepocessing module.

5.1. Imputing missing values

InterpolationImputer imputes missing values for each time series independently using the scipy.interpolate.interp1d function. The strategy parameter controls the strategy used to perform the interpolation. The following example illustrates the impact of this parameter on the imputed values.

../_images/sphx_glr_plot_imputer_0011.png
>>> import numpy as np
>>> from pyts.preprocessing import InterpolationImputer
>>> X = [[1, None, 3, 4], [8, None, 4, None]]
>>> imputer = InterpolationImputer()
>>> imputer.transform(X)
array([[1., 2., 3., 4.],
       [8., 6., 4., 2.]])
>>> imputer.set_params(strategy='previous')
>>> imputer.transform(X)
array([[1., 1., 3., 4.],
       [8., 8., 4., 4.]])

5.2. Scaling

Scaling consists in transforming linearly a time series. Several scaling tools are made available:

  • StandardScaler: each time series has zero mean and unit variance,
  • MinMaxScaler: each time series is scaled in a given range,
  • MaxAbsScaler: each time series is scaled by its maximum absolute value,
  • RobustScaler: each time series is scaled using statistics that are robust to outliers.
../_images/sphx_glr_plot_scalers_0011.png

5.3. Non-linear transformation

Some algorithms make assumptions on the distribution of the data. Therefore it can be useful to transform time series so that they approximatively follow a given distribution. Two tools are made available:

../_images/sphx_glr_plot_transformers_0011.png