.. _preprocessing: ======================= Preprocessing utilities ======================= .. currentmodule:: pyts.preprocessing Preprocessing is an important step in most machine learning pipelines. Contrary to standard machine learning, where each feature is independently preprocessed, each time series is independently preprocessed. We modify most preprocessing tools from *scikit-learn* accordingly for time series. These tools can be found in the :mod:`pyts.prepocessing` module. Imputing missing values ----------------------- :class:`InterpolationImputer` imputes missing values for each time series independently using the `scipy.interpolate.interp1d `_ function. The ``strategy`` parameter controls the strategy used to perform the interpolation. The following example illustrates the impact of this parameter on the imputed values. .. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_imputer_001.png :target: ../auto_examples/preprocessing/plot_imputer.html :align: center :scale: 50% .. code-block:: python >>> import numpy as np >>> from pyts.preprocessing import InterpolationImputer >>> X = [[1, None, 3, 4], [8, None, 4, None]] >>> imputer = InterpolationImputer() >>> imputer.transform(X) array([[1., 2., 3., 4.], [8., 6., 4., 2.]]) >>> imputer.set_params(strategy='previous') >>> imputer.transform(X) array([[1., 1., 3., 4.], [8., 8., 4., 4.]]) Scaling ------- Scaling consists in transforming linearly a time series. Several scaling tools are made available: - :class:`StandardScaler`: each time series has zero mean and unit variance, - :class:`MinMaxScaler`: each time series is scaled in a given range, - :class:`MaxAbsScaler`: each time series is scaled by its maximum absolute value, - :class:`RobustScaler`: each time series is scaled using statistics that are robust to outliers. .. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_scalers_001.png :target: ../auto_examples/preprocessing/plot_scalers.html :align: center :scale: 50% Non-linear transformation ------------------------- Some algorithms make assumptions on the distribution of the data. Therefore it can be useful to transform time series so that they approximatively follow a given distribution. Two tools are made available: - :class:`PowerTransformer`: each time series is transformed to be more Gaussian-like, - :class:`QuantileTransformer`: each time series is transformed using quantile information. .. figure:: ../auto_examples/preprocessing/images/sphx_glr_plot_transformers_001.png :target: ../auto_examples/preprocessing/plot_transformers.html :align: center :scale: 80%