11. Dataset loading utilities

The UEA & UCR Time Series Classification Repository hosts many datasets for time series classification. pyts ships with a copy of a few of these datasets and provides functions to download the others.

11.1. Simulated datasets

The make_cylinder_bell_funnel() function makes a synthetic dataset of univariate time series with three classes: cylinder, bell and funnel. This dataset was introduced by N. Saito in his PhD thesis “Local feature extraction and its application using a library of bases”.

The time series are generated from the following distributions:

c(t) = (6 + \eta) \cdot 1_{[a, b]}(t) + \epsilon(t)

b(t) = (6 + \eta) \cdot 1_{[a, b]}(t) \cdot (t - a) / (b - a) + \epsilon(t)

f(t) = (6 + \eta) \cdot 1_{[a, b]}(t) \cdot (b - t) / (b - a) + \epsilon(t)

where:

  • t = 1, \ldots, 128,
  • a is an integer-valued uniform random variable on the interval [16, 32],
  • b - a is an integer-valued uniform random variable on the interval [32, 96],
  • \eta and \epsilon(t) are standard normal random variables,
  • 1_{[a, b]} is the characteristic (indicator) function of the interval [a, b].

c, b, and f stand for “cylinder”, “bell”, and “funnel” respectively.
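The three formulas above can be sketched directly in code. The following is a minimal illustration using only the Python standard library; it is not the implementation behind make_cylinder_bell_funnel(), and the helper name make_cbf_series and its parameters are hypothetical, chosen only for this example.

```python
import random


def make_cbf_series(kind, rng, n_timestamps=128):
    """Draw one series of the given kind: 'c' (cylinder), 'b' (bell) or 'f' (funnel).

    Hypothetical helper that follows the formulas above; not part of pyts.
    """
    if kind not in ("c", "b", "f"):
        raise ValueError("kind must be 'c', 'b' or 'f'")

    a = rng.randint(16, 32)        # integer-valued uniform on [16, 32]
    b = a + rng.randint(32, 96)    # b - a is integer-valued uniform on [32, 96]
    eta = rng.gauss(0.0, 1.0)      # one standard normal draw per series

    series = []
    for t in range(1, n_timestamps + 1):
        inside = 1.0 if a <= t <= b else 0.0   # characteristic function of [a, b]
        if kind == "c":
            shape = 1.0                        # flat plateau
        elif kind == "b":
            shape = (t - a) / (b - a)          # linear ramp up
        else:
            shape = (b - t) / (b - a)          # linear ramp down
        noise = rng.gauss(0.0, 1.0)            # epsilon(t), drawn at every t
        series.append((6.0 + eta) * inside * shape + noise)
    return series


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dataset = [(kind, make_cbf_series(kind, rng)) for kind in "cbf" for _ in range(5)]
```

Each entry of dataset pairs a class label with a list of 128 values. Since a is at most 32 and b - a is at most 96, the interval [a, b] always fits within t = 1, ..., 128.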

11.2. Univariate time series: UCR repository

pyts comes with a copy of three univariate time series datasets: Coffee, GunPoint, and PigCVP. Their characteristics are summarized in the following table:

Type          Name      Train  Test  Classes  Length
SPECTRO       Coffee       28    28        2     286
MOTION        GunPoint     50   150        2     150
HEMODYNAMICS  PigCVP      104   208       52    2000

Three functions are available to fetch other datasets from this repository.

11.3. Multivariate time series: UEA repository

pyts comes with a copy of one multivariate time series dataset.

Three functions are available to fetch other datasets from this repository.