.. note::
    :class: sphx-glr-download-link-note

    Click :ref:`here <sphx_glr_download_auto_examples_classification_plot_saxvsm.py>` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_classification_plot_saxvsm.py:


================================================================
Symbolic Aggregate approXimation in Vector Space Model (SAX-VSM)
================================================================

This example shows how the SAX-VSM algorithm transforms a dataset
consisting of time series and their corresponding labels into a
document-term matrix using tf-idf statistics. Each class is represented
as a tf-idf vector. For an unlabeled time series, the predicted label is
the label of the tf-idf vector giving the highest cosine similarity with
the tf vector of the unlabeled time series. It is implemented as
:class:`pyts.classification.SAXVSM`.

.. image:: /auto_examples/classification/images/sphx_glr_plot_saxvsm_001.png
    :class: sphx-glr-single-img


.. code-block:: default

    # Author: Johann Faouzi
    # License: BSD-3-Clause

    import numpy as np
    import matplotlib.pyplot as plt
    from pyts.classification import SAXVSM
    from pyts.datasets import load_gunpoint

    # Toy dataset
    X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)

    # SAXVSM transformation
    saxvsm = SAXVSM(n_bins=4, strategy='uniform', window_size=2,
                    sublinear_tf=True)
    saxvsm.fit(X_train, y_train)
    tfidf = saxvsm.tfidf_
    vocabulary_length = len(saxvsm.vocabulary_)
    X_new = saxvsm.decision_function(X_test)

    # Visualize the transformation
    plt.figure(figsize=(14, 5))
    width = 0.4

    plt.subplot(121)
    plt.bar(np.arange(vocabulary_length) - width / 2, tfidf[0],
            width=width, label='Class 1')
    plt.bar(np.arange(vocabulary_length) + width / 2, tfidf[1],
            width=width, label='Class 2')
    plt.xticks(np.arange(vocabulary_length),
               np.vectorize(saxvsm.vocabulary_.get)(np.arange(vocabulary_length)),
               fontsize=14)
    plt.ylim((0, 5.5))
    plt.xlabel("Words", fontsize=14)
    plt.ylabel("tf-idf", fontsize=14)
    plt.title("tf-idf vector for each class (training set)", fontsize=15)
    plt.legend(loc='best')

    plt.subplot(122)
    n_samples_plot = 8
    plt.bar(np.arange(n_samples_plot) - width / 2, X_new[:n_samples_plot, 0],
            width=width, label='Class 1')
    plt.bar(np.arange(n_samples_plot) + width / 2, X_new[:n_samples_plot, 1],
            width=width, label='Class 2')
    plt.xticks(np.arange(n_samples_plot), y_test[:n_samples_plot], fontsize=14)
    plt.ylim((0, 1.2))
    plt.xlabel("True label", fontsize=14)
    plt.ylabel("Cosine similarity", fontsize=14)
    plt.title(("Cosine similarity between tf-idf vectors for each class\n"
               "and tf vectors for each sample (test set)"), fontsize=15)
    plt.legend(loc='best')

    plt.suptitle("SAX-VSM", y=0.95, fontsize=22)
    plt.tight_layout()
    plt.subplots_adjust(top=0.75)
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.031 seconds)


.. _sphx_glr_download_auto_examples_classification_plot_saxvsm.py:


.. only:: html

  .. container:: sphx-glr-footer
     :class: sphx-glr-footer-example


    .. container:: sphx-glr-download

       :download:`Download Python source code: plot_saxvsm.py <plot_saxvsm.py>`


    .. container:: sphx-glr-download

       :download:`Download Jupyter notebook: plot_saxvsm.ipynb <plot_saxvsm.ipynb>`


.. only:: html

  .. rst-class:: sphx-glr-signature

     `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
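The decision rule described above (predict the class whose tf-idf vector is
most similar, in the cosine sense, to the sample's tf vector) can be sketched
in plain NumPy, independently of ``pyts``. The vectors and labels below are
made-up illustrative numbers, not values from the GunPoint example:

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero 1-D vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def predict_label(tf_vector, tfidf_per_class, labels):
    """SAX-VSM decision rule: return the label of the class whose
    tf-idf vector has the highest cosine similarity with the tf vector."""
    sims = [cosine_similarity(tf_vector, v) for v in tfidf_per_class]
    return labels[int(np.argmax(sims))]


# Hypothetical tf-idf vectors for two classes over a 3-word vocabulary
tfidf_per_class = np.array([[2.0, 0.5, 0.0],
                            [0.1, 1.5, 2.5]])
labels = [1, 2]

# Word counts (tf vector) of an unlabeled time series
tf_vector = np.array([3.0, 1.0, 0.0])

print(predict_label(tf_vector, tfidf_per_class, labels))  # → 1
```

This is what ``SAXVSM.decision_function`` reports per class in the example
above; the plotted bars on the right panel are exactly these cosine
similarities for each test sample.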