pyts.classification.KNeighborsClassifier

class pyts.classification.KNeighborsClassifier(n_neighbors=1, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)[source]

k-nearest neighbors classifier.

Parameters:
n_neighbors : int, optional (default = 1)

Number of neighbors to use.

weights : str or callable, optional (default = ‘uniform’)

Weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights (see the sketch after this list).
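
For instance, a callable can implement inverse-square weighting. A minimal sketch (the function name inv_sq_weights is illustrative, not part of the library):

>>> import numpy as np
>>> from pyts.classification import KNeighborsClassifier
>>> def inv_sq_weights(distances):
...     """Weight each neighbor by the inverse of its squared distance."""
...     return 1.0 / (distances ** 2 + 1e-8)  # small constant guards against division by zero
>>> clf = KNeighborsClassifier(n_neighbors=3, weights=inv_sq_weights)
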
algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors. Ignored if metric is either ‘dtw’, ‘dtw_sakoechiba’, ‘dtw_itakura’, ‘dtw_multiscale’, ‘dtw_fast’ or ‘boss’ (‘brute’ will be used).

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

metric : string or DistanceMetric object (default = ‘minkowski’)

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class from scikit-learn for a list of available metrics. For Dynamic Time Warping, the available metrics are ‘dtw’, ‘dtw_sakoechiba’, ‘dtw_itakura’, ‘dtw_multiscale’, ‘dtw_fast’ and ‘boss’.
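
For instance, a DTW variant is selected simply by name; as noted for algorithm, brute-force search is then used internally. A minimal sketch, assuming X_train and y_train hold a time series dataset as in the Examples below:

>>> from pyts.classification import KNeighborsClassifier
>>> clf_dtw = KNeighborsClassifier(n_neighbors=1, metric='dtw')
>>> # clf_dtw.fit(X_train, y_train) then compares series with dynamic time warping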

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
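
A small worked check of the two common cases, independent of this class:

>>> import numpy as np
>>> a, b = np.array([0., 0.]), np.array([3., 4.])
>>> print(np.sum(np.abs(a - b)))  # p = 1: Manhattan distance (l1)
7.0
>>> print(np.sqrt(np.sum((a - b) ** 2)))  # p = 2: Euclidean distance (l2)
5.0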

metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.
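
For instance, the constrained DTW variants take their extra arguments here. A sketch, assuming the Sakoe-Chiba variant accepts a window_size argument (check the pyts.metrics documentation for the exact keys):

>>> from pyts.classification import KNeighborsClassifier
>>> clf_sc = KNeighborsClassifier(
...     metric='dtw_sakoechiba',
...     metric_params={'window_size': 0.1}  # assumed key; see pyts.metrics
... )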

n_jobs : int, optional (default = 1)

The number of parallel jobs to run for the neighbors search. If n_jobs=-1, the number of jobs is set to the number of CPU cores. This parameter does not affect the fit method.

Examples

>>> from pyts.classification import KNeighborsClassifier
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = KNeighborsClassifier()
>>> clf.fit(X_train, y_train) # doctest: +ELLIPSIS
KNeighborsClassifier(...)
>>> clf.score(X_test, y_test)
0.91...

Attributes:
classes_ : array, shape = (n_classes,)

An array of class labels known to the classifier.

Methods

__init__(self[, n_neighbors, weights, …])
    Initialize self.
fit(self, X, y)
    Fit the model according to the given training data.
get_params(self[, deep])
    Get parameters for this estimator.
predict(self, X)
    Predict the class labels for the provided data.
predict_proba(self, X)
    Return probability estimates for the test data X.
score(self, X, y[, sample_weight])
    Return the mean accuracy on the given test data and labels.
set_params(self, **params)
    Set the parameters of this estimator.
__init__(self, n_neighbors=1, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y)[source]

Fit the model according to the given training data.

Parameters:
X : array-like, shape = (n_samples, n_timestamps)

Training vector.

y : array-like, shape = (n_samples,)

Class labels for each data sample.

Returns:
self : object
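
Since fit returns the estimator itself, calls can be chained. A minimal sketch reusing the dataset from the Examples above:

>>> from pyts.classification import KNeighborsClassifier
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> y_pred = KNeighborsClassifier().fit(X_train, y_train).predict(X_test)
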
get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(self, X)[source]

Predict the class labels for the provided data.

Parameters:
X : array-like, shape = (n_samples, n_timestamps)

Test samples.

Returns:
y_pred : array-like, shape = (n_samples,)

Class labels for each data sample.

predict_proba(self, X)[source]

Return probability estimates for the test data X.

Parameters:
X : array-like, shape = (n_samples, n_timestamps)

Test samples.

Returns:
p : array, shape = (n_samples, n_classes)

Probability estimates.
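
With uniform weights, each estimate is the fraction of the k nearest neighbors belonging to that class, so with the default n_neighbors=1 every probability is 0 or 1. A minimal sketch on the dataset from the Examples above:

>>> from pyts.classification import KNeighborsClassifier
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)
>>> proba.shape == (X_test.shape[0], len(clf.classes_))
True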

score(self, X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.
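
In the single-label case this is simply the fraction of correct predictions. A sketch of the equivalence:

>>> import numpy as np
>>> from pyts.classification import KNeighborsClassifier
>>> from pyts.datasets import load_gunpoint
>>> X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
>>> clf = KNeighborsClassifier().fit(X_train, y_train)
>>> bool(np.isclose(clf.score(X_test, y_test), np.mean(clf.predict(X_test) == y_test)))
True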

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : object

Estimator instance.
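
A sketch of both forms: plain parameters on this estimator, and the nested <component>__<parameter> syntax inside a scikit-learn pipeline (the step name 'knn' is illustrative):

>>> from sklearn.pipeline import Pipeline
>>> from pyts.classification import KNeighborsClassifier
>>> clf = KNeighborsClassifier().set_params(n_neighbors=3, weights='distance')
>>> pipe = Pipeline([('knn', KNeighborsClassifier())])
>>> pipe = pipe.set_params(knn__n_neighbors=3)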