pairwise_distances

cuml.metrics.pairwise_distances(X, Y=None, metric='euclidean', convert_dtype=True, **kwds)

Compute the distance matrix from a feature array X and optional Y.

This function takes one or two feature arrays and returns the matrix of distances between their rows.

Parameters:
X : {array-like, sparse matrix}, shape=(n_samples_X, n_features)

A feature array.

Y : {array-like, sparse matrix}, shape=(n_samples_Y, n_features), default=None

A second feature array. If None, Y=X will be used.

metric : str, default="euclidean"

The metric to use when calculating distance between instances in a feature array. Valid options are:

  • Supports both dense and sparse data: ['canberra', 'chebyshev', 'cityblock', 'cosine', 'euclidean', 'hellinger', 'l1', 'l2', 'manhattan', 'minkowski', 'sqeuclidean'].

  • Supports dense only: ['correlation', 'hamming', 'jensenshannon', 'kldivergence', 'nan_euclidean', 'russellrao'].

  • Supports sparse only: ['dice', 'inner_product', 'jaccard'].

convert_dtype : bool, optional (default=True)

When True, Y is converted to the data type of X if the two differ. The conversion increases the memory used by the call.

**kwds : optional keyword parameters

Any additional metric-specific parameters. For example, with metric="minkowski", passing p sets the norm used.
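To illustrate what the p parameter controls, here is a minimal CPU sketch of the Minkowski distance in plain Python. The `minkowski` helper is hypothetical and written only for this illustration; it is not part of cuML and says nothing about how cuML computes the metric on the GPU.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors.

    Hypothetical reference helper, not a cuML function.
    """
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = [0.0, 0.0, 0.0]
y = [1.0, 1.0, 1.0]

d1 = minkowski(x, y, p=1)  # p=1 is the cityblock (Manhattan) distance
d2 = minkowski(x, y, p=2)  # p=2 is the Euclidean distance
```

Based on the description above, the corresponding cuML call would pass the same value as a keyword argument, for example `pairwise_distances(X, metric="minkowski", p=3)`.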

Returns:
D : array, shape=(n_samples_X, n_samples_X) or (n_samples_X, n_samples_Y)

A distance matrix D. If Y is None, D_{i, j} is the distance between the ith and jth vectors of X. If Y is not None, D_{i, j} is the distance between the ith vector of X and the jth vector of Y.
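The indexing and shape of D can be sketched with a small CPU reference in plain Python. The `pairwise_reference` helper is hypothetical and uses nested lists with `math.dist` (Euclidean distance, Python 3.8+); it only mirrors the semantics described above and is not the cuML implementation.

```python
import math


def pairwise_reference(X, Y=None, dist=math.dist):
    """Return D with D[i][j] = dist(X[i], Y[j]); Y defaults to X.

    Hypothetical reference helper, not a cuML function.
    """
    if Y is None:
        Y = X
    return [[dist(x, y) for y in Y] for x in X]


X = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
Y = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 2.0]]

D_xx = pairwise_reference(X)     # 2 x 2: rows of X against rows of X
D_xy = pairwise_reference(X, Y)  # 2 x 3: rows of X against rows of Y
```

With two samples in X and three in Y, the result has one row per sample of X and one column per sample of Y; omitting Y gives a square matrix with zeros on the diagonal.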

Examples

>>> import cupy as cp
>>> from cuml.metrics import pairwise_distances
>>> X = cp.array([[0., 0., 0.], [1., 1., 1.]])
>>> Y = cp.array([[1., 0., 0.], [1., 1., 0.]])
>>> pairwise_distances(X, metric="sqeuclidean")
array([[0., 3.],
       [3., 0.]])
>>> pairwise_distances(X, Y, metric="sqeuclidean")
array([[1., 2.],
       [2., 1.]])