pairwise_distances#

cuml.metrics.pairwise_distances(X, Y=None, metric='euclidean', convert_dtype=True, metric_arg=2, **kwds)[source]#

Compute the distance matrix from a vector array X and optional Y.

This method takes either one or two vector arrays, and returns a distance matrix.

If Y is given (default is None), then the returned matrix is the pairwise distance between the arrays from both X and Y.

Valid values for metric are:

  • From scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’].

    Sparse matrices are supported, see ‘sparse_pairwise_distances’.

  • From scipy.spatial.distance: [‘sqeuclidean’]

    See the documentation for scipy.spatial.distance for details on this metric. Sparse matrices are supported.

Parameters:
XDense or sparse matrix (device or host) of shape

(n_samples_x, n_features) Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, cuda array interface compliant array like CuPy, or cupyx.scipy.sparse for sparse input

Yarray-like (device or host) of shape (n_samples_y, n_features), optional

Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, cuda array interface compliant array like CuPy

metric{“cityblock”, “cosine”, “euclidean”, “l1”, “l2”, “manhattan”, “sqeuclidean”}

The metric to use when calculating distance between instances in a feature array.

convert_dtypebool, optional (default = True)

When set to True, the method will, when necessary, convert Y to be the same data type as X if they differ. This will increase memory used for the method.

Returns:
Darray [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y]

A distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix X, if Y is None. If Y is not None, then D_{i, j} is the distance between the ith array from X and the jth array from Y.

Examples

>>> import cupy as cp
>>> from cuml.metrics import pairwise_distances
>>> X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]])
>>> Y = cp.array([[1.0, 0.0], [2.0, 1.0]])
>>> # Euclidean Pairwise Distance, Single Input:
>>> pairwise_distances(X, metric='euclidean')
array([[0.        , 2.236..., 5.830...],
    [2.236..., 0.        , 3.605...],
    [5.830..., 3.605..., 0.        ]])
>>> # Cosine Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='cosine')
array([[0.445... , 0.131...],
    [0.485..., 0.156...],
    [0.470..., 0.146...]])
>>> # Manhattan Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='manhattan')
array([[ 4.,  2.],
    [ 7.,  5.],
    [12., 10.]])