pairwise_distances#
- cuml.metrics.pairwise_distances(X, Y=None, metric='euclidean', convert_dtype=True, metric_arg=2, **kwds)[source]#
Compute the distance matrix from a vector array
Xand optionalY.This method takes either one or two vector arrays, and returns a distance matrix.
If
Yis given (default isNone), then the returned matrix is the pairwise distance between the arrays from bothXandY.Valid values for metric are:
- From scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’].
Sparse matrices are supported, see ‘sparse_pairwise_distances’.
- From scipy.spatial.distance: [‘sqeuclidean’]
See the documentation for scipy.spatial.distance for details on this metric. Sparse matrices are supported.
- Parameters:
- XDense or sparse matrix (device or host) of shape
(n_samples_x, n_features) Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, cuda array interface compliant array like CuPy, or cupyx.scipy.sparse for sparse input
- Yarray-like (device or host) of shape (n_samples_y, n_features), optional
Acceptable formats: cuDF DataFrame, NumPy ndarray, Numba device ndarray, cuda array interface compliant array like CuPy
- metric{“cityblock”, “cosine”, “euclidean”, “l1”, “l2”, “manhattan”, “sqeuclidean”}
The metric to use when calculating distance between instances in a feature array.
- convert_dtypebool, optional (default = True)
When set to True, the method will, when necessary, convert Y to be the same data type as X if they differ. This will increase memory used for the method.
- Returns:
- Darray [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y]
A distance matrix D such that D_{i, j} is the distance between the ith and jth vectors of the given matrix
X, ifYis None. IfYis notNone, then D_{i, j} is the distance between the ith array fromXand the jth array fromY.
Examples
>>> import cupy as cp >>> from cuml.metrics import pairwise_distances
>>> X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]]) >>> Y = cp.array([[1.0, 0.0], [2.0, 1.0]])
>>> # Euclidean Pairwise Distance, Single Input: >>> pairwise_distances(X, metric='euclidean') array([[0. , 2.236..., 5.830...], [2.236..., 0. , 3.605...], [5.830..., 3.605..., 0. ]])
>>> # Cosine Pairwise Distance, Multi-Input: >>> pairwise_distances(X, Y, metric='cosine') array([[0.445... , 0.131...], [0.485..., 0.156...], [0.470..., 0.146...]])
>>> # Manhattan Pairwise Distance, Multi-Input: >>> pairwise_distances(X, Y, metric='manhattan') array([[ 4., 2.], [ 7., 5.], [12., 10.]])