KernelDensity#
- class cuml.neighbors.KernelDensity(*, bandwidth=1.0, kernel='gaussian', metric='euclidean', metric_params=None, output_type=None, verbose=False)[source]#
Kernel Density Estimation. Computes a non-parametric density estimate from a finite data sample, smoothing the estimate according to a bandwidth parameter.
- Parameters:
- bandwidth : float or {“scott”, “silverman”}, default=1.0
The bandwidth of the kernel.
- kernel : {‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’}, default=’gaussian’
The kernel to use.
- metric : str, default=’euclidean’
The distance metric to use. Note that not all metrics are valid with all algorithms, and that the normalization of the density output is correct only for the Euclidean distance metric.
- metric_params : dict, default=None
Additional parameters to be passed to the tree for use with the metric.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- verbose : int or boolean, default=False
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- Attributes:
- n_features_in_ : int
Number of features seen during fit.
- bandwidth_ : float
Value of the bandwidth used, either given directly via bandwidth or estimated with bandwidth="scott" or bandwidth="silverman".
Methods

fit(X[, y, sample_weight, convert_dtype])
Fit the Kernel Density model on the data.
sample([n_samples, random_state])
Generate random samples from the model.
score(X[, y])
Compute the total log-likelihood under the model.
score_samples(X, *[, convert_dtype])
Compute the log-likelihood of each sample under the model.
Examples

>>> from cuml.neighbors import KernelDensity
>>> import cupy as cp
>>> rng = cp.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)
>>> log_density = kde.score_samples(X[:3])
- fit(X, y=None, sample_weight=None, *, convert_dtype=True) → KernelDensity[source]#
Fit the Kernel Density model on the data.
- Parameters:
- X : array-like of shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
- y : None
Ignored.
- sample_weight : array-like of shape (n_samples,), default=None
List of sample weights attached to the data X.
- Returns:
- self
Returns the instance itself.
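Conceptually, fit stores the (optionally weighted) training data that later density queries are evaluated against. The weighted density it prepares can be sketched on CPU with NumPy; this is an illustrative re-implementation under the assumption of a Gaussian kernel and Euclidean metric, not cuml's actual GPU code:

```python
import numpy as np

def weighted_gaussian_kde(x, data, weights, bandwidth):
    # Illustrative sketch (not cuml's implementation) of the weighted
    # Gaussian-KDE density: p(x) = sum_i w_i * N(x | x_i, h^2 I) / sum_i w_i
    d = data.shape[1]
    diff = x - data                                   # (n_train, n_features)
    sq_dist = np.sum(diff * diff, axis=1)             # squared Euclidean dists
    norm = (2 * np.pi * bandwidth**2) ** (d / 2)      # Gaussian normalizer
    kernel_vals = np.exp(-sq_dist / (2 * bandwidth**2)) / norm
    w = weights / weights.sum()                       # normalize sample weights
    return np.sum(w * kernel_vals)

rng = np.random.default_rng(0)
data = rng.random((50, 3))
x = np.zeros(3)
# With uniform weights this reduces to the ordinary (unweighted) estimate.
uniform = weighted_gaussian_kde(x, data, np.ones(50), 0.5)
```

Putting all the weight on a single training point collapses the estimate to a single Gaussian bump centered on that point.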
- sample(n_samples=1, random_state=None) → CumlArray[source]#
Generate random samples from the model.
Currently, sampling is implemented only for the ‘gaussian’ and ‘tophat’ kernels.
- Parameters:
- n_samples : int, default=1
Number of samples to generate.
- random_state : int, RandomState instance or None, default=None
Determines the random number generation used to generate samples.
- Returns:
- X : cupy array of shape (n_samples, n_features)
List of samples.
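For the ‘gaussian’ kernel, KDE sampling has a standard two-step form that can be sketched in NumPy. This is an illustrative sketch, not cuml's implementation: pick a training point at random, then perturb it with Gaussian noise of scale equal to the bandwidth.

```python
import numpy as np

def sample_gaussian_kde(data, bandwidth, n_samples, rng):
    # Illustrative two-step KDE sampling for a Gaussian kernel:
    # 1) choose a training point uniformly at random,
    # 2) add N(0, bandwidth^2 I) noise around it.
    idx = rng.integers(0, data.shape[0], size=n_samples)
    noise = rng.normal(scale=bandwidth, size=(n_samples, data.shape[1]))
    return data[idx] + noise

rng = np.random.default_rng(42)
data = rng.random((100, 3))
samples = sample_gaussian_kde(data, 0.5, 10, rng)
```

A ‘tophat’ kernel would follow the same recipe with noise drawn uniformly from a ball of radius bandwidth instead of a Gaussian.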
- score(X, y=None) → float[source]#
Compute the total log-likelihood under the model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
- y : None
Ignored.
- Returns:
- logprob : float
Total log-likelihood of the data in X. This is normalized to be a probability density, so the value will be low for high-dimensional data.
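The total returned by score is simply the sum of the per-sample log-likelihoods that score_samples reports. A small 1-D NumPy sketch (illustrative only, not cuml's code) makes the relationship concrete:

```python
import numpy as np

def toy_log_density(X, data, bandwidth):
    # Plain 1-D Gaussian KDE, computed directly (illustrative sketch).
    k = np.exp(-(X[:, None] - data[None, :]) ** 2 / (2 * bandwidth**2))
    k /= np.sqrt(2 * np.pi) * bandwidth          # normalize each kernel
    return np.log(k.mean(axis=1))                # average over training points

rng = np.random.default_rng(1)
data = rng.normal(size=20)
X = rng.normal(size=5)
per_sample = toy_log_density(X, data, 0.75)      # analogue of score_samples(X)
total = per_sample.sum()                         # analogue of score(X)
```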
- score_samples(X, *, convert_dtype=True) → CumlArray[source]#
Compute the log-likelihood of each sample under the model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
An array of points to query. The last dimension should match the dimension of the training data (n_features).
- Returns:
- density : ndarray of shape (n_samples,)
Log-likelihood of each sample in X. These are normalized to be probability densities, so values will be low for high-dimensional data.
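For the default Gaussian kernel and Euclidean metric, the quantity score_samples computes can be written as a log-sum-exp over kernel contributions. The following NumPy function is a CPU reference sketch under those assumptions, not cuml's GPU implementation:

```python
import numpy as np

def gaussian_kde_log_density(X, data, bandwidth):
    # Illustrative sketch of the Gaussian-KDE log-density:
    # log p(x) = logsumexp_i( -||x - x_i||^2 / (2 h^2) )
    #            - log(n) - (d / 2) * log(2 * pi * h^2)
    n, d = data.shape
    sq = ((X[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (n_query, n)
    log_k = -sq / (2 * bandwidth**2)
    m = log_k.max(axis=1, keepdims=True)                 # stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return lse - np.log(n) - (d / 2) * np.log(2 * np.pi * bandwidth**2)

rng = np.random.default_rng(42)
data = rng.random((100, 3))
log_density = gaussian_kde_log_density(data[:3], data, 0.5)
```

The log-sum-exp formulation avoids underflow when query points are far from every training point, which is why the log of the density, rather than the density itself, is the natural return value.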