KernelDensity#
- class cuml.neighbors.KernelDensity(*, bandwidth=1.0, kernel='gaussian', metric='euclidean', metric_params=None, output_type=None, verbose=False)[source]#
Kernel Density Estimation. Computes a non-parametric density estimate from a finite data sample, smoothing the estimate according to a bandwidth parameter.
- Parameters:
- bandwidth : float or {“scott”, “silverman”}, default=1.0
The bandwidth of the kernel.
- kernel : {‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’}, default=’gaussian’
The kernel to use.
- metric : str, default=’euclidean’
The distance metric to use. Note that not all metrics are valid with all algorithms, and that the normalization of the density output is correct only for the Euclidean distance metric.
- metric_params : dict, default=None
Additional parameters to be passed to the tree for use with the metric.
- output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
- verbose : int or boolean, default=False
Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- Attributes:
- n_features_in_ : int
Number of features seen during fit.
- bandwidth_ : float
Value of the bandwidth used, either given directly via bandwidth or estimated with bandwidth="scott" or bandwidth="silverman".
Methods

fit(X[, y, sample_weight, convert_dtype])
Fit the Kernel Density model on the data.
sample([n_samples, random_state])
Generate random samples from the model.
score(X[, y])
Compute the total log-likelihood under the model.
score_samples(X, *[, convert_dtype])
Compute the log-likelihood of each sample under the model.
Examples

>>> from cuml.neighbors import KernelDensity
>>> import cupy as cp
>>> rng = cp.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)
>>> log_density = kde.score_samples(X[:3])
- fit(X, y=None, sample_weight=None, *, convert_dtype=True) → KernelDensity[source]#
Fit the Kernel Density model on the data.
- Parameters:
- X : array-like of shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
- y : None
Ignored.
- sample_weight : array-like of shape (n_samples,), default=None
List of sample weights attached to the data X.
- Returns:
- self
Returns the instance itself.
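Conceptually, fit stores the (optionally weighted) training data that later density queries are evaluated against. The weighted density it prepares can be sketched on CPU with NumPy; this is an illustrative re-implementation under the assumption of a Gaussian kernel and Euclidean metric, not cuml's actual GPU code:

```python
import numpy as np

def weighted_gaussian_kde(x, data, weights, bandwidth):
    # Illustrative sketch (not cuml's implementation) of the weighted
    # Gaussian-KDE density: p(x) = sum_i w_i * N(x | x_i, h^2 I) / sum_i w_i
    d = data.shape[1]
    diff = x - data                                   # (n_train, n_features)
    sq_dist = np.sum(diff * diff, axis=1)             # squared Euclidean dists
    norm = (2 * np.pi * bandwidth**2) ** (d / 2)      # Gaussian normalizer
    kernel_vals = np.exp(-sq_dist / (2 * bandwidth**2)) / norm
    w = weights / weights.sum()                       # normalize sample weights
    return np.sum(w * kernel_vals)

rng = np.random.default_rng(0)
data = rng.random((50, 3))
x = np.zeros(3)
# With uniform weights this reduces to the ordinary (unweighted) estimate.
uniform = weighted_gaussian_kde(x, data, np.ones(50), 0.5)
```

Putting all the weight on a single training point collapses the estimate to a single Gaussian bump centered on that point.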
- sample(n_samples=1, random_state=None) → CumlArray[source]#
Generate random samples from the model.
Currently, sampling is implemented only for the ‘gaussian’ and ‘tophat’ kernels.
- Parameters:
- n_samples : int, default=1
Number of samples to generate.
- random_state : int, RandomState instance or None, default=None
Determines the random number generation used to generate samples.
- Returns:
- X : cupy array of shape (n_samples, n_features)
List of samples.
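For the ‘gaussian’ kernel, KDE sampling has a standard two-step form that can be sketched in NumPy. This is an illustrative sketch, not cuml's implementation: pick a training point at random, then perturb it with Gaussian noise of scale equal to the bandwidth.

```python
import numpy as np

def sample_gaussian_kde(data, bandwidth, n_samples, rng):
    # Illustrative two-step KDE sampling for a Gaussian kernel:
    # 1) choose a training point uniformly at random,
    # 2) add N(0, bandwidth^2 I) noise around it.
    idx = rng.integers(0, data.shape[0], size=n_samples)
    noise = rng.normal(scale=bandwidth, size=(n_samples, data.shape[1]))
    return data[idx] + noise

rng = np.random.default_rng(42)
data = rng.random((100, 3))
samples = sample_gaussian_kde(data, 0.5, 10, rng)
```

A ‘tophat’ kernel would follow the same recipe with noise drawn uniformly from a ball of radius bandwidth instead of a Gaussian.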
- score(X, y=None) → float[source]#
Compute the total log-likelihood under the model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
- y : None
Ignored.
- Returns:
- logprob : float
Total log-likelihood of the data in X. This is normalized to be a probability density, so the value will be low for high-dimensional data.
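The total returned by score is simply the sum of the per-sample log-likelihoods that score_samples reports. A small 1-D NumPy sketch (illustrative only, not cuml's code) makes the relationship concrete:

```python
import numpy as np

def toy_log_density(X, data, bandwidth):
    # Plain 1-D Gaussian KDE, computed directly (illustrative sketch).
    k = np.exp(-(X[:, None] - data[None, :]) ** 2 / (2 * bandwidth**2))
    k /= np.sqrt(2 * np.pi) * bandwidth          # normalize each kernel
    return np.log(k.mean(axis=1))                # average over training points

rng = np.random.default_rng(1)
data = rng.normal(size=20)
X = rng.normal(size=5)
per_sample = toy_log_density(X, data, 0.75)      # analogue of score_samples(X)
total = per_sample.sum()                         # analogue of score(X)
```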
- score_samples(X, *, convert_dtype=True) → CumlArray[source]#
Compute the log-likelihood of each sample under the model.
- Parameters:
- X : array-like of shape (n_samples, n_features)
An array of points to query. The last dimension should match the dimension of the training data (n_features).
- Returns:
- density : ndarray of shape (n_samples,)
Log-likelihood of each sample in X. These are normalized to be probability densities, so values will be low for high-dimensional data.
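For the default Gaussian kernel and Euclidean metric, the quantity score_samples computes can be written as a log-sum-exp over kernel contributions. The following NumPy function is a CPU reference sketch under those assumptions, not cuml's GPU implementation:

```python
import numpy as np

def gaussian_kde_log_density(X, data, bandwidth):
    # Illustrative sketch of the Gaussian-KDE log-density:
    # log p(x) = logsumexp_i( -||x - x_i||^2 / (2 h^2) )
    #            - log(n) - (d / 2) * log(2 * pi * h^2)
    n, d = data.shape
    sq = ((X[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (n_query, n)
    log_k = -sq / (2 * bandwidth**2)
    m = log_k.max(axis=1, keepdims=True)                 # stable log-sum-exp
    lse = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return lse - np.log(n) - (d / 2) * np.log(2 * np.pi * bandwidth**2)

rng = np.random.default_rng(42)
data = rng.random((100, 3))
log_density = gaussian_kde_log_density(data[:3], data, 0.5)
```

The log-sum-exp formulation avoids underflow when query points are far from every training point, which is why the log of the density, rather than the density itself, is the natural return value.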