KNeighborsRegressor

class cuml.neighbors.KNeighborsRegressor(*, weights='uniform', verbose=False, output_type=None, **kwargs)

K-Nearest Neighbors Regressor is an instance-based learning technique that keeps the training samples around for prediction rather than learning a generalizable set of model parameters.

To predict, the K-Nearest Neighbors Regressor averages the labels of the k closest training samples and uses that average as the predicted value.
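The averaging step can be illustrated with a minimal pure-Python sketch. It is not cuML's GPU implementation: the function name `knn_predict` and the toy data are hypothetical, and it assumes Euclidean distance with uniform weights.

```python
import math


def knn_predict(X_train, y_train, query, k=5):
    """Predict a value by averaging the targets of the k nearest training samples."""
    # Euclidean distance from the query to every stored training sample.
    distances = [math.dist(row, query) for row in X_train]
    # Indices of the k training samples with the smallest distances.
    nearest = sorted(range(len(X_train)), key=distances.__getitem__)[:k]
    # Uniform weighting: the prediction is the plain mean of the neighbor targets.
    return sum(y_train[i] for i in nearest) / len(nearest)


# Toy data (hypothetical): one feature, target equal to the feature value.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]

# The three nearest samples to 1.2 are 1.0, 2.0 and 0.0; the outlier at 10.0 is ignored.
prediction = knn_predict(X_train, y_train, [1.2], k=3)
```

Nothing is "learned" here beyond storing the training data, which is what makes the method instance-based.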

Parameters:

n_neighbors : int (default=5)

Default number of neighbors to query

algorithm : string (default='auto')

The query algorithm to use. Valid options are:

'auto': automatically selects brute-force or random ball cover based on the data shape and metric
'rbc': for the random ball cover algorithm, which partitions the data space and uses the triangle inequality to reduce the number of distances that need to be computed. Currently, this algorithm supports 2D Euclidean and Haversine.
'brute': for brute-force, slow but produces exact results
'ivfflat': for inverted file, which divides the dataset into partitions and searches only the relevant partitions
'ivfpq': for inverted file with product quantization. It works like the inverted file, but the vectors are additionally split into n_features/M sub-vectors that are encoded using intermediary k-means clusterings. This encoding provides partial information that allows faster distance calculations
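The inverted-file idea behind 'ivfflat' can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not cuML's implementation: the helper names `build_ivf` and `ivf_search`, the `n_probes` argument, and the hand-picked centroids are all hypothetical (a real index would typically obtain centroids from a clustering step).

```python
import math


def build_ivf(points, centroids):
    """Assign each point index to the partition of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, point in enumerate(points):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(point, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(points, centroids, lists, query, k, n_probes=1):
    """Return indices of the k closest points found in the n_probes nearest partitions."""
    # Rank partitions by the distance from the query to their centroid.
    probed = sorted(range(len(centroids)),
                    key=lambda c: math.dist(query, centroids[c]))[:n_probes]
    # Only points in the probed partitions are compared against the query.
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, points[i]))
    return candidates[:k]


# Two well-separated clusters with hand-picked (assumed) centroids.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]

lists = build_ivf(points, centroids)
neighbors = ivf_search(points, centroids, lists, [9.5, 10.2], k=2)
```

With `n_probes=1` only three of the six points are compared against the query. The trade-off is that a true neighbor lying in an unprobed partition would be missed, which is why such indexes give approximate rather than exact results, in contrast to 'brute'.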

metric : string (default='euclidean')

Distance metric to use.

weights : {‘uniform’, ‘distance’} or callable, default=‘uniform’

Weight function used in prediction. Possible values:

‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
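The three weighting modes can be compared with a small pure-Python sketch. The function name `combine` and the sample values are hypothetical, and the handling of zero distances (exact matches take all the weight) is an assumed convention for this sketch, not a statement about cuML's internals.

```python
def combine(targets, distances, weights="uniform"):
    """Combine neighbor targets into one prediction using the chosen weighting."""
    if weights == "uniform":
        # Every neighbor counts equally.
        w = [1.0] * len(distances)
    elif weights == "distance":
        exact = [d == 0.0 for d in distances]
        if any(exact):
            # Assumed convention: exact matches take all the weight,
            # which also avoids dividing by zero.
            w = [1.0 if e else 0.0 for e in exact]
        else:
            # Closer neighbors receive larger weights.
            w = [1.0 / d for d in distances]
    elif callable(weights):
        # User-defined function: distances in, weights of the same length out.
        w = list(weights(distances))
    else:
        raise ValueError(f"unsupported weights: {weights!r}")
    return sum(wi * ti for wi, ti in zip(w, targets)) / sum(w)


targets = [1.0, 3.0]    # targets of the two nearest neighbors
distances = [1.0, 3.0]  # their distances to the query point

uniform = combine(targets, distances)              # plain mean of the targets
inverse = combine(targets, distances, "distance")  # pulled toward the closer neighbor
squared = combine(targets, distances,
                  lambda d: [1.0 / (x * x) for x in d])  # pulled even further
```

Here the uniform result is the midpoint of the two targets, while the inverse-distance and inverse-squared-distance results move progressively closer to the target of the nearer neighbor.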

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Methods

`fit`(self, X, y, *[, convert_dtype])	Fit a GPU index for k-nearest neighbors regression model.
`predict`(self, X, *[, convert_dtype])	Use the trained k-nearest neighbors regression model to predict the labels for X

Notes

For additional docs, see scikit-learn’s KNeighborsRegressor.

Examples

>>> from cuml.neighbors import KNeighborsRegressor
>>> from cuml.datasets import make_regression
>>> from cuml.model_selection import train_test_split

>>> X, y = make_regression(n_samples=100, n_features=10,
...                        random_state=5)
>>> X_train, X_test, y_train, y_test = train_test_split(
...   X, y, train_size=0.80, random_state=5)

>>> knn = KNeighborsRegressor(n_neighbors=10)
>>> knn.fit(X_train, y_train)
KNeighborsRegressor()
>>> knn.predict(X_test)
array([ 14.770798  ,  51.8834    ,  66.15657   ,  46.978275  ,
    21.589611  , -14.519918  , -60.25534   , -20.856869  ,
    29.869623  , -34.83317   ,   0.45447388, 120.39675   ,
    109.94834   ,  63.57794   , -17.956171  ,  78.77663   ,
    30.412262  ,  32.575233  ,  74.72834   , 122.276855  ],
dtype=float32)

fit(self, X, y, *, convert_dtype=True) → 'KNeighborsRegressor'

Fit a GPU index for k-nearest neighbors regression model.

Parameters:

X : array-like (device or host), shape = (n_samples, n_features). Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
y : array-like (device or host), shape = (n_samples, 1). Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
convert_dtype : bool, optional (default = True). When set to True, the method will automatically convert the inputs to np.float32.

predict(self, X, *, convert_dtype=True) → CumlArray

Use the trained k-nearest neighbors regression model to predict the labels for X

Parameters:

X : array-like (device or host), shape = (n_samples, n_features). Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
convert_dtype : bool, optional (default = True). When set to True, the method will automatically convert the inputs to np.float32.

Returns:

X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_features)

Predicted values

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.
