KNeighborsRegressor

class cuml.neighbors.KNeighborsRegressor(*, weights='uniform', verbose=False, output_type=None, **kwargs)

K-Nearest Neighbors Regressor is an instance-based learning technique that keeps the training samples around for prediction rather than trying to learn a generalizable set of model parameters.

The K-Nearest Neighbors Regressor computes the average of the labels of the k closest neighbors of a query point and uses it as the predicted label for that point.
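The averaging step can be illustrated with a minimal plain-Python sketch. It is not cuML's GPU implementation; the helper name `knn_regress_uniform` and the toy data are hypothetical and exist only for illustration.

```python
import math

def knn_regress_uniform(X_train, y_train, query, n_neighbors=5):
    """Predict one value as the plain mean of the k nearest training labels."""
    # Brute-force Euclidean distance from the query to every training sample.
    distances = [math.dist(row, query) for row in X_train]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(X_train)), key=distances.__getitem__)[:n_neighbors]
    # Uniform weighting: every selected neighbor contributes equally.
    return sum(y_train[i] for i in nearest) / len(nearest)

# Hypothetical toy data: one feature per sample.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 10.0, 20.0, 100.0]

# The three nearest samples to 1.2 are 1.0, 2.0 and 0.0; the far outlier is ignored.
prediction = knn_regress_uniform(X_train, y_train, [1.2], n_neighbors=3)
print(prediction)
```

Because nothing is learned at fit time, the cost of a prediction is dominated by the neighbor search, which is what the algorithm parameter below controls.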

Parameters:
n_neighbors : int (default=5)

Default number of neighbors to query.

algorithm : string (default=‘auto’)

The query algorithm to use. Valid options are:

  • 'auto': automatically selects brute-force or random ball cover based on the data shape and metric

  • 'rbc': the random ball cover algorithm, which partitions the data space and uses the triangle inequality to reduce the number of distance computations. Currently, this algorithm supports 2D Euclidean and Haversine distances.

  • 'brute': brute-force search, slow but produces exact results

  • 'ivfflat': inverted file, which divides the dataset into partitions and searches only the relevant partitions

  • 'ivfpq': inverted file with product quantization. It works like the inverted file, but the vectors are additionally split into n_features/M sub-vectors that are encoded using intermediate k-means clusterings. This encoding provides partial information that allows faster distance calculations.
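To make the partition-and-probe idea behind 'ivfflat' concrete, here is a rough plain-Python sketch. It is an illustration under stated assumptions rather than cuML's implementation: the function names, the `n_probes` argument, and the fixed centroids (which would normally come from a k-means step) are all hypothetical.

```python
import math

def build_inverted_lists(points, centroids):
    """Assign every point index to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, p in enumerate(points):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        lists[nearest].append(idx)
    return lists

def ivf_search(query, points, centroids, lists, k, n_probes=1):
    """Return the k closest point indices, scanning only n_probes partitions."""
    # Rank partitions by how close their centroid is to the query.
    probe_order = sorted(range(len(centroids)),
                         key=lambda c: math.dist(query, centroids[c]))
    # Only points in the probed partitions are compared against the query.
    candidates = [i for c in probe_order[:n_probes] for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, points[i]))
    return candidates[:k]

points = [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0],
          [9.0, 9.0], [10.0, 10.0], [9.5, 10.0]]
centroids = [[0.5, 0.5], [9.5, 9.5]]  # hypothetical, stands in for k-means output

lists = build_inverted_lists(points, centroids)
neighbors = ivf_search([9.0, 9.5], points, centroids, lists, k=2)
print(lists, neighbors)
```

With one probe, only three of the six points are compared against the query. The trade-off is that a true neighbor lying in an unprobed partition is missed, which is why this family of methods is approximate while 'brute' is exact.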

metric : string (default=‘euclidean’)

Distance metric to use.

weights : {‘uniform’, ‘distance’} or callable, default=‘uniform’

Weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
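The three weighting options can be compared with a small plain-Python sketch. The helper `weighted_prediction` is hypothetical and only mirrors the descriptions above. It assumes no neighbor lies at distance zero, a case a real implementation has to handle separately.

```python
def weighted_prediction(distances, labels, weights="uniform"):
    """Combine neighbor labels into one prediction under a weighting scheme."""
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif weights == "distance":
        # Inverse-distance weights; assumes every distance is non-zero.
        w = [1.0 / d for d in distances]
    else:
        # Callable: receives the distances, returns weights of the same length.
        w = list(weights(distances))
    # Weighted average of the neighbor labels.
    return sum(wi * yi for wi, yi in zip(w, labels)) / sum(w)

# Hypothetical neighbors of one query point, sorted by distance.
distances = [1.0, 2.0, 4.0]
labels = [10.0, 20.0, 40.0]

uniform = weighted_prediction(distances, labels, "uniform")
inverse = weighted_prediction(distances, labels, "distance")
# User-defined scheme: inverse squared distance.
squared = weighted_prediction(distances, labels,
                              lambda d: [1.0 / x ** 2 for x in d])
print(uniform, inverse, squared)
```

In this toy case the prediction moves toward the label of the closest neighbor as the weighting penalizes distance more strongly.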

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.

Methods

fit(self, X, y, *[, convert_dtype])

Fit a GPU index for the k-nearest neighbors regression model.

predict(self, X, *[, convert_dtype])

Use the trained k-nearest neighbors regression model to predict the labels for X.

Notes

For additional docs, see scikit-learn’s KNeighborsRegressor.

Examples

>>> from cuml.neighbors import KNeighborsRegressor
>>> from cuml.datasets import make_regression
>>> from cuml.model_selection import train_test_split

>>> X, y = make_regression(n_samples=100, n_features=10,
...                        random_state=5)
>>> X_train, X_test, y_train, y_test = train_test_split(
...   X, y, train_size=0.80, random_state=5)

>>> knn = KNeighborsRegressor(n_neighbors=10)
>>> knn.fit(X_train, y_train)
KNeighborsRegressor()
>>> knn.predict(X_test)
array([ 14.770798  ,  51.8834    ,  66.15657   ,  46.978275  ,
    21.589611  , -14.519918  , -60.25534   , -20.856869  ,
    29.869623  , -34.83317   ,   0.45447388, 120.39675   ,
    109.94834   ,  63.57794   , -17.956171  ,  78.77663   ,
    30.412262  ,  32.575233  ,  74.72834   , 122.276855  ],
dtype=float32)
fit(self, X, y, *, convert_dtype=True) -> 'KNeighborsRegressor'

Fit a GPU index for the k-nearest neighbors regression model.

Parameters:
X : array-like (device or host) shape = (n_samples, n_features)

Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

y : array-like (device or host) shape = (n_samples, 1)

Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype : bool, optional (default = True)

When set to True, the method will automatically convert the inputs to np.float32.

predict(self, X, *, convert_dtype=True) -> CumlArray

Use the trained k-nearest neighbors regression model to predict the labels for X.

Parameters:
X : array-like (device or host) shape = (n_samples, n_features)

Dense matrix. If the datatype is other than floats or doubles, the data will be converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method will then throw an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.

convert_dtype : bool, optional (default = True)

When set to True, the method will automatically convert the inputs to np.float32.

Returns:
X_new : cuDF, CuPy or NumPy object depending on cuML’s output type configuration, shape = (n_samples, n_features)

Predicted values.

For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.