KNeighborsRegressor
- class cuml.neighbors.KNeighborsRegressor(*, weights='uniform', verbose=False, output_type=None, **kwargs)
K-Nearest Neighbors Regressor is an instance-based learning technique that keeps the training samples for use at prediction time rather than learning a generalizable set of model parameters.
For each query point, the regressor computes the average of the labels of the k closest training samples and uses it as the predicted value.
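The averaging step described above can be sketched in plain Python. This is only a CPU illustration of the idea, not cuML's GPU implementation; the helper name `knn_regress` and the toy data are assumptions made for this example.

```python
import math


def knn_regress(train_X, train_y, query, k=5):
    """Predict a value for `query` as the mean label of its k nearest training points."""
    # Brute force: Euclidean distance from the query to every stored training sample.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Keep the k closest samples, ordering by distance only.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Uniform weighting: the prediction is the plain average of the neighbor labels.
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: three points near the origin and one far-away outlier.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 100.0]

prediction = knn_regress(train_X, train_y, query=(0.1, 0.1), k=3)
print(prediction)  # 2.0
```

With k=3 the outlier at (5, 5) is not among the nearest neighbors, so the prediction is the mean of the three nearby labels.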
- Parameters:
- n_neighbors : int (default=5)
Default number of neighbors to query.
- algorithm : string (default='auto')
The query algorithm to use. Valid options are:
'auto' : automatically select brute-force or random ball cover based on data shape and metric.
'rbc' : the random ball cover algorithm, which partitions the data space and uses the triangle inequality to reduce the number of distances that must be computed. Currently, this algorithm supports 2d Euclidean and Haversine.
'brute' : brute-force search, slow but produces exact results.
'ivfflat' : inverted file, which divides the dataset into partitions and searches only the relevant partitions.
'ivfpq' : inverted file and product quantization. Same as the inverted file approach, but in addition the vectors are broken into n_features/M sub-vectors that are encoded using intermediary k-means clusterings. This encoding provides partial information that allows faster distance calculations.
- metric : string (default='euclidean')
Distance metric to use.
- weights : {'uniform', 'distance'} or callable, default='uniform'
Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
- verbose : int or boolean, default=False
Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
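To make the `weights` options above concrete, here is a minimal pure-Python sketch of how uniform, inverse-distance, and user-defined weights change a prediction. It is not cuML's implementation: the function names are hypothetical, and the handling of zero distances (exact matches take all the weight) is an assumption of this sketch.

```python
def uniform_weights(distances):
    # Every neighbor contributes equally.
    return [1.0 for _ in distances]


def inverse_distance_weights(distances):
    # Assumption of this sketch: if a neighbor coincides with the query
    # (distance 0), only such exact matches receive weight.
    if any(d == 0.0 for d in distances):
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    # Otherwise closer neighbors get proportionally larger weights.
    return [1.0 / d for d in distances]


def weighted_prediction(distances, labels, weight_fn):
    # weight_fn takes the list of distances and returns weights of the same length.
    weights = weight_fn(distances)
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)


# Hypothetical neighbors of one query point: distances and their labels.
distances = [1.0, 2.0, 4.0]
labels = [10.0, 20.0, 40.0]

uniform = weighted_prediction(distances, labels, uniform_weights)
by_distance = weighted_prediction(distances, labels, inverse_distance_weights)
# A user-defined callable: inverse squared distance.
custom = weighted_prediction(
    distances, labels, lambda ds: [1.0 / (d * d) for d in ds])

print(uniform, by_distance, custom)
```

Because the closest neighbor has the smallest label in this toy data, the prediction shrinks as the weighting favors nearby points more strongly.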
Methods
fit(self, X, y, *[, convert_dtype]): Fit a GPU index for a k-nearest neighbors regression model.
predict(self, X, *[, convert_dtype]): Use the trained k-nearest neighbors regression model to predict the labels for X.
Notes
For additional docs, see scikit-learn's KNeighborsRegressor.
Examples
>>> from cuml.neighbors import KNeighborsRegressor
>>> from cuml.datasets import make_regression
>>> from cuml.model_selection import train_test_split
>>> X, y = make_regression(n_samples=100, n_features=10,
...                        random_state=5)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, train_size=0.80, random_state=5)
>>> knn = KNeighborsRegressor(n_neighbors=10)
>>> knn.fit(X_train, y_train)
KNeighborsRegressor()
>>> knn.predict(X_test)
array([ 14.770798  ,  51.8834    ,  66.15657   ,  46.978275  ,
        21.589611  , -14.519918  , -60.25534   , -20.856869  ,
        29.869623  , -34.83317   ,   0.45447388, 120.39675   ,
       109.94834   ,  63.57794   , -17.956171  ,  78.77663   ,
        30.412262  ,  32.575233  ,  74.72834   , 122.276855  ],
      dtype=float32)
- fit(self, X, y, *, convert_dtype=True) -> 'KNeighborsRegressor'
Fit a GPU index for a k-nearest neighbors regression model.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- y : array-like (device or host), shape = (n_samples, 1)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the method will automatically convert the inputs to np.float32.
- predict(self, X, *, convert_dtype=True) -> CumlArray
Use the trained k-nearest neighbors regression model to predict the labels for X.
- Parameters:
- X : array-like (device or host), shape = (n_samples, n_features)
Dense matrix. If the datatype is not float or double, the data is converted to float, which increases memory utilization. Set the parameter convert_dtype to False to avoid this conversion; the method then raises an error instead. Acceptable formats: CUDA array interface compliant objects like CuPy, cuDF DataFrame/Series, NumPy ndarray and Pandas DataFrame/Series.
- convert_dtype : bool, optional (default = True)
When set to True, the method will automatically convert the inputs to np.float32.
- Returns:
- X_new : cuDF, CuPy or NumPy object depending on cuML's output type configuration, shape = (n_samples, n_features)
Predicted values
For more information on how to configure cuML’s output type, refer to: Output Data Type Configuration.