LabelEncoder#
- class cuml.preprocessing.LabelEncoder(*, handle_unknown='error', verbose=False, output_type=None)[source]#
An nvcategory based implementation of ordinal label encoding
- Parameters:
- handle_unknown{‘error’, ‘ignore’}, default=‘error’
Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to ‘ignore’ and an unknown category is encountered during transform or inverse transform, the resulting encoding will be null.
- verboseint or boolean, default=False
Sets logging level. It must be one of
cuml.common.logger.level_*. See Verbosity Levels for more info.
- output_type{‘input’, ‘array’, ‘dataframe’, ‘series’, ‘df_obj’, ‘numba’, ‘cupy’, ‘numpy’, ‘cudf’, ‘pandas’}, default=None
Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (
cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
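The two handle_unknown modes described above can be illustrated with a small pure-Python sketch. This is not cuML's implementation: the helper name encode_labels is hypothetical, None stands in for a null entry, the KeyError is this sketch's choice of exception, and the sketch assumes categories are numbered in sorted order, as in the example further down.

```python
def encode_labels(classes, values, handle_unknown="error"):
    """Map each value to its ordinal code (hypothetical helper, not cuML API)."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    # Assumption: codes follow the sorted order of the fitted categories.
    codes = {label: i for i, label in enumerate(sorted(set(classes)))}
    encoded = []
    for value in values:
        if value in codes:
            encoded.append(codes[value])
        elif handle_unknown == "ignore":
            encoded.append(None)  # None stands in for a null entry
        else:
            raise KeyError(f"unseen label: {value!r}")
    return encoded


known = ["a", "b", "c", "d"]

# 'ignore': the unseen label 'z' becomes a null (None here).
ignored = encode_labels(known, ["c", "z", "a"], handle_unknown="ignore")
print(ignored)  # [2, None, 0]

# 'error' (the default): the same input raises instead.
try:
    encode_labels(known, ["c", "z", "a"])
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

With ‘ignore’, downstream code has to be prepared for nulls in the encoded output; with ‘error’, unseen categories surface immediately at transform time.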
Methods
fit(y)Fit a LabelEncoder instance to a set of categories.
fit_transform(y)Simultaneously fit and transform an input
inverse_transform(y)Revert ordinal label to original label
transform(y)Transform an input into its categorical keys.
Examples
Converting categorical labels to numerical codes
>>> from cudf import DataFrame, Series
>>> from cuml.preprocessing import LabelEncoder
>>> data = DataFrame({'category': ['a', 'b', 'c', 'd']})

>>> # There are two functionally equivalent ways to do this
>>> le = LabelEncoder()
>>> le.fit(data.category)  # le = le.fit(data.category) also works
LabelEncoder()
>>> encoded = le.transform(data.category)

>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8

>>> # This method is preferred
>>> le = LabelEncoder()
>>> encoded = le.fit_transform(data.category)

>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8

>>> # We can assign this to a new column
>>> data = data.assign(encoded=encoded)
>>> print(data.head())
  category  encoded
0        a        0
1        b        1
2        c        2
3        d        3

>>> # We can also encode more data
>>> test_data = Series(['c', 'a'])
>>> encoded = le.transform(test_data)
>>> print(encoded)
0    2
1    0
dtype: uint8

>>> # After training, ordinal labels can be converted back to
>>> # string labels with inverse_transform()
>>> ord_label = Series([0, 0, 1, 2, 1])
>>> str_label = le.inverse_transform(ord_label)
>>> print(str_label)
0    a
1    a
2    b
3    c
4    b
dtype: object
- fit(y)[source]#
Fit a LabelEncoder instance to a set of categories.
- Parameters:
- ycudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray
The target values to encode.
- Returns:
- selfLabelEncoder
- fit_transform(y)[source]#
Simultaneously fit and transform an input
This is functionally equivalent to (but faster than)
LabelEncoder().fit(y).transform(y)
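The equivalence stated above can be sketched in plain Python. TinyLabelEncoder is a hypothetical stand-in written for illustration, not cuML's class; it assumes categories are numbered in sorted order, and its fit_transform simply chains the two steps, so it shows only that the results match, not the speed difference.

```python
class TinyLabelEncoder:
    """Hypothetical pure-Python stand-in for illustration; not cuML's class."""

    def fit(self, y):
        # Assumption: categories are numbered in sorted order.
        self.classes_ = sorted(set(y))
        self._codes = {label: i for i, label in enumerate(self.classes_)}
        return self  # returning self allows chaining fit(y).transform(y)

    def transform(self, y):
        # Look up the ordinal code of every label seen during fit.
        return [self._codes[label] for label in y]

    def fit_transform(self, y):
        # In this sketch the one-step call just chains the two steps.
        return self.fit(y).transform(y)


y = ["b", "a", "d", "c", "a"]
two_step = TinyLabelEncoder().fit(y).transform(y)
one_step = TinyLabelEncoder().fit_transform(y)
print(one_step)              # [1, 0, 3, 2, 0]
print(two_step == one_step)  # True
```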
- inverse_transform(y: Series)[source]#
Revert ordinal label to original label
- Parameters:
- ycudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray
Ordinal labels to be reverted (dtype=int32)
- Returns:
- revertedthe same type as y
Reverted labels
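As a rough illustration of what reverting means, the following pure-Python sketch decodes ordinal codes by indexing into the fitted categories. The helper name decode_labels is hypothetical and this is not cuML's implementation; it assumes codes were assigned in sorted category order, matching the example output shown earlier.

```python
def decode_labels(classes, codes):
    """Map ordinal codes back to labels (hypothetical helper, not cuML API)."""
    # Assumption: code i refers to the i-th category in sorted order.
    ordered = sorted(set(classes))
    return [ordered[code] for code in codes]


classes = ["a", "b", "c", "d"]
ord_label = [0, 0, 1, 2, 1]
str_label = decode_labels(classes, ord_label)
print(str_label)  # ['a', 'a', 'b', 'c', 'b']

# Round trip: encoding the decoded labels recovers the original codes.
codes = {label: i for i, label in enumerate(sorted(set(classes)))}
round_trip = [codes[label] for label in str_label]
print(round_trip == ord_label)  # True
```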
- transform(y)[source]#
Transform an input into its categorical keys.
This is intended for use with small inputs relative to the size of the dataset. For fitting and transforming an entire dataset, prefer
fit_transform.