LabelEncoder

class cuml.preprocessing.LabelEncoder(*, handle_unknown='error', verbose=False, output_type=None)

An nvcategory-based implementation of ordinal label encoding.

Parameters:
handle_unknown : {'error', 'ignore'}, default='error'

Whether to raise an error or ignore unknown categories encountered during transform (the default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform or inverse transform, the resulting encoding for that entry is null.

verbose : int or boolean, default=False

Sets logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.

output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None

Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
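
The two handle_unknown modes can be illustrated with a small pure-Python sketch. This is not cuML's GPU implementation: the helper name encode, the use of None to stand in for a null entry, and the list-based inputs are all assumptions made for illustration only.

```python
# Illustrative pure-Python sketch; not cuML's GPU implementation.
def encode(values, classes, handle_unknown="error"):
    """Map each value to its index in `classes` (hypothetical helper)."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    lookup = {label: code for code, label in enumerate(classes)}
    encoded = []
    for value in values:
        if value in lookup:
            encoded.append(lookup[value])
        elif handle_unknown == "ignore":
            encoded.append(None)  # None stands in for a null entry
        else:
            raise KeyError(f"unseen category: {value!r}")
    return encoded


classes = ["a", "b", "c", "d"]
ignored = encode(["c", "z", "a"], classes, handle_unknown="ignore")
print(ignored)  # [2, None, 0]
```

With the default handle_unknown="error", the same call raises KeyError for 'z' instead of producing a null entry.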

Methods

fit(y)

Fit a LabelEncoder instance to a set of categories.

fit_transform(y)

Simultaneously fit and transform an input.

inverse_transform(y)

Revert ordinal labels to the original labels.

transform(y)

Transform an input into its categorical keys.

Examples

Converting a categorical implementation to a numerical one

>>> from cudf import DataFrame, Series
>>> from cuml.preprocessing import LabelEncoder
>>> data = DataFrame({'category': ['a', 'b', 'c', 'd']})
>>> # There are two functionally equivalent ways to do this
>>> le = LabelEncoder()
>>> le.fit(data.category)  # le = le.fit(data.category) also works
LabelEncoder()
>>> encoded = le.transform(data.category)
>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8
>>> # This method is preferred
>>> le = LabelEncoder()
>>> encoded = le.fit_transform(data.category)
>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8
>>> # We can assign this to a new column
>>> data = data.assign(encoded=encoded)
>>> print(data.head())
   category  encoded
0         a        0
1         b        1
2         c        2
3         d        3
>>> # We can also encode more data
>>> test_data = Series(['c', 'a'])
>>> encoded = le.transform(test_data)
>>> print(encoded)
0    2
1    0
dtype: uint8
>>> # After fitting, ordinal labels can be converted back to
>>> # string labels with inverse_transform()
>>> ord_label = Series([0, 0, 1, 2, 1])
>>> str_label = le.inverse_transform(ord_label)
>>> print(str_label)
0    a
1    a
2    b
3    c
4    b
dtype: object
fit(y)

Fit a LabelEncoder instance to a set of categories.

Parameters:
y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray

The target values to encode.

Returns:
self : LabelEncoder

fit_transform(y)

Simultaneously fit and transform an input.

This is functionally equivalent to (but faster than) LabelEncoder().fit(y).transform(y)
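
The equivalence of the results can be sketched with a toy, CPU-only stand-in. ToyLabelEncoder is a hypothetical class written for this illustration, and numbering categories in sorted order is an assumption; the sketch shows only that the one-step and two-step calls give the same codes, not the speed difference.

```python
# Toy stand-in used only to illustrate the call pattern; not cuML code.
class ToyLabelEncoder:
    def fit(self, y):
        # Assumption: categories are numbered in sorted order.
        self.classes_ = sorted(set(y))
        self._lookup = {label: code for code, label in enumerate(self.classes_)}
        return self  # returning self allows fit(y).transform(y)

    def transform(self, y):
        # Look up the integer code of every label seen during fit.
        return [self._lookup[label] for label in y]

    def fit_transform(self, y):
        # The toy version simply chains the two steps.
        return self.fit(y).transform(y)


y = ["b", "a", "c", "a"]
two_step = ToyLabelEncoder().fit(y).transform(y)
one_step = ToyLabelEncoder().fit_transform(y)
print(two_step, one_step)  # [1, 0, 2, 0] [1, 0, 2, 0]
```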

inverse_transform(y: Series)

Revert ordinal labels to the original labels.

Parameters:
y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray

Ordinal labels to be reverted, with dtype int32.

Returns:
reverted : the same type as y

Reverted labels
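
Conceptually, reverting an ordinal label amounts to indexing into the list of fitted categories. The sketch below uses plain Python lists rather than cuDF series, and it assumes categories are numbered in sorted order; the variable names are illustrative only.

```python
# Pure-Python sketch of the inverse mapping; not cuML's implementation.
labels = ["a", "b", "c", "d"]
classes = sorted(set(labels))  # assumption: codes follow sorted order

# Each ordinal code is the position of its label in `classes`.
ord_label = [0, 0, 1, 2, 1]
str_label = [classes[code] for code in ord_label]
print(str_label)  # ['a', 'a', 'b', 'c', 'b']

# Encoding the reverted labels again gives back the original codes.
lookup = {label: code for code, label in enumerate(classes)}
round_trip = [lookup[label] for label in str_label]
```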

transform(y)

Transform an input into its categorical keys.

This is intended for use with small inputs relative to the size of the dataset. For fitting and transforming an entire dataset, prefer fit_transform.

Parameters:
y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray

Input keys to be transformed. The values should match the categories given to fit.

Returns:
encoded : cudf.Series

The ordinally encoded input series.

Raises:
KeyError

If a category appears that was not seen in fit and handle_unknown is 'error'.