LabelEncoder

class cuml.preprocessing.LabelEncoder(*, handle_unknown='error', verbose=False, output_type=None)

An nvcategory-based implementation of ordinal label encoding.

Parameters:

handle_unknown : {'error', 'ignore'}, default='error'
    Whether to raise an error or to ignore unknown categories encountered during transform (the default is to raise). When set to 'ignore', an unknown category encountered during transform or inverse transform is encoded as null.
verbose : int or boolean, default=False
    Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more info.
output_type : {'input', 'array', 'dataframe', 'series', 'df_obj', 'numba', 'cupy', 'numpy', 'cudf', 'pandas'}, default=None
    Return results and set estimator attributes to the indicated output type. If None, the output type set at the module level (cuml.global_settings.output_type) will be used. See Output Data Type Configuration for more info.
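
The effect of handle_unknown can be illustrated with a plain-Python sketch. This is not cuML's implementation: the helper name `encode` is hypothetical, the sorted numbering of categories is an assumption that agrees with the a, b, c, d example in the Examples section, and Python's None stands in for the null mentioned in the parameter description.

```python
def encode(values, categories, handle_unknown="error"):
    """Toy ordinal encoder (illustrative only; not cuML's implementation)."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    # Assumption: categories are numbered in sorted order.
    mapping = {cat: code for code, cat in enumerate(sorted(set(categories)))}
    encoded = []
    for value in values:
        if value in mapping:
            encoded.append(mapping[value])
        elif handle_unknown == "ignore":
            encoded.append(None)  # None stands in for a null entry
        else:
            raise KeyError(f"unseen category: {value!r}")
    return encoded


known = ["a", "b", "c", "d"]

# 'ignore': the unseen label 'z' becomes a null placeholder.
lenient = encode(["c", "z", "a"], known, handle_unknown="ignore")

# 'error' (the default): the same input raises KeyError.
try:
    encode(["c", "z", "a"], known)
    strict_raised = False
except KeyError:
    strict_raised = True
```

With 'ignore', downstream code has to be prepared for null entries in the encoded output; with 'error', the failure surfaces at the transform call.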

Methods

`fit`(y)	Fit a LabelEncoder instance to a set of categories.
`fit_transform`(y)	Simultaneously fit and transform an input.
`inverse_transform`(y)	Revert ordinal labels to the original labels.
`transform`(y)	Transform an input into its categorical keys.

Examples

Converting categorical values to a numerical encoding:

>>> from cudf import DataFrame, Series
>>> from cuml.preprocessing import LabelEncoder
>>> data = DataFrame({'category': ['a', 'b', 'c', 'd']})

>>> # There are two functionally equivalent ways to do this
>>> le = LabelEncoder()
>>> le.fit(data.category)  # le = le.fit(data.category) also works
LabelEncoder()
>>> encoded = le.transform(data.category)

>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8

>>> # This method is preferred
>>> le = LabelEncoder()
>>> encoded = le.fit_transform(data.category)

>>> print(encoded)
0    0
1    1
2    2
3    3
dtype: uint8

>>> # We can assign this to a new column
>>> data = data.assign(encoded=encoded)
>>> print(data.head())
  category  encoded
0        a        0
1        b        1
2        c        2
3        d        3

>>> # We can also encode more data
>>> test_data = Series(['c', 'a'])
>>> encoded = le.transform(test_data)
>>> print(encoded)
0    2
1    0
dtype: uint8

>>> # After fitting, ordinal labels can be converted back to
>>> # string labels with inverse_transform()
>>> ord_label = Series([0, 0, 1, 2, 1])
>>> str_label = le.inverse_transform(ord_label)
>>> print(str_label)
0    a
1    a
2    b
3    c
4    b
dtype: object

fit(y)

Fit a LabelEncoder instance to a set of categories.

Parameters:

y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray
    The target values to encode.

Returns:

self : LabelEncoder

fit_transform(y)

Simultaneously fit and transform an input.

This is functionally equivalent to (but faster than) LabelEncoder().fit(y).transform(y).
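
The equivalence can be sketched with a toy encoder in plain Python. `ToyLabelEncoder` is a hypothetical stand-in, not cuML's class; it assumes categories are numbered in sorted order, and it does not model the speed difference.

```python
class ToyLabelEncoder:
    """Illustrative ordinal encoder (hypothetical; not cuML's implementation)."""

    def fit(self, y):
        # Assumption: the categories are the sorted unique input values.
        self.classes_ = sorted(set(y))
        self._codes = {label: code for code, label in enumerate(self.classes_)}
        return self  # returning self allows fit(y).transform(y) chaining

    def transform(self, y):
        # Raises KeyError for a label that was not seen in fit.
        return [self._codes[label] for label in y]

    def fit_transform(self, y):
        return self.fit(y).transform(y)


labels = ["b", "a", "c", "b"]
two_step = ToyLabelEncoder().fit(labels).transform(labels)
one_step = ToyLabelEncoder().fit_transform(labels)
print(two_step == one_step)
```

Both paths yield the same codes for the same input; the difference described above is one of cost, not of result.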

inverse_transform(y: Series)

Revert ordinal labels to the original labels.

Parameters:

y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray, dtype=int32
    Ordinal labels to be reverted.

Returns:

reverted : the same type as y
    Reverted labels.
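
Conceptually, reverting is a lookup from each ordinal code to the category stored at that position. A minimal plain-Python sketch, assuming categories are numbered in sorted order; the helper `decode` is hypothetical and is not part of cuML.

```python
def decode(codes, categories):
    """Map ordinal codes back to labels (illustrative only; not cuML's implementation)."""
    # Assumption: code i refers to the i-th category in sorted order.
    classes = sorted(set(categories))
    return [classes[code] for code in codes]


restored = decode([0, 0, 1, 2, 1], ["a", "b", "c", "d"])
print(restored)  # ['a', 'a', 'b', 'c', 'b']
```

The codes and labels here mirror the inverse_transform call in the Examples section.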

transform(y)

Transform an input into its categorical keys.

This is intended for use with small inputs relative to the size of the dataset. For fitting and transforming an entire dataset, prefer fit_transform.

Parameters:

y : cudf.Series, pandas.Series, cupy.ndarray or numpy.ndarray
    Input keys to be transformed. Its values should match the categories given to fit.

Returns:

encoded : cudf.Series
    The ordinally encoded input series.

Raises:

KeyError
    If a category appears that was not seen in fit and handle_unknown is 'error'.
