Remaps keys to unique integer IDs.
#include <key_remapping.hpp>
Each distinct key in the right table is assigned a unique non-negative integer ID. Rows with equal keys map to the same ID. Keys that cannot be mapped (e.g., keys not found in the right table, or null keys when nulls are unequal) receive negative sentinel values. The specific ID values are stable for the lifetime of this object but are otherwise unspecified.
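The ID contract described above can be illustrated with a small host-side model. This is a sketch in standard C++ only, not the cudf implementation: `host_key_remapper`, `id_of`, and `NOT_FOUND` are hypothetical names, string keys stand in for table rows, and first-seen numbering is an arbitrary choice, since the real ID values are unspecified.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side model of the contract above; not the cudf implementation.
constexpr std::int32_t NOT_FOUND = -1;  // assumed sentinel; the real negative values are unspecified

struct host_key_remapper {
  std::unordered_map<std::string, std::int32_t> ids;

  explicit host_key_remapper(std::vector<std::string> const& right_keys)
  {
    for (auto const& key : right_keys) {
      // try_emplace keeps an existing entry, so rows with equal keys share one ID.
      // First-seen numbering is an arbitrary choice made for this sketch.
      ids.try_emplace(key, static_cast<std::int32_t>(ids.size()));
    }
    assert(ids.size() <= right_keys.size());  // never more IDs than rows
  }

  // Returns the ID assigned to `key`, or NOT_FOUND if the key was never registered.
  std::int32_t id_of(std::string const& key) const
  {
    auto const it = ids.find(key);
    return it == ids.end() ? NOT_FOUND : it->second;
  }
};
```

Callers of such a model should rely only on ID equality and on the sign of the result, never on particular ID values.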
Definition at line 79 of file key_remapping.hpp.
| cudf::key_remapping::key_remapping | ( | cudf::table_view const & | right, |
| null_equality | compare_nulls = null_equality::EQUAL, |
| cudf::compute_metrics | metrics = cudf::compute_metrics::YES, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream() |
| ) |
Constructs a key remapping structure from the given right keys.
| cudf::logic_error | if the right table has no columns |
| right | The right table containing the keys to remap; the internal hash table is built from this table |
| compare_nulls | Controls whether null key values should match or not. When EQUAL, null keys are treated as equal and assigned a valid non-negative ID. When UNEQUAL, rows with null keys receive a negative sentinel value. |
| metrics | Controls whether to compute distinct_count and max_duplicate_count. If YES (default), compute metrics for later retrieval via get_distinct_count() and get_max_duplicate_count(). If NO, skip metrics computation for better performance; calling get_distinct_count() or get_max_duplicate_count() will throw. |
| stream | CUDA stream used for device memory operations and kernel launches |
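The effect of `compare_nulls` can be sketched on the host with `std::optional` standing in for nullable keys. This is an illustrative model under stated assumptions, not library code: `null_policy`, `opt_key`, and `remap_right` are hypothetical names, and `-1` is an assumed sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for null_equality; not the cudf enum.
enum class null_policy { EQUAL, UNEQUAL };

using opt_key = std::optional<int>;  // std::nullopt models a null key

// Returns one ID per right row. Under UNEQUAL, null rows get the assumed sentinel -1.
std::vector<std::int32_t> remap_right(std::vector<opt_key> const& right, null_policy nulls)
{
  std::map<opt_key, std::int32_t> ids;  // std::optional orders nullopt before any value
  std::vector<std::int32_t> out;
  out.reserve(right.size());
  for (auto const& key : right) {
    if (!key.has_value() && nulls == null_policy::UNEQUAL) {
      out.push_back(-1);  // a null key never matches anything, including another null
      continue;
    }
    // Under EQUAL, nullopt is an ordinary map key, so all null rows share one ID.
    auto const it = ids.try_emplace(key, static_cast<std::int32_t>(ids.size())).first;
    out.push_back(it->second);
  }
  assert(out.size() == right.size());
  return out;
}
```

With right keys `{1, null, 1, null, 2}`, the `EQUAL` policy gives both null rows the same non-negative ID, while `UNEQUAL` gives both a negative value and leaves the non-null rows unaffected.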
| size_type cudf::key_remapping::get_distinct_count | ( | ) | const |
Get the number of distinct keys in the right table.
| cudf::logic_error | if metrics was NO during construction |
| size_type cudf::key_remapping::get_max_duplicate_count | ( | ) | const |
Get the maximum number of times any single key appears.
| cudf::logic_error | if metrics was NO during construction |
| bool cudf::key_remapping::has_metrics | ( | ) | const |
Check if metrics (distinct_count, max_duplicate_count) were computed.
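The interplay between `has_metrics()` and the two getters can be modeled as an optional result that is filled only when metrics were requested. The sketch below is a host-side assumption, not the library's code: `metrics_model` and `key_metrics` are hypothetical names, a `bool` replaces `cudf::compute_metrics`, and null keys are ignored because the text above does not say how they are counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <unordered_map>
#include <vector>

struct key_metrics {
  std::int32_t distinct_count;
  std::int32_t max_duplicate_count;
};

// Hypothetical model: metrics are computed once at construction, or not at all.
class metrics_model {
 public:
  metrics_model(std::vector<int> const& right_keys, bool compute_metrics)
  {
    if (!compute_metrics) { return; }  // skip the counting pass entirely
    std::unordered_map<int, std::int32_t> counts;
    std::int32_t max_dup = 0;
    for (int key : right_keys) { max_dup = std::max(max_dup, ++counts[key]); }
    assert(max_dup <= static_cast<std::int32_t>(right_keys.size()));
    metrics_ = key_metrics{static_cast<std::int32_t>(counts.size()), max_dup};
  }

  bool has_metrics() const { return metrics_.has_value(); }
  std::int32_t get_distinct_count() const { return checked().distinct_count; }
  std::int32_t get_max_duplicate_count() const { return checked().max_duplicate_count; }

 private:
  // Mirrors the documented behavior: asking for skipped metrics is a logic error.
  key_metrics const& checked() const
  {
    if (!metrics_) { throw std::logic_error("metrics were not computed"); }
    return *metrics_;
  }

  std::optional<key_metrics> metrics_;
};
```

Code that may receive an object built without metrics can call `has_metrics()` first instead of catching the exception.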
|
inline |
Deprecated alias for remap_right_keys().
Deprecated: Use remap_right_keys() instead.
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate device memory for the returned column |
Definition at line 139 of file key_remapping.hpp.
| std::unique_ptr<cudf::column> cudf::key_remapping::remap_left_keys | ( | cudf::table_view const & | keys, |
| rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
| ) | const |
Remap left keys to integer IDs.
For each row in the input, returns the integer ID assigned to that key. Non-negative integers represent keys found in the right table, while negative values represent keys that were not found or cannot be matched (e.g., null keys when nulls are unequal, or keys not present in the right table).
| std::invalid_argument | if keys has different number of columns than the right table |
| cudf::data_type_error | if keys has different column types than the right table |
| keys | The left keys to remap (must have same schema as the right table) |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
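The lookup and the column-count check can be sketched with multi-column integer keys on the host. Everything here is an illustrative assumption rather than cudf code: `table`, `column`, `row_key`, `row_at`, and `left_remap_model` are hypothetical names, `-1` is an assumed sentinel, and the type check that raises `cudf::data_type_error` is not modeled because every column in the sketch is `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using column  = std::vector<int>;
using table   = std::vector<column>;  // column-major; all columns have the same length
using row_key = std::vector<int>;     // one value per column

// Gathers row i across all columns so that a whole row acts as a single key.
inline row_key row_at(table const& t, std::size_t i)
{
  row_key row;
  for (auto const& c : t) {
    assert(i < c.size());
    row.push_back(c[i]);
  }
  return row;
}

class left_remap_model {
 public:
  explicit left_remap_model(table const& right) : num_columns_(right.size())
  {
    if (right.empty()) { throw std::logic_error("right table has no columns"); }
    for (std::size_t i = 0; i < right.front().size(); ++i) {
      ids_.try_emplace(row_at(right, i), static_cast<std::int32_t>(ids_.size()));
    }
  }

  // One ID per left row; rows absent from the right table get the assumed sentinel -1.
  std::vector<std::int32_t> remap_left_keys(table const& keys) const
  {
    if (keys.size() != num_columns_) { throw std::invalid_argument("column count mismatch"); }
    std::vector<std::int32_t> out;
    for (std::size_t i = 0; i < keys.front().size(); ++i) {
      auto const it = ids_.find(row_at(keys, i));
      out.push_back(it == ids_.end() ? -1 : it->second);
    }
    return out;
  }

 private:
  std::size_t num_columns_;
  std::map<row_key, std::int32_t> ids_;
};
```

A left row matches only when every column agrees with some right row, which is why the whole row is used as the map key.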
|
inline |
Deprecated alias for remap_left_keys().
Deprecated: Use remap_left_keys() instead.
| keys | The left keys to remap (must have same schema as the right table) |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate device memory for the returned column |
Definition at line 180 of file key_remapping.hpp.
| std::unique_ptr<cudf::column> cudf::key_remapping::remap_right_keys | ( | rmm::cuda_stream_view | stream = cudf::get_default_stream(), |
| rmm::device_async_resource_ref | mr = cudf::get_current_device_resource_ref() |
| ) | const |
Remap right keys to integer IDs.
Recomputes the remapped right keys from the cached right table. The result is not cached; each call recomputes it from the key remapping.
For each row in the cached right table, returns the integer ID assigned to that key. Non-negative integers represent valid mapped keys, while negative values represent keys that cannot be mapped (e.g., null keys when nulls are unequal).
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned column's device memory |
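The recompute-on-every-call behavior, and the fact that left and right IDs agree for equal keys, can be sketched as follows. This is a host-side model under assumptions, not the cudf implementation: `right_remap_model` is a hypothetical name, plain `int` keys stand in for table rows, and `-1` is an assumed sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model: the right keys and the key-to-ID map are stored, the remapped column is not.
class right_remap_model {
 public:
  explicit right_remap_model(std::vector<int> right_keys) : right_(std::move(right_keys))
  {
    for (int key : right_) { ids_.try_emplace(key, static_cast<std::int32_t>(ids_.size())); }
  }

  // Rebuilt from the stored keys on every call; nothing is memoized between calls.
  std::vector<std::int32_t> remap_right_keys() const
  {
    std::vector<std::int32_t> out;
    out.reserve(right_.size());
    for (int key : right_) { out.push_back(ids_.at(key)); }
    assert(out.size() == right_.size());
    return out;
  }

  // Same map, different input: keys missing from the right side get the assumed sentinel -1.
  std::vector<std::int32_t> remap_left_keys(std::vector<int> const& left_keys) const
  {
    std::vector<std::int32_t> out;
    out.reserve(left_keys.size());
    for (int key : left_keys) {
      auto const it = ids_.find(key);
      out.push_back(it == ids_.end() ? -1 : it->second);
    }
    return out;
  }

 private:
  std::vector<int> right_;
  std::unordered_map<int, std::int32_t> ids_;
};
```

Because the IDs are stable for the lifetime of the object, repeated calls return equal columns. A caller that needs the result several times may prefer to keep the first returned column instead of calling again.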