cudf::key_remapping Class Reference

Remaps keys to unique integer IDs.

#include <key_remapping.hpp>

Public Member Functions

 key_remapping (key_remapping const &)=delete
 
 key_remapping (key_remapping &&)=delete
 
key_remapping & operator= (key_remapping const &)=delete
 
key_remapping & operator= (key_remapping &&)=delete
 
 key_remapping (cudf::table_view const &right, null_equality compare_nulls=null_equality::EQUAL, cudf::compute_metrics metrics=cudf::compute_metrics::YES, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructs a key remapping structure from the given right keys.
 
std::unique_ptr< cudf::column > remap_right_keys (rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Remap right keys to integer IDs.
 
std::unique_ptr< cudf::column > remap_build_keys (rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Deprecated alias for remap_right_keys().
 
std::unique_ptr< cudf::column > remap_left_keys (cudf::table_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Remap left keys to integer IDs.
 
std::unique_ptr< cudf::column > remap_probe_keys (cudf::table_view const &keys, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Deprecated alias for remap_left_keys().
 
bool has_metrics () const
 Check if metrics (distinct_count, max_duplicate_count) were computed.
 
size_type get_distinct_count () const
 Get the number of distinct keys in the right table.
 
size_type get_max_duplicate_count () const
 Get the maximum number of times any single key appears.
 

Detailed Description

Remaps keys to unique integer IDs.

Each distinct key in the right table is assigned a unique non-negative integer ID, and rows with equal keys map to the same ID. Keys that cannot be mapped (e.g., keys not found in the right table, or null keys when nulls are unequal) receive negative sentinel values. The specific ID values are stable for the lifetime of this object but are otherwise unspecified.
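The mapping described above can be illustrated with a small host-side model. This is only a sketch in standard C++, not the libcudf implementation: the names build_ids, remap_keys and NOT_FOUND are hypothetical, keys are plain int values rather than table rows, and the first-seen ID order and the sentinel value -1 are assumptions (the class only promises non-negative IDs for mapped keys and some negative value otherwise).

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sentinel for this sketch; the real negative values are unspecified.
constexpr std::int32_t NOT_FOUND = -1;

// Build side: give every distinct right key a non-negative ID (first-seen order
// is an arbitrary choice made by this sketch).
std::unordered_map<int, std::int32_t> build_ids(std::vector<int> const& right_keys)
{
  std::unordered_map<int, std::int32_t> ids;
  for (int key : right_keys) {
    // emplace leaves the map untouched when the key already has an ID
    ids.emplace(key, static_cast<std::int32_t>(ids.size()));
  }
  return ids;
}

// Probe side: look up each key and fall back to the sentinel when it is absent.
std::vector<std::int32_t> remap_keys(std::unordered_map<int, std::int32_t> const& ids,
                                     std::vector<int> const& keys)
{
  std::vector<std::int32_t> out;
  out.reserve(keys.size());
  for (int key : keys) {
    auto const it = ids.find(key);
    out.push_back(it == ids.end() ? NOT_FOUND : it->second);
  }
  return out;
}
```

In this model, remapping the right keys themselves always yields non-negative IDs, while remapping a different key set yields the sentinel for any key the right side never contained.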

Note
The right table is the build side: the internal hash table is built from its keys, and keys passed to remap_left_keys() form the probe side matched against it.
The right table must remain valid for the lifetime of this object, as the hash table references it directly without copying.
All NaNs are considered equal.
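The NaN rule in the note differs from ordinary floating-point comparison, where NaN never equals NaN. One way to picture it is a hash map with a custom hash and equality that fold every NaN into one key. The sketch below is a host-side illustration only; nan_equal_hash, nan_equal_eq and assign_ids are hypothetical names, and how libcudf achieves this on the device is not described by this page.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hash functor that sends every NaN payload to the same bucket.
struct nan_equal_hash {
  std::size_t operator()(double v) const
  {
    return std::isnan(v) ? std::size_t{0x7ff8} : std::hash<double>{}(v);
  }
};

// Equality functor that treats any two NaNs as equal.
struct nan_equal_eq {
  bool operator()(double a, double b) const
  {
    return (std::isnan(a) && std::isnan(b)) || a == b;
  }
};

// Assign one non-negative ID per distinct key, with all NaNs sharing an ID.
std::vector<std::int32_t> assign_ids(std::vector<double> const& keys)
{
  std::unordered_map<double, std::int32_t, nan_equal_hash, nan_equal_eq> ids;
  std::vector<std::int32_t> out;
  out.reserve(keys.size());
  for (double key : keys) {
    auto const it = ids.emplace(key, static_cast<std::int32_t>(ids.size())).first;
    out.push_back(it->second);
  }
  return out;
}
```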

Definition at line 79 of file key_remapping.hpp.

Constructor & Destructor Documentation

◆ key_remapping()

cudf::key_remapping::key_remapping ( cudf::table_view const &  right,
null_equality  compare_nulls = null_equality::EQUAL,
cudf::compute_metrics  metrics = cudf::compute_metrics::YES,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructs a key remapping structure from the given right keys.

Exceptions
cudf::logic_error  if the right table has no columns
Parameters
right  The right table containing the keys to remap; the internal hash table is built from this table
compare_nulls  Controls whether null key values should match or not. When EQUAL, null keys are treated as equal and assigned a valid non-negative ID. When UNEQUAL, rows with null keys receive a negative sentinel value.
metrics  Controls whether to compute distinct_count and max_duplicate_count. If YES (default), compute metrics for later retrieval via get_distinct_count() and get_max_duplicate_count(). If NO, skip metrics computation for better performance; calling get_distinct_count() or get_max_duplicate_count() will throw.
stream  CUDA stream used for device memory operations and kernel launches
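The effect of compare_nulls can be sketched on the host with std::optional standing in for a nullable key. Everything here is an assumption made for illustration: null_policy is a hypothetical stand-in for cudf::null_equality, NULL_UNMATCHED is an arbitrary negative value (the real sentinel is unspecified), and the function name remap_with_nulls does not exist in libcudf.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

enum class null_policy { EQUAL, UNEQUAL };   // hypothetical stand-in for cudf::null_equality
constexpr std::int32_t NULL_UNMATCHED = -2;  // hypothetical sentinel value

// Assign IDs to nullable keys. Under EQUAL all nulls share one valid ID;
// under UNEQUAL every null row gets the negative sentinel instead.
std::vector<std::int32_t> remap_with_nulls(std::vector<std::optional<int>> const& keys,
                                           null_policy policy)
{
  // std::optional is ordered (nullopt sorts first), so it can key a std::map directly.
  std::map<std::optional<int>, std::int32_t> ids;
  std::vector<std::int32_t> out;
  out.reserve(keys.size());
  for (auto const& key : keys) {
    if (!key.has_value() && policy == null_policy::UNEQUAL) {
      out.push_back(NULL_UNMATCHED);
      continue;
    }
    auto const it = ids.emplace(key, static_cast<std::int32_t>(ids.size())).first;
    out.push_back(it->second);
  }
  return out;
}
```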

Member Function Documentation

◆ get_distinct_count()

size_type cudf::key_remapping::get_distinct_count ( ) const

Get the number of distinct keys in the right table.

Exceptions
cudf::logic_error  if metrics was NO during construction
Returns
The count of unique key combinations found during build

◆ get_max_duplicate_count()

size_type cudf::key_remapping::get_max_duplicate_count ( ) const

Get the maximum number of times any single key appears.

Exceptions
cudf::logic_error  if metrics was NO during construction
Returns
The maximum duplicate count across all distinct keys
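To make the two metrics concrete, the following host-side sketch computes them for a list of plain int keys. It is an illustration under assumptions, not how libcudf computes them: key_metrics and count_key_metrics are hypothetical names, and the sketch takes "maximum duplicate count" to mean the number of occurrences of the most frequent key, so a key that appears once contributes 1.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical result type for this sketch.
struct key_metrics {
  std::int32_t distinct_count;
  std::int32_t max_duplicate_count;
};

// Count occurrences per key, then report how many distinct keys exist and
// how often the most frequent one appears.
key_metrics count_key_metrics(std::vector<int> const& right_keys)
{
  std::unordered_map<int, std::int32_t> occurrences;
  for (int key : right_keys) { ++occurrences[key]; }

  std::int32_t max_dup = 0;
  for (auto const& entry : occurrences) { max_dup = std::max(max_dup, entry.second); }

  return {static_cast<std::int32_t>(occurrences.size()), max_dup};
}
```

For the keys {1, 2, 2, 3, 2, 3} this model reports three distinct keys and a maximum duplicate count of three, because the key 2 appears three times.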

◆ has_metrics()

bool cudf::key_remapping::has_metrics ( ) const

Check if metrics (distinct_count, max_duplicate_count) were computed.

Returns
true if metrics are available, false if metrics was NO during construction
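Because the getters throw when metrics were skipped, callers that do not control construction may want to check has_metrics() first. The sketch below models that contract with a small host-side class; metrics_holder and distinct_or are hypothetical names, and std::logic_error is used here as a stand-in for cudf::logic_error.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical model of an object whose metrics may or may not have been computed.
class metrics_holder {
 public:
  explicit metrics_holder(std::optional<std::int32_t> distinct) : distinct_{distinct} {}

  bool has_metrics() const { return distinct_.has_value(); }

  // Mirrors the documented contract: throws when metrics were not computed.
  std::int32_t get_distinct_count() const
  {
    if (!distinct_.has_value()) { throw std::logic_error("metrics were not computed"); }
    return *distinct_;
  }

 private:
  std::optional<std::int32_t> distinct_;
};

// Caller-side guard: only touch the getter when metrics are available.
std::int32_t distinct_or(metrics_holder const& holder, std::int32_t fallback)
{
  return holder.has_metrics() ? holder.get_distinct_count() : fallback;
}
```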

◆ remap_build_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_build_keys ( rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const
inline

Deprecated alias for remap_right_keys().

Deprecated:
Use remap_right_keys() instead.
Parameters
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate device memory for the returned column
Returns
A column of INT32 values with the remapped key IDs

Definition at line 139 of file key_remapping.hpp.

◆ remap_left_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_left_keys ( cudf::table_view const &  keys,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Remap left keys to integer IDs.

For each row in the input, returns the integer ID assigned to that key. Non-negative integers represent keys found in the right table, while negative values represent keys that were not found or cannot be matched (e.g., null keys when nulls are unequal, or keys not present in the right table).

Exceptions
std::invalid_argument  if keys has a different number of columns than the right table
cudf::data_type_error  if keys has different column types than the right table
Parameters
keys  The left keys to remap (must have the same schema as the right table)
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of INT32 values with the remapped key IDs
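The two schema checks listed under Exceptions can be pictured as a pre-flight validation of the left keys against the right table. The sketch below is a host-side model under assumptions: type_id, schema, type_mismatch_error and validate_left_schema are hypothetical names, the column count is checked before the column types as an arbitrary choice, and type_mismatch_error only stands in for cudf::data_type_error without modeling its real base class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class type_id { INT32, INT64, FLOAT64, STRING };  // hypothetical column types
using schema = std::vector<type_id>;                   // one entry per key column

// Hypothetical stand-in for cudf::data_type_error.
struct type_mismatch_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Throw if the left keys do not line up with the right table's key columns.
void validate_left_schema(schema const& right, schema const& left)
{
  if (left.size() != right.size()) {
    throw std::invalid_argument("left keys have a different number of columns");
  }
  for (std::size_t i = 0; i < right.size(); ++i) {
    if (left[i] != right[i]) { throw type_mismatch_error("left key column type differs"); }
  }
}
```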

◆ remap_probe_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_probe_keys ( cudf::table_view const &  keys,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const
inline

Deprecated alias for remap_left_keys().

Deprecated:
Use remap_left_keys() instead.
Parameters
keys  The left keys to remap (must have the same schema as the right table)
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate device memory for the returned column
Returns
A column of INT32 values with the remapped key IDs

Definition at line 180 of file key_remapping.hpp.

◆ remap_right_keys()

std::unique_ptr<cudf::column> cudf::key_remapping::remap_right_keys ( rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Remap right keys to integer IDs.

Recomputes the remapped right keys from the cached right table. The result itself is not cached; each call recomputes it from the key remapping.

For each row in the cached right table, returns the integer ID assigned to that key. Non-negative integers represent valid mapped keys, while negative values represent keys that cannot be mapped (e.g., null keys when nulls are unequal).

Parameters
stream  CUDA stream used for device memory operations and kernel launches
mr  Device memory resource used to allocate the returned column's device memory
Returns
A column of INT32 values with the remapped key IDs

The documentation for this class was generated from the following file: