cudf::distinct_hash_join Class Reference

Distinct hash join that builds a hash table with the right table on construction and probes results in subsequent *_join member functions.

#include <distinct_hash_join.hpp>

Public Member Functions

 distinct_hash_join (distinct_hash_join const &)=delete
 
 distinct_hash_join (distinct_hash_join &&)=delete
 
distinct_hash_join & operator= (distinct_hash_join const &)=delete
 
distinct_hash_join & operator= (distinct_hash_join &&)=delete
 
 distinct_hash_join (cudf::table_view const &right, null_equality compare_nulls=null_equality::EQUAL, double load_factor=0.5, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructs a distinct hash join object for subsequent probe calls.
 
std::pair< std::unique_ptr< rmm::device_uvector< size_type > >, std::unique_ptr< rmm::device_uvector< size_type > > > inner_join (cudf::table_view const &left, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns the row indices that can be used to construct the result of performing an inner join between two tables.
 
std::unique_ptr< rmm::device_uvector< size_type > > left_join (cudf::table_view const &left, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
 Returns the right table indices that can be used to construct the result of performing a left join between two tables.
 

Detailed Description

Distinct hash join that builds a hash table with the right table on construction and probes results in subsequent *_join member functions.

This class enables the distinct hash join scheme that builds with the right table once and probes with many left tables (possibly in parallel).

Note
Behavior is undefined if the right table contains duplicates.
All NaNs are considered equal.
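
The build-once, probe-many scheme described above can be illustrated with a small host-side analogue. This is only a sketch under stated assumptions: `host_distinct_index` and `count_matches` are hypothetical names, keys are plain `int` values rather than table rows, and `std::unordered_map` stands in for the GPU hash table. It is not how libcudf implements the class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using size_type = std::int32_t;  // same underlying type as cudf::size_type

// Hypothetical host-side stand-in for the build-once, probe-many pattern.
class host_distinct_index {
 public:
  // Build phase: map each distinct right key to its row index.
  explicit host_distinct_index(std::vector<int> const& right_keys)
  {
    table_.reserve(right_keys.size());
    for (std::size_t r = 0; r < right_keys.size(); ++r) {
      table_.emplace(right_keys[r], static_cast<size_type>(r));
    }
  }

  // Mirror the deleted copy operations of the documented class.
  host_distinct_index(host_distinct_index const&)            = delete;
  host_distinct_index& operator=(host_distinct_index const&) = delete;

  // Probe phase: const, so the same table serves any number of left tables.
  std::size_t count_matches(std::vector<int> const& left_keys) const
  {
    std::size_t matches = 0;
    for (int key : left_keys) {
      if (table_.find(key) != table_.end()) { ++matches; }
    }
    assert(matches <= left_keys.size());  // distinct right keys: at most one match per left row
    return matches;
  }

 private:
  std::unordered_map<int, size_type> table_;
};
```

Because probing is a `const` operation in this sketch, one constructed object can be reused for several left inputs without rebuilding the table.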

Definition at line 45 of file distinct_hash_join.hpp.

Constructor & Destructor Documentation

◆ distinct_hash_join()

cudf::distinct_hash_join::distinct_hash_join ( cudf::table_view const &  right,
null_equality  compare_nulls = null_equality::EQUAL,
double  load_factor = 0.5,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructs a distinct hash join object for subsequent probe calls.

Exceptions
cudf::logic_error: if the right table has no columns
std::invalid_argument: if load_factor is not in the range (0, 1]
Parameters
right: The right table that contains distinct elements
compare_nulls: Controls whether null join-key values should match or not
load_factor: The desired ratio of filled slots to total slots in the hash table, must be in range (0,1]. For example, 0.5 indicates a target of 50% occupancy. Note that the actual occupancy achieved may be slightly lower than the specified value.
stream: CUDA stream used for device memory operations and kernel launches
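
To make the load_factor arithmetic concrete, here is a minimal sketch of how a target occupancy could translate into a slot count. The helper `slots_for` is hypothetical and is not part of libcudf; the real hash table may round capacity differently, which is one reason the achieved occupancy can end up below the requested value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: smallest slot count with num_keys / slots <= load_factor.
inline std::size_t slots_for(std::size_t num_keys, double load_factor)
{
  // Same validity range as documented for the constructor: (0, 1].
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  auto const slots =
    static_cast<std::size_t>(std::ceil(static_cast<double>(num_keys) / load_factor));
  assert(slots >= num_keys);  // never fewer slots than keys
  return slots;
}
```

With a load factor of 0.5, 100 keys need 200 slots. With 0.3, 10 keys need 34 slots, giving an occupancy of roughly 0.294, slightly under the requested 0.3.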

Member Function Documentation

◆ inner_join()

std::pair<std::unique_ptr<rmm::device_uvector<size_type> >, std::unique_ptr<rmm::device_uvector<size_type> > > cudf::distinct_hash_join::inner_join ( cudf::table_view const &  left,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns the row indices that can be used to construct the result of performing an inner join between two tables.

See also
cudf::inner_join().
Parameters
left: The left table, from which the keys are probed
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to allocate the returned indices' device memory.
Returns
A pair of index vectors [left_indices, right_indices] that can be used to construct the result of performing an inner join between two tables with left and right as the join keys.
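
The shape of the returned index pair can be illustrated with a host-side analogue. This is a sketch only: `host_inner_join` is a hypothetical function, keys are plain `int` values, and `std::vector` stands in for `rmm::device_uvector`. The output order here simply follows the left rows, which is a property of this sketch and not something this page states about the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // same underlying type as cudf::size_type

// Hypothetical host analogue: returns matching (left row, right row) index pairs.
inline std::pair<std::vector<size_type>, std::vector<size_type>> host_inner_join(
  std::vector<int> const& left_keys, std::vector<int> const& right_keys)
{
  // Build: distinct right key -> right row index.
  std::unordered_map<int, size_type> table;
  table.reserve(right_keys.size());
  for (std::size_t r = 0; r < right_keys.size(); ++r) {
    table.emplace(right_keys[r], static_cast<size_type>(r));
  }

  // Probe: emit one index pair per left row that finds a match.
  std::vector<size_type> left_indices;
  std::vector<size_type> right_indices;
  for (std::size_t l = 0; l < left_keys.size(); ++l) {
    auto const it = table.find(left_keys[l]);
    if (it != table.end()) {
      left_indices.push_back(static_cast<size_type>(l));
      right_indices.push_back(it->second);
    }
  }
  assert(left_indices.size() == right_indices.size());
  return {std::move(left_indices), std::move(right_indices)};
}
```

Gathering the left table with left_indices and the right table with right_indices, row by row, yields the joined rows. Left rows without a match do not appear in either vector.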

◆ left_join()

std::unique_ptr<rmm::device_uvector<size_type> > cudf::distinct_hash_join::left_join ( cudf::table_view const &  left,
rmm::cuda_stream_view  stream = cudf::get_default_stream(),
rmm::device_async_resource_ref  mr = cudf::get_current_device_resource_ref() 
) const

Returns the right table indices that can be used to construct the result of performing a left join between two tables.

Note
For a given row index i of the left table, the resulting right_indices[i] contains the row index of the matched row from the right table if there is a match. Otherwise, it contains JoinNoMatch.
Parameters
left: The left table, from which the keys are probed
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to allocate the returned indices' device memory.
Returns
A right_indices vector that can be used to construct the result of performing a left join between two tables with left and right as the join keys.
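
The per-row contract in the note above (one output entry per left row, with a sentinel where nothing matches) can be sketched on the host as follows. `host_left_join` and `kNoMatch` are hypothetical names, and the sentinel value chosen here is an assumption for illustration; real code should compare against the library's JoinNoMatch constant rather than a hard-coded value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

using size_type = std::int32_t;  // same underlying type as cudf::size_type

// Assumed sentinel for this sketch only; stands in for JoinNoMatch.
constexpr size_type kNoMatch = std::numeric_limits<size_type>::min();

// Hypothetical host analogue: result[i] is the matching right row for left row i.
inline std::vector<size_type> host_left_join(std::vector<int> const& left_keys,
                                             std::vector<int> const& right_keys)
{
  // Build: distinct right key -> right row index.
  std::unordered_map<int, size_type> table;
  table.reserve(right_keys.size());
  for (std::size_t r = 0; r < right_keys.size(); ++r) {
    table.emplace(right_keys[r], static_cast<size_type>(r));
  }

  // Probe: every left row gets exactly one entry, defaulting to the sentinel.
  std::vector<size_type> right_indices(left_keys.size(), kNoMatch);
  for (std::size_t l = 0; l < left_keys.size(); ++l) {
    auto const it = table.find(left_keys[l]);
    if (it != table.end()) { right_indices[l] = it->second; }
  }
  assert(right_indices.size() == left_keys.size());
  return right_indices;
}
```

Since the output has the same length as the left table, no separate left index vector is needed: position i in the result already identifies left row i.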

The documentation for this class was generated from the following file: