Mark-based hash join for semi/anti join with left table reuse.
#include <mark_join.hpp>
Public Member Functions

mark_join (mark_join const &)=delete
mark_join (mark_join &&)=delete
mark_join & operator= (mark_join const &)=delete
mark_join & operator= (mark_join &&)=delete
mark_join (cudf::table_view const &build, cudf::null_equality compare_nulls=null_equality::EQUAL, rmm::cuda_stream_view stream=cudf::get_default_stream())
    Constructs a mark join object by building a hash table from the build table.
mark_join (cudf::table_view const &build, cudf::null_equality compare_nulls, double load_factor, rmm::cuda_stream_view stream=cudf::get_default_stream())
    Constructs a mark join object with a specified load factor.
std::unique_ptr< rmm::device_uvector< size_type > > semi_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
    Returns build row indices that have at least one match in the probe table.
std::unique_ptr< rmm::device_uvector< size_type > > anti_join (cudf::table_view const &probe, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref()) const
    Returns build row indices that have no match in the probe table.
Builds a hash table from the build (left) table using a multiset that allows duplicate keys. The probe kernel atomically marks each matching build entry with a compare-and-swap (CAS) on the most significant bit (MSB) of its stored hash; a retrieve kernel then collects the marked entries (semi join) or the unmarked entries (anti join).
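The build, probe, and retrieve phases described above can be modeled on the CPU. The following is a minimal single-threaded sketch, not libcudf's implementation: the names `sketch::mark_join_model`, `hash31`, and `mark_bit` are hypothetical, and the linear probing, the `2n + 1` table size, the clearing of marks before each probe, and the sorted output order are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

namespace sketch {  // hypothetical names; a CPU model, not libcudf code

constexpr std::uint32_t mark_bit = 0x80000000u;  // MSB of the stored hash

class mark_join_model {
 public:
  // Build: insert every build row (duplicates included) with linear probing.
  explicit mark_join_model(std::vector<int> const& build_keys)
    : keys_(build_keys), tags_(2 * build_keys.size() + 1), rows_(tags_.size(), -1)
  {
    assert(build_keys.size() < 0x7FFFFFFFu);  // row indices must fit in int
    for (std::size_t row = 0; row < keys_.size(); ++row) {
      std::uint32_t const h = hash31(keys_[row]);
      std::size_t slot      = h % rows_.size();
      while (rows_[slot] != -1) { slot = (slot + 1) % rows_.size(); }
      rows_[slot] = static_cast<int>(row);
      tags_[slot].store(h, std::memory_order_relaxed);
    }
  }

  std::vector<int> semi_join(std::vector<int> const& probe_keys) { return run(probe_keys, true); }
  std::vector<int> anti_join(std::vector<int> const& probe_keys) { return run(probe_keys, false); }

 private:
  // 31-bit hash, so the MSB is free to serve as the mark.
  static std::uint32_t hash31(int key)
  {
    return static_cast<std::uint32_t>(std::hash<int>{}(key)) & ~mark_bit;
  }

  std::vector<int> run(std::vector<int> const& probe_keys, bool want_marked)
  {
    // Assumption: marks are cleared before each probe so the table can be reused.
    for (auto& tag : tags_) { tag.fetch_and(~mark_bit, std::memory_order_relaxed); }

    // Probe: walk the chain for each probe key and mark every matching build entry.
    for (int key : probe_keys) {
      std::uint32_t const h = hash31(key);
      for (std::size_t slot = h % rows_.size(); rows_[slot] != -1;
           slot             = (slot + 1) % rows_.size()) {
        if (keys_[static_cast<std::size_t>(rows_[slot])] != key) { continue; }
        std::uint32_t expected = h;  // the CAS succeeds only if the entry is still unmarked
        tags_[slot].compare_exchange_strong(expected, h | mark_bit, std::memory_order_relaxed);
      }
    }

    // Retrieve: collect marked rows (semi join) or unmarked rows (anti join).
    std::vector<int> out;
    for (std::size_t slot = 0; slot < rows_.size(); ++slot) {
      if (rows_[slot] == -1) { continue; }
      bool const marked = (tags_[slot].load(std::memory_order_relaxed) & mark_bit) != 0;
      if (marked == want_marked) { out.push_back(rows_[slot]); }
    }
    std::sort(out.begin(), out.end());  // sorted only to make the sketch deterministic
    return out;
  }

  std::vector<int> keys_;                         // build keys, indexed by row
  std::vector<std::atomic<std::uint32_t>> tags_;  // stored hash, MSB = mark
  std::vector<int> rows_;                         // build row per slot, -1 = empty
};

}  // namespace sketch
```

In this model, a table built once from the keys `{1, 2, 2, 3}` can be probed repeatedly: probing with `{2, 4}` yields rows 1 and 2 for the semi join and rows 0 and 3 for the anti join, and a later probe with `{3}` reuses the same table without rebuilding it.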
This class enables building the hash table once and probing multiple times with different right (probe) tables, amortizing the build cost.
For the common case where the right (filter) table is reused, use cudf::filtered_join instead, which builds a distinct set from the right table.
Definition at line 59 of file mark_join.hpp.
cudf::mark_join::mark_join (
    cudf::table_view const & build,
    cudf::null_equality compare_nulls = null_equality::EQUAL,
    rmm::cuda_stream_view stream = cudf::get_default_stream()
)
Constructs a mark join object by building a hash table from the build table.
| build | The build table (typically the left table) |
| compare_nulls | Controls whether null join-key values should match or not |
| stream | CUDA stream used for device memory operations and kernel launches |
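To illustrate what `compare_nulls` controls, here is a small CPU sketch under stated assumptions: `std::optional<int>` stands in for a nullable join key, the enum mirrors the `EQUAL` and `UNEQUAL` values of `cudf::null_equality`, and `keys_match` and `semi_join_indices` are hypothetical helpers rather than libcudf functions. The nested loop is chosen for clarity, not performance.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

namespace sketch {  // hypothetical names; not part of libcudf

enum class null_equality { EQUAL, UNEQUAL };  // mirrors the values of cudf::null_equality

using key = std::optional<int>;  // std::nullopt models a null join key

// Two keys match if both are valid and equal, or if both are null
// and nulls are configured to compare as equal.
inline bool keys_match(key const& a, key const& b, null_equality compare_nulls)
{
  if (!a.has_value() || !b.has_value()) {
    return compare_nulls == null_equality::EQUAL && !a.has_value() && !b.has_value();
  }
  return *a == *b;
}

// Naive semi join: build row indices with at least one matching probe row.
inline std::vector<std::size_t> semi_join_indices(std::vector<key> const& build,
                                                  std::vector<key> const& probe,
                                                  null_equality compare_nulls)
{
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < build.size(); ++i) {
    for (key const& p : probe) {
      if (keys_match(build[i], p, compare_nulls)) {
        out.push_back(i);
        break;  // one match is enough for a semi join
      }
    }
  }
  return out;
}

}  // namespace sketch
```

With build keys `{1, null, 3}` and probe keys `{null, 3}`, this model returns rows 1 and 2 under `EQUAL` and only row 2 under `UNEQUAL`.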
cudf::mark_join::mark_join (
    cudf::table_view const & build,
    cudf::null_equality compare_nulls,
    double load_factor,
    rmm::cuda_stream_view stream = cudf::get_default_stream()
)
Constructs a mark join object with a specified load factor.
| build | The build table (typically the left table) |
| compare_nulls | Controls whether null join-key values should match or not |
| load_factor | Hash table load factor in range (0,1] |
| stream | CUDA stream used for device memory operations and kernel launches |
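A load factor is conventionally the ratio of stored entries to hash table slots, so a lower value trades memory for fewer collisions. The sketch below shows that arithmetic only; `table_capacity` is a hypothetical helper, and how libcudf actually sizes its table (rounding, alignment, minimum size) is not stated in this documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>

namespace sketch {  // hypothetical helper; not part of libcudf

// Number of slots needed so that num_rows entries fill at most
// load_factor of the table. Rejects values outside (0, 1].
inline std::size_t table_capacity(std::size_t num_rows, double load_factor)
{
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  auto const capacity =
    static_cast<std::size_t>(std::ceil(static_cast<double>(num_rows) / load_factor));
  assert(capacity >= num_rows);  // a load factor of at most 1 never shrinks the table
  return capacity;
}

}  // namespace sketch
```

Under this model, 1000 build rows at a load factor of 0.5 need 2000 slots, while a load factor of 1.0 needs exactly 1000.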
std::unique_ptr<rmm::device_uvector<size_type> > cudf::mark_join::anti_join (
    cudf::table_view const & probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
) const
Returns build row indices that have no match in the probe table.
| probe | The probe table |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned device memory |
std::unique_ptr<rmm::device_uvector<size_type> > cudf::mark_join::semi_join (
    cudf::table_view const & probe,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref()
) const
Returns build row indices that have at least one match in the probe table.
| probe | The probe table |
| stream | CUDA stream used for device memory operations and kernel launches |
| mr | Device memory resource used to allocate the returned device memory |