Builds parquet_reader_options to use for read_parquet().

#include <parquet.hpp>

Public Member Functions
	parquet_reader_options_builder ()=default
	Default constructor.

	parquet_reader_options_builder (source_info src)
	Constructor from source info.

parquet_reader_options_builder &	columns (std::vector< std::string > column_names)
	Sets names of the columns to be read.

parquet_reader_options_builder &	column_names (std::vector< std::string > column_names)
	Sets names of the columns to be read.

parquet_reader_options_builder &	column_indices (std::vector< cudf::size_type > col_indices)
	Sets the indices of top-level columns to be read from all input sources.

parquet_reader_options_builder &	row_groups (std::vector< std::vector< size_type >> row_groups)
	Specifies which row groups to read from each input source.

parquet_reader_options_builder &	filter (ast::expression const &filter)
	Sets AST-based filter for predicate pushdown.

parquet_reader_options_builder &	convert_strings_to_categories (bool val)
	Enables or disables conversion of strings to categories.

parquet_reader_options_builder &	use_pandas_metadata (bool val)
	Enables or disables use of pandas metadata when reading.

parquet_reader_options_builder &	use_arrow_schema (bool val)
	Enables or disables use of the Arrow schema when reading.

parquet_reader_options_builder &	allow_mismatched_pq_schemas (bool val)
	Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.

parquet_reader_options_builder &	ignore_missing_columns (bool val)
	Enables or disables ignoring of non-existent projected columns while reading.

parquet_reader_options_builder &	set_column_schema (std::vector< reader_column_schema > val)
	Sets reader metadata.

parquet_reader_options_builder &	skip_rows (int64_t val)
	Sets number of rows to skip.

parquet_reader_options_builder &	num_rows (int64_t val)
	Sets number of rows to read.

parquet_reader_options_builder &	skip_bytes (size_t val)
	Sets the number of bytes to skip before starting to read row groups.

parquet_reader_options_builder &	num_bytes (size_t val)
	Sets the number of bytes, counted after the skipped bytes, at which to stop reading row groups.

parquet_reader_options_builder &	timestamp_type (data_type type)
	Sets the timestamp type used to cast timestamp columns.

parquet_reader_options_builder &	decimal_width (type_id width)
	Sets the decimal width used to cast decimal columns.

parquet_reader_options_builder &	use_jit_filter (bool use_jit_filter)
	Enables or disables use of JIT for the filter step.

parquet_reader_options_builder &	case_sensitive_names (bool val)
	Sets whether column name matching is case sensitive.

	operator parquet_reader_options && ()
	Moves the parquet_reader_options member once it's built.

parquet_reader_options &&	build ()
	Moves the parquet_reader_options member once it's built.

Detailed Description

Builds parquet_reader_options to use for read_parquet().

Definition at line 550 of file parquet.hpp.

Constructor & Destructor Documentation

◆ parquet_reader_options_builder() [1/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( )

default

Default constructor.

This constructor exists because Cython requires a default constructor to create objects on the stack. The hybrid_scan_reader also uses it to construct parquet_reader_options without a source.

◆ parquet_reader_options_builder() [2/2]

cudf::io::parquet_reader_options_builder::parquet_reader_options_builder ( source_info src )

inline explicit

Constructor from source info.

Parameters

src	The source information used to read the Parquet file

Definition at line 567 of file parquet.hpp.

Member Function Documentation

◆ allow_mismatched_pq_schemas()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::allow_mismatched_pq_schemas ( bool val )

inline

Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.

Parameters

val	Whether to read matching projected and filter columns from mismatched Parquet sources.

Returns: this for chaining.

Definition at line 672 of file parquet.hpp.

◆ build()

parquet_reader_options&& cudf::io::parquet_reader_options_builder::build ( )

inline

Moves the parquet_reader_options member once it's built.

This has been added since Cython does not support overloading of conversion operators.

Returns: r-value reference to the built parquet_reader_options object

Definition at line 818 of file parquet.hpp.
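The relationship between the conversion operator and build() can be illustrated with a small stand-in. The sketch below does not use cudf: toy_options and toy_options_builder are hypothetical types written only to show the pattern of chained setters that return the builder by reference, plus two equivalent ways of moving the finished options out.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an options struct; NOT the cudf type.
struct toy_options {
  std::vector<std::string> column_names;
  std::int64_t skip_rows = 0;
  bool case_sensitive    = true;
};

// Hypothetical stand-in for a builder; NOT the cudf type.
class toy_options_builder {
  toy_options options_;

 public:
  toy_options_builder() = default;

  // Each setter returns *this so that calls can be chained.
  toy_options_builder& column_names(std::vector<std::string> names)
  {
    options_.column_names = std::move(names);
    return *this;
  }

  toy_options_builder& skip_rows(std::int64_t val)
  {
    options_.skip_rows = val;
    return *this;
  }

  toy_options_builder& case_sensitive_names(bool val)
  {
    options_.case_sensitive = val;
    return *this;
  }

  // Implicit conversion: lets `toy_options o = builder.setter(...);` compile.
  operator toy_options&&() { return std::move(options_); }

  // Named alternative with the same effect, for callers that cannot
  // rely on the conversion operator.
  toy_options&& build() { return std::move(options_); }
};
```

With this stand-in, `toy_options o = toy_options_builder{}.skip_rows(5).build();` and `toy_options o = toy_options_builder{}.skip_rows(5);` produce the same result; the named build() is the form available to callers that cannot use the conversion operator.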

◆ case_sensitive_names()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::case_sensitive_names ( bool val )

inline

Sets whether column name matching is case sensitive.

Note: When disabled, if there are multiple case-insensitive matches, the first matched column is selected from the Parquet schema.

Parameters

val	Boolean indicating whether to enable case-sensitive matching

Returns: this for chaining

Definition at line 800 of file parquet.hpp.
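The first-match rule in the note above can be modeled without cudf. In the sketch below, find_column and to_lower_ascii are hypothetical helpers, and ASCII-only lowercasing is an assumption made for illustration; the reader's actual case folding is not described here.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ASCII-only lowercasing (an assumption of this sketch).
std::string to_lower_ascii(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical helper (not part of cudf): returns the index of the first
// schema column matching `requested`, or -1 if none matches.
int find_column(std::vector<std::string> const& schema,
                std::string const& requested,
                bool case_sensitive)
{
  std::string const wanted = case_sensitive ? requested : to_lower_ascii(requested);
  for (std::size_t i = 0; i < schema.size(); ++i) {
    std::string const candidate = case_sensitive ? schema[i] : to_lower_ascii(schema[i]);
    // Stop at the first match, so earlier schema entries win ties.
    if (candidate == wanted) { return static_cast<int>(i); }
  }
  return -1;
}
```

For a schema of ["ID", "Name", "name", "NAME"], a case-sensitive lookup of "name" selects index 2, while a case-insensitive lookup selects index 1, the first match in schema order.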

◆ column_indices()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::column_indices ( std::vector< cudf::size_type > col_indices )

inline

Sets the indices of top-level columns to be read from all input sources.

Parameters

col_indices	A vector of column indices to attempt to read from each input source.

Returns: this for chaining

Definition at line 601 of file parquet.hpp.

◆ column_names()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::column_names ( std::vector< std::string > column_names )

inline

Sets names of the columns to be read.

Parameters

column_names	Vector of column names

Returns: this for chaining

Definition at line 589 of file parquet.hpp.

◆ columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::columns ( std::vector< std::string > column_names )

inline

Sets names of the columns to be read.

Deprecated: Deprecated in 26.04 and will be removed in 26.06+. Use column_names instead.

Parameters

column_names	Vector of column names

Returns: this for chaining

Definition at line 577 of file parquet.hpp.

◆ convert_strings_to_categories()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::convert_strings_to_categories ( bool val )

inline

Enables or disables conversion of strings to categories.

Parameters

val	Boolean value to enable/disable conversion of string columns to categories

Returns: this for chaining

Definition at line 633 of file parquet.hpp.

◆ decimal_width()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::decimal_width ( type_id width )

inline

Sets the decimal width used to cast decimal columns.

Parameters

width	The decimal type_id (DECIMAL32, DECIMAL64, or DECIMAL128) to which all decimal columns need to be cast. The scale of each column is preserved from the file.

Returns: this for chaining

Definition at line 773 of file parquet.hpp.

◆ filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::filter ( ast::expression const & filter )

inline

Sets AST-based filter for predicate pushdown.

The filter can utilize cudf::ast::column_name_reference to reference a column by its name, even if it's not necessarily present in the requested projected columns. To refer to output column indices, you can use cudf::ast::column_reference.

For a parquet with columns ["A", "B", "C", ... "X", "Y", "Z"]:

Example 1: with/without column projection

    use_columns({"A", "X", "Z"})
      .filter(operation(ast_operator::LESS, column_name_reference{"C"}, literal{100}));

Column "C" need not be present in output table.

Example 2: without column projection

    filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "B" because output will contain all columns in order ["A", ..., "Z"].

Example 3: with column projection

    use_columns({"A", "Z", "X"})
      .filter(operation(ast_operator::LESS, column_reference{1}, literal{100}));

Here, 1 will refer to column "Z" because output will contain 3 columns in order ["A", "Z", "X"].

Parameters

filter	AST expression to use as filter

Returns: this for chaining

Definition at line 621 of file parquet.hpp.
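The index rule for cudf::ast::column_reference described above (indices count positions in the output table, not in the file) can be modeled in plain C++. The helper resolve_column_reference below is hypothetical and not part of cudf; treating an empty projection as "read all columns" is an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of cudf): returns the name of the column that
// an index-based reference would denote, given the file's columns and a
// projection. An empty projection stands for "no projection" in this sketch.
std::string resolve_column_reference(std::vector<std::string> const& file_columns,
                                     std::vector<std::string> const& projection,
                                     std::size_t index)
{
  // The index is resolved against the output columns, in output order.
  auto const& output = projection.empty() ? file_columns : projection;
  if (index >= output.size()) {
    throw std::out_of_range("column reference index exceeds output column count");
  }
  return output[index];
}
```

With file columns ["A", "B", "C"] and no projection, index 1 resolves to "B"; with the projection ["A", "Z", "X"], index 1 resolves to "Z". Name-based references avoid this dependence on projection order.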

◆ ignore_missing_columns()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::ignore_missing_columns ( bool val )

inline

Enables or disables ignoring of non-existent projected columns while reading.

Parameters

val	Whether to ignore non-existent projected columns while reading.

Returns: this for chaining.

Definition at line 685 of file parquet.hpp.

◆ num_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_bytes ( size_t val )

inline

Sets the number of bytes, counted after the skipped bytes, at which to stop reading row groups.

Parameters

val	Number of bytes, counted after the skipped bytes, at which to stop reading row groups

Returns: this for chaining

Definition at line 748 of file parquet.hpp.

◆ num_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::num_rows ( int64_t val )

inline

Sets number of rows to read.

Note: Although this allows one to request more than size_type::max() rows, if any single read would produce a table larger than this row limit, an error is thrown.

Parameters

val	Number of rows to read after skip

Returns: this for chaining

Definition at line 724 of file parquet.hpp.
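The interaction between a 64-bit num_rows request and the per-read row limit can be sketched as follows. The helper rows_in_read is hypothetical and not part of cudf. It assumes that the limit is the maximum of a 32-bit signed integer (the width of cudf::size_type) and that a request is clamped to the rows available after the skip; the clamping is a modeling assumption, not a statement about the reader's internals.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of cudf): number of rows a single read would
// produce for a source with `total_rows` rows.
std::int64_t rows_in_read(std::int64_t total_rows, std::int64_t skip_rows, std::int64_t num_rows)
{
  if (total_rows < 0 || skip_rows < 0 || num_rows < 0) {
    throw std::invalid_argument("row counts must be non-negative");
  }
  // Rows left after skipping; zero when the skip runs past the end.
  std::int64_t const available = std::max<std::int64_t>(0, total_rows - skip_rows);
  // A request larger than what is available is clamped (sketch assumption).
  std::int64_t const rows = std::min(available, num_rows);
  // A single output table cannot hold more rows than a 32-bit size type allows.
  if (rows > std::numeric_limits<std::int32_t>::max()) {
    throw std::overflow_error("read would exceed the per-table row limit");
  }
  return rows;
}
```

Under these assumptions, requesting 5,000,000,000 rows from a 1,000-row source succeeds and yields 1,000 rows, whereas the same request against a 5,000,000,000-row source fails because the resulting table would exceed the limit.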

◆ row_groups()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::row_groups ( std::vector< std::vector< size_type >> row_groups )

inline

Specifies which row groups to read from each input source.

When reading from multiple sources (e.g., multiple files), this function allows selecting specific row groups for each source individually. The outer vector corresponds to the list of input sources, and each inner vector contains the row group indices to read from the respective source.

If no row groups should be read from a given source, its entry should be an empty vector.

Example: To read row groups [0, 2] from the first input and [1] from the second input, call: row_groups({{0, 2}, {1}});

Output ordering: rows are emitted in input-source order; all rows selected from source 0 are emitted before rows selected from source 1, and so on. Within each source, row groups appear in the exact order given by the inner vector; the reader does not sort or deduplicate the indices, and repeated indices are emitted multiple times. An empty inner vector means that source contributes no rows but does not affect the order of the remaining sources. When this setter is not called, all row groups are read in source order, then in on-disk order within each source. Row groups removed by standard read_parquet predicate pushdown (statistics or bloom filter pruning) are dropped in place; the remaining row groups keep their relative order.

Parameters

row_groups	A vector of vectors, one per input source, each specifying the row group indices to read from that source.

Returns: this for chaining

Definition at line 611 of file parquet.hpp.
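The output ordering rules above can be modeled with standard-library C++ alone. The helper emission_order below is hypothetical and not part of cudf; it only lists (source index, row group index) pairs in the order the description says rows are emitted, before any pruning by predicate pushdown.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cudf): expands a per-source row group
// selection into the documented emission order.
std::vector<std::pair<std::size_t, int>> emission_order(
  std::vector<std::vector<int>> const& row_groups)
{
  std::vector<std::pair<std::size_t, int>> order;
  // Sources are visited in input order.
  for (std::size_t src = 0; src < row_groups.size(); ++src) {
    // Inner indices are used exactly as given: no sorting, no deduplication.
    // An empty inner vector simply contributes nothing.
    for (int rg : row_groups[src]) {
      order.emplace_back(src, rg);
    }
  }
  return order;
}
```

For the selection {{2, 0, 2}, {}, {1}}, this yields (0, 2), (0, 0), (0, 2), (2, 1): the repeated index 2 appears twice, the empty second source contributes nothing, and the third source follows the first.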

◆ set_column_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::set_column_schema ( std::vector< reader_column_schema > val )

inline

Sets reader metadata.

Parameters

val	Tree of metadata information.

Returns: this for chaining

Definition at line 697 of file parquet.hpp.

◆ skip_bytes()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_bytes ( size_t val )

inline

Sets the number of bytes to skip before starting to read row groups.

Parameters

val	Number of bytes to skip before starting to read row groups

Returns: this for chaining

Definition at line 736 of file parquet.hpp.

◆ skip_rows()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::skip_rows ( int64_t val )

inline

Sets number of rows to skip.

Parameters

val	Number of rows to skip from start

Returns: this for chaining

Definition at line 709 of file parquet.hpp.

◆ timestamp_type()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::timestamp_type ( data_type type )

inline

Sets the timestamp type used to cast timestamp columns.

Parameters

type	The timestamp data_type to which all timestamp columns need to be cast

Returns: this for chaining

Definition at line 760 of file parquet.hpp.

◆ use_arrow_schema()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_arrow_schema ( bool val )

inline

Enables or disables use of the Arrow schema when reading.

Parameters

val	Whether to use the Arrow schema

Returns: this for chaining

Definition at line 657 of file parquet.hpp.

◆ use_jit_filter()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_jit_filter ( bool use_jit_filter )

inline

Enables or disables use of JIT for the filter step.

Parameters

use_jit_filter	Whether to use the JIT filter

Returns: this for chaining

Definition at line 785 of file parquet.hpp.

◆ use_pandas_metadata()

parquet_reader_options_builder& cudf::io::parquet_reader_options_builder::use_pandas_metadata ( bool val )

inline

Enables or disables use of pandas metadata when reading.

Parameters

val	Whether to use pandas metadata

Returns: this for chaining

Definition at line 645 of file parquet.hpp.

The documentation for this class was generated from the following file:

parquet.hpp
