Builds parquet_reader_options to use for read_parquet().
#include <parquet.hpp>
Public Member Functions | |
| parquet_reader_options_builder ()=default | |
| Default constructor. | |
| parquet_reader_options_builder (source_info src) | |
| Constructor from source info. | |
| parquet_reader_options_builder & | columns (std::vector< std::string > column_names) |
| Sets names of the columns to be read. | |
| parquet_reader_options_builder & | column_names (std::vector< std::string > column_names) |
| Sets names of the columns to be read. | |
| parquet_reader_options_builder & | column_indices (std::vector< cudf::size_type > col_indices) |
| Sets the indices of top-level columns to be read from all input sources. | |
| parquet_reader_options_builder & | row_groups (std::vector< std::vector< size_type >> row_groups) |
| Specifies which row groups to read from each input source. | |
| parquet_reader_options_builder & | filter (ast::expression const &filter) |
| Sets an AST-based filter for predicate pushdown. | |
| parquet_reader_options_builder & | convert_strings_to_categories (bool val) |
| Enables or disables conversion of strings to categories. | |
| parquet_reader_options_builder & | use_pandas_metadata (bool val) |
| Enables or disables use of pandas metadata when reading. | |
| parquet_reader_options_builder & | use_arrow_schema (bool val) |
| Enables or disables use of the Arrow schema when reading. | |
| parquet_reader_options_builder & | allow_mismatched_pq_schemas (bool val) |
| Enables or disables reading of matching projected and filter columns from mismatched Parquet sources. | |
| parquet_reader_options_builder & | ignore_missing_columns (bool val) |
| Enables or disables ignoring of non-existent projected columns while reading. | |
| parquet_reader_options_builder & | set_column_schema (std::vector< reader_column_schema > val) |
| Sets reader metadata. | |
| parquet_reader_options_builder & | skip_rows (int64_t val) |
| Sets number of rows to skip. | |
| parquet_reader_options_builder & | num_rows (int64_t val) |
| Sets number of rows to read. | |
| parquet_reader_options_builder & | skip_bytes (size_t val) |
| Sets the number of bytes to skip before starting to read row groups. | |
| parquet_reader_options_builder & | num_bytes (size_t val) |
| Sets the number of bytes, after skipping, at which to stop reading row groups. | |
| parquet_reader_options_builder & | timestamp_type (data_type type) |
| Sets the timestamp type used to cast timestamp columns. | |
| parquet_reader_options_builder & | decimal_width (type_id width) |
| Sets the decimal width used to cast decimal columns. | |
| parquet_reader_options_builder & | use_jit_filter (bool use_jit_filter) |
| Enables or disables use of JIT for the filter step. | |
| parquet_reader_options_builder & | case_sensitive_names (bool val) |
| Sets whether column name matching is case sensitive. | |
| operator parquet_reader_options && () | |
| Moves the parquet_reader_options member once it is built. | |
| parquet_reader_options && | build () |
| Moves the parquet_reader_options member once it is built. | |
Definition at line 550 of file parquet.hpp.
|
default |
Default constructor.
This has been added because Cython requires a default constructor to create objects on the stack. The hybrid_scan_reader also uses it to construct parquet_reader_options without a source.
|
inline explicit |
Constructor from source info.
| src | The source information used to read parquet file |
Definition at line 567 of file parquet.hpp.
|
inline |
Enables or disables reading of matching projected and filter columns from mismatched Parquet sources.
| val | Boolean indicating whether to read matching projected and filter columns from mismatched Parquet sources. |
Definition at line 672 of file parquet.hpp.
|
inline |
Moves the parquet_reader_options member once it is built.
This has been added since Cython does not support overloading of conversion operators.
Returns an r-value reference to the parquet_reader_options object.
Definition at line 818 of file parquet.hpp.
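Both build() and the conversion operator hand the finished options object out by r-value reference, while every setter returns the builder itself so calls can be chained. The following is a minimal standalone sketch of that pattern. It does not use libcudf: `toy_options`, `toy_options_builder`, `make_via_build`, and `make_via_conversion` are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for parquet_reader_options; not the libcudf type.
struct toy_options {
  std::vector<std::string> column_names;
  std::int64_t skip_rows = 0;
};

// Hypothetical builder that mirrors the shape of the documented interface.
class toy_options_builder {
  toy_options options_;

 public:
  toy_options_builder() = default;

  // Each setter stores its argument and returns *this so calls can be chained.
  toy_options_builder& column_names(std::vector<std::string> names) {
    options_.column_names = std::move(names);
    return *this;
  }

  toy_options_builder& skip_rows(std::int64_t val) {
    options_.skip_rows = val;
    return *this;
  }

  // Implicit conversion: a chained expression can initialize a toy_options directly.
  operator toy_options&&() { return std::move(options_); }

  // Named equivalent for callers that cannot use conversion operators.
  toy_options&& build() { return std::move(options_); }
};

// Finishes the chain with the named build() call.
inline toy_options make_via_build() {
  return toy_options_builder{}.column_names({"A", "B"}).skip_rows(10).build();
}

// Finishes the chain through the implicit conversion operator.
inline toy_options make_via_conversion() {
  toy_options opts = toy_options_builder{}.column_names({"A"}).skip_rows(3);
  return opts;
}
```

In both helpers the options are moved out of the builder, so the builder should not be reused afterwards.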
|
inline |
Sets whether column name matching is case sensitive.
| val | Boolean indicating whether to enable case-sensitive matching |
Definition at line 800 of file parquet.hpp.
|
inline |
Sets the indices of top-level columns to be read from all input sources.
| col_indices | A vector of column indices to attempt to read from each input source. |
Definition at line 601 of file parquet.hpp.
|
inline |
Sets names of the columns to be read.
| column_names | Vector of column names |
Definition at line 589 of file parquet.hpp.
|
inline |
Sets names of the columns to be read.
Deprecated: use column_names instead.
| column_names | Vector of column names |
Definition at line 577 of file parquet.hpp.
|
inline |
Enables or disables conversion of strings to categories.
| val | Boolean indicating whether to convert string columns to categories |
Definition at line 633 of file parquet.hpp.
|
inline |
Sets the decimal width used to cast decimal columns.
| width | The decimal type_id (DECIMAL32, DECIMAL64, or DECIMAL128) to which all decimal columns need to be cast. The scale of each column is preserved from the file. |
Definition at line 773 of file parquet.hpp.
|
inline |
Sets an AST-based filter for predicate pushdown.
The filter can utilize cudf::ast::column_name_reference to reference a column by its name, even if it's not necessarily present in the requested projected columns. To refer to output column indices, you can use cudf::ast::column_reference.
For a parquet with columns ["A", "B", "C", ... "X", "Y", "Z"]:
Example 1: with or without column projection. A filter that references column "C" by name works even though column "C" need not be present in the output table.
Example 2: without column projection. Here, a column_reference with index 1 will refer to column "B" because the output will contain all columns in order ["A", ..., "Z"].
Example 3: with column projection. Here, a column_reference with index 1 will refer to column "Z" because the output will contain 3 columns in order ["A", "Z", "X"].
| filter | AST expression to use as filter |
Definition at line 621 of file parquet.hpp.
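The index rule in Examples 2 and 3 can be modelled without libcudf. The sketch below assumes C++17 and uses a hypothetical helper, `resolve_index_reference`, that returns the name of the column an index-based reference would select. The bounds check is an assumption of this sketch and says nothing about how the library reacts to an invalid index.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: names the column that an index-based reference selects.
// `projection` is std::nullopt when no columns were requested.
inline std::string resolve_index_reference(
    std::vector<std::string> const& file_columns,
    std::optional<std::vector<std::string>> const& projection,
    std::size_t index) {
  // Indices address the output table: the projection if present, else the whole file.
  auto const& output_columns = projection ? *projection : file_columns;
  if (index >= output_columns.size()) {
    // Assumption of this sketch only; not documented library behaviour.
    throw std::out_of_range("column index outside the output table");
  }
  return output_columns[index];
}
```

With file columns ["A", "B", "C", "X", "Y", "Z"], index 1 resolves to "B" when nothing is projected and to "Z" when the projection is ["A", "Z", "X"]. A name-based reference avoids this dependence on projection order.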
|
inline |
Enables or disables ignoring of non-existent projected columns while reading.
| val | Boolean indicating whether to ignore non-existent projected columns while reading. |
Definition at line 685 of file parquet.hpp.
|
inline |
Sets the number of bytes, after skipping, at which to stop reading row groups.
| val | Number of bytes, after skipping, at which to stop reading row groups |
Definition at line 748 of file parquet.hpp.
|
inline |
Sets number of rows to read.
Note: if any single read would produce a table with more than size_type::max() rows, an error is thrown.
| val | Number of rows to read after skip |
Definition at line 724 of file parquet.hpp.
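Since num_rows counts rows after the skip, the two settings together select a window of the input. The sketch below is a standalone model of that window, not libcudf code. The helper name `row_window`, the clamping of out-of-range values, and the use of a negative `num_rows` to mean "no limit" are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Half-open range [first, last) of rows selected from a source holding `total_rows`.
// A negative `num_rows` means "no limit" in this sketch only.
inline std::pair<std::int64_t, std::int64_t> row_window(std::int64_t total_rows,
                                                         std::int64_t skip_rows,
                                                         std::int64_t num_rows) {
  // Clamp the start of the window into [0, total_rows].
  std::int64_t const first = std::min(std::max<std::int64_t>(skip_rows, 0), total_rows);
  std::int64_t const remaining = total_rows - first;
  // Read at most `num_rows` rows, and never past the end of the source.
  std::int64_t const count = num_rows < 0 ? remaining : std::min(num_rows, remaining);
  return {first, first + count};
}
```

For a 100-row source, skipping 10 rows and reading 20 selects rows [10, 30) under this model.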
|
inline |
Specifies which row groups to read from each input source.
When reading from multiple sources (e.g., multiple files), this function allows selecting specific row groups for each source individually. The outer vector corresponds to the list of input sources, and each inner vector contains the row group indices to read from the respective source.
If no row groups should be read from a given source, its entry should be an empty vector.
Example: To read row groups [0, 2] from the first input and [1] from the second input, call: row_groups({{0, 2}, {1}});
Output ordering: rows are emitted in input-source order; all rows selected from source 0 are emitted before rows selected from source 1, and so on. Within each source, row groups appear in the exact order given by the inner vector; the reader does not sort or deduplicate the indices, and repeated indices are emitted multiple times. An empty inner vector means that source contributes no rows but does not affect the order of the remaining sources. When this setter is not called, all row groups are read in source order, then in on-disk order within each source. Row groups removed by standard read_parquet predicate pushdown (statistics or bloom filter pruning) are dropped in place; the remaining row groups keep their relative order.
| row_groups | A vector of vectors, one per input source, each specifying the row group indices to read from that source. |
Definition at line 611 of file parquet.hpp.
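The output ordering described above can be modelled in a few lines. The sketch below is a standalone illustration, not libcudf code. `emit_order` and `planned_order` are hypothetical names, and plain `int` stands in for size_type. It ignores predicate pushdown, which may drop row groups from the plan without reordering the rest.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// (source index, row group index) pairs in the order their rows would be emitted.
using emit_order = std::vector<std::pair<std::size_t, int>>;

// Flattens the per-source selection into the emission order.
inline emit_order planned_order(std::vector<std::vector<int>> const& row_groups) {
  emit_order order;
  // Sources are visited in input order; an empty inner vector contributes nothing.
  for (std::size_t source = 0; source < row_groups.size(); ++source) {
    // Indices are taken exactly as given: no sorting, no deduplication.
    for (int row_group : row_groups[source]) {
      order.emplace_back(source, row_group);
    }
  }
  return order;
}
```

For the selection {{2, 0, 2}, {}, {1}}, this model yields row group 2, then 0, then 2 again from source 0, nothing from source 1, and row group 1 from source 2.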
|
inline |
Sets reader metadata.
| val | Tree of metadata information. |
Definition at line 697 of file parquet.hpp.
|
inline |
Sets the number of bytes to skip before starting to read row groups.
| val | Number of bytes to skip before starting to read row groups |
Definition at line 736 of file parquet.hpp.
|
inline |
Sets number of rows to skip.
| val | Number of rows to skip from start |
Definition at line 709 of file parquet.hpp.
|
inline |
Sets the timestamp type used to cast timestamp columns.
| type | The timestamp data_type to which all timestamp columns need to be cast |
Definition at line 760 of file parquet.hpp.
|
inline |
Enables or disables use of the Arrow schema when reading.
| val | Boolean indicating whether to use the Arrow schema |
Definition at line 657 of file parquet.hpp.
|
inline |
Enables or disables use of JIT for the filter step.
| use_jit_filter | Boolean indicating whether to use the JIT filter |
Definition at line 785 of file parquet.hpp.
|
inline |
Enables or disables use of pandas metadata when reading.
| val | Boolean indicating whether to use pandas metadata |
Definition at line 645 of file parquet.hpp.