Column
- class pylibcudf.column.Column(obj=None, *args, **kwargs)
A container of nullable device data as a column of elements.
This class is an implementation of the Arrow columnar data specification for data stored on GPUs. It relies on Python memoryview-like semantics to maintain shared ownership of the data it is constructed with, so any input data may also be co-owned by other data structures. The Column is designed to be operated on using algorithms backed by libcudf.
- Parameters:
- data_type : DataType
The type of data in the column.
- size : size_type
The number of rows in the column.
- data : gpumemoryview
The data the column will refer to.
- mask : gpumemoryview
The null mask for the column.
- null_count : int
The number of null rows in the column.
- offset : int
The offset into the data buffer where the column’s data begins.
- children : list
The children of this column if it is a compound column type.
Methods
- all_null_like(Column like, size_type size): Create an all null column from a template.
- child(self, size_type index): Get a child column of this column.
- children(self): The children of the column.
- copy(self): Create a copy of the column.
- data(self): The data buffer of the column.
- device_buffer_size(self): The total size of the device buffers used by the Column.
- from_array(cls, obj): Create a Column from any object which supports the NumPy or CUDA array interface.
- from_array_interface(cls, obj): Create a Column from an object implementing the NumPy Array Interface.
- from_cuda_array_interface(cls, obj): Create a Column from an object implementing the CUDA Array Interface.
- from_iterable_of_py(obj[, dtype]): Create a Column from a Python iterable of scalar values or nested iterables.
- from_rmm_buffer(DeviceBuffer buff, ...): Create a Column from an RMM DeviceBuffer.
- from_scalar(Scalar slr, size_type size): Create a Column from a Scalar.
- list_view(self): Accessor for methods of a Column that are specific to lists.
- null_count(self): The number of null elements in the column.
- null_mask(self): The null mask of the column.
- num_children(self): The number of children of this column.
- offset(self): The offset of the column.
- size(self): The number of elements in the column.
- to_scalar(self): Return the first value of a 1-element column as a Scalar.
- type(self): The type of data in the column.
- with_mask(self, gpumemoryview mask, ...): Augment this column with a new null mask.
- static all_null_like(Column like, size_type size)
Create an all null column from a template.
- Parameters:
- like : Column
Column whose type the result should mimic.
- size : int
Number of rows in the resulting column.
- Returns:
- Column
An all-null column of size rows and type matching like.
- child(self, size_type index) -> Column
Get a child column of this column.
- Parameters:
- index : size_type
The index of the child column to get.
- Returns:
- Column
The child column.
- data(self) -> gpumemoryview
The data buffer of the column.
- device_buffer_size(self) -> uint64_t
The total size of the device buffers used by the Column.
- Returns:
- Number of bytes.
Notes
Since Columns rely on Python memoryview-like semantics to maintain shared ownership of the data, the device buffers underlying this column might be shared between other data structures including other columns.
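The shared-ownership note above can be illustrated with Python's built-in memoryview, whose semantics the Column documentation refers to. The sketch below is a host-memory analogy only: a bytearray stands in for a device buffer, and no pylibcudf call is involved.

```python
# Host-side analogy for shared ownership; no GPU or pylibcudf involved.
buffer = bytearray(b"\x01\x02\x03\x04")

# Two views over the same underlying bytes, like two columns sharing a buffer.
view_a = memoryview(buffer)
view_b = memoryview(buffer)[1:3]

# A write through one view is visible through the other: no copy was made.
view_a[1] = 0xFF
shared_value = view_b[0]

# Adding up the size of each view counts the shared bytes more than once.
naive_total = view_a.nbytes + view_b.nbytes
actual_total = len(buffer)
```

By the same reasoning, adding up device_buffer_size over several columns may overstate real memory use when their buffers are shared.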
- classmethod from_array(cls, obj)
Create a Column from any object which supports the NumPy or CUDA array interface.
- Parameters:
- obj : object
The input array to be converted into a pylibcudf.Column.
- Returns:
- Column
- Raises:
- TypeError
If the input does not implement a supported array interface.
Notes
Only C-contiguous host and device ndarrays are supported. For device arrays, the data is not copied.
Examples
>>> import pylibcudf as plc
>>> import cupy as cp
>>> cp_arr = cp.array([[1, 2], [3, 4]])
>>> col = plc.Column.from_array(cp_arr)
- classmethod from_array_interface(cls, obj)
Create a Column from an object implementing the NumPy Array Interface.
If the object provides a raw memory pointer via the “data” field, we use that pointer directly and avoid copying. Otherwise, a ValueError is raised.
- Parameters:
- obj : Any
Must implement the __array_interface__ protocol.
- Returns:
- Column
A Column containing the data from the array interface.
- Raises:
- TypeError
If the object does not implement __array_interface__.
- ValueError
If the array is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit. If the ‘data’ field is invalid.
- NotImplementedError
If the object has a mask.
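The documented error conditions can be read as a sequence of checks on the `__array_interface__` dictionary. The following is a minimal sketch of such checks, not pylibcudf's implementation: `check_array_interface` and `FakeArray` are hypothetical names, the order of the checks is a guess, and the size_type limit is assumed to be that of a signed 32-bit integer.

```python
SIZE_TYPE_MAX = 2**31 - 1  # assumption: size_type is a signed 32-bit integer


def check_array_interface(obj):
    """Hypothetical helper: apply the documented checks, return the shape."""
    iface = getattr(obj, "__array_interface__", None)
    if iface is None:
        raise TypeError("object does not implement __array_interface__")
    if iface.get("mask") is not None:
        raise NotImplementedError("objects with a mask are not supported")

    shape = tuple(iface["shape"])
    if len(shape) not in (1, 2):
        raise ValueError("array must be 1D or 2D")
    if shape[0] > SIZE_TYPE_MAX:
        raise ValueError("number of rows exceeds the size_type limit")

    # The "data" field should be a (pointer, read_only) tuple.
    data = iface.get("data")
    if not (isinstance(data, tuple) and len(data) == 2 and isinstance(data[0], int)):
        raise ValueError("invalid 'data' field")

    # strides=None means C-contiguous; otherwise compare with row-major strides.
    strides = iface.get("strides")
    if strides is not None and 0 not in shape:
        expected = int(iface["typestr"][2:])  # item size in bytes, e.g. "<i4" -> 4
        for dim, stride in zip(reversed(shape), reversed(strides)):
            if dim != 1 and stride != expected:
                raise ValueError("array is not C-contiguous")
            expected *= dim
    return shape


class FakeArray:
    """Hypothetical stand-in that exposes only __array_interface__."""

    def __init__(self, shape, strides=None, typestr="<i4", mask=None):
        self.__array_interface__ = {
            "shape": shape,
            "strides": strides,
            "typestr": typestr,
            "data": (0x1000, False),  # dummy pointer, never dereferenced
            "version": 3,
        }
        if mask is not None:
            self.__array_interface__["mask"] = mask
```

For instance, `check_array_interface(FakeArray((2, 3), strides=(12, 4)))` passes, while swapping the strides to `(4, 8)` describes a column-major layout and raises ValueError.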
- classmethod from_cuda_array_interface(cls, obj)
Create a Column from an object implementing the CUDA Array Interface.
- Parameters:
- obj : Any
Must implement the __cuda_array_interface__ protocol.
- Returns:
- Column
A Column containing the data from the CUDA array interface.
- Raises:
- TypeError
If the object does not support __cuda_array_interface__.
- ValueError
If the object is not 1D or 2D, or is not C-contiguous. If the number of rows exceeds the size_type limit.
- NotImplementedError
If the object has a mask.
- static from_iterable_of_py(obj: Iterable, dtype: DataType | None = None) -> Column
Create a Column from a Python iterable of scalar values or nested iterables.
- Parameters:
- obj : Iterable
An iterable of scalar values (e.g., int, float, bool) or a nested iterable.
- dtype : DataType | None
The data type of the elements. If not specified, the type is inferred.
- Returns:
- Column
A Column containing the data from the input iterable.
- Raises:
- TypeError
If the input contains unsupported scalar types.
- ValueError
If the iterable is empty and dtype is not provided.
Notes
Only scalar types int, float, and bool are supported.
Nested iterables must be materialized as lists.
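Because nested iterables must be materialized as lists, input built from generators, tuples, or ranges needs converting first. A minimal sketch of one way to do that follows; `materialize` is a hypothetical helper, not part of pylibcudf.

```python
from collections.abc import Iterable


def materialize(obj):
    """Hypothetical helper: recursively turn nested iterables into lists."""
    # Strings and bytes are iterable but are treated as scalars here.
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes)):
        return [materialize(item) for item in obj]
    return obj


# A generator of ranges becomes a list of lists of ints.
rows = materialize(range(n) for n in (2, 1, 3))
```

The resulting `rows` value is a plain list of lists, which is the shape of input the note above asks for.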
- static from_rmm_buffer(DeviceBuffer buff, DataType dtype, size_type size, list children)
Create a Column from an RMM DeviceBuffer.
- Parameters:
- buff : DeviceBuffer
The rmm.DeviceBuffer holding the data.
- size : size_type
The number of rows in the column.
- dtype : DataType
The type of the data in the buffer.
- children : list
List of child columns.
Notes
To provide a mask and null count, use Column.with_mask after this method.
- static from_scalar(Scalar slr, size_type size)
Create a Column from a Scalar.
- Parameters:
- slr : Scalar
The scalar to create a column from.
- size : size_type
The number of elements in the column.
- Returns:
- Column
A Column containing the scalar repeated size times.
- list_view(self) -> ListColumnView
Accessor for methods of a Column that are specific to lists.
- null_count(self) -> size_type
The number of null elements in the column.
- null_mask(self) -> gpumemoryview
The null mask of the column.
- num_children(self) -> size_type
The number of children of this column.
- offset(self) -> size_type
The offset of the column.
- size(self) -> size_type
The number of elements in the column.
- to_scalar(self) -> Scalar
Return the first value of a 1-element column as a Scalar.
- Returns:
- Scalar
A Scalar representing the only value in the column, including nulls.
- Raises:
- ValueError
If the column has more than one row.
- with_mask(self, gpumemoryview mask, size_type null_count) -> Column
Augment this column with a new null mask.
- Parameters:
- mask : gpumemoryview
New mask (or None to unset the mask).
- null_count : int
New null count. This must match the number of nulls described by mask; an incorrect value can produce wrong results in later operations.
- Returns:
- Column
New Column object sharing data with self, except for the mask, which is new.
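Since the caller supplies null_count, it helps to see how a count follows from a mask. In the Arrow columnar format a validity bitmap stores one bit per row, with 1 meaning valid and 0 meaning null, using least-significant-bit numbering. The sketch below counts nulls in such a bitmap held in host bytes; `null_count_from_mask` is a hypothetical helper, and it assumes the mask passed to with_mask follows this layout, ignoring any padding or word-size requirements of device buffers.

```python
def null_count_from_mask(mask: bytes, size: int) -> int:
    """Count rows whose validity bit is 0 in an Arrow-style bitmap."""
    nulls = 0
    for row in range(size):
        byte_index, bit_index = divmod(row, 8)
        # Least-significant-bit numbering within each byte.
        if not (mask[byte_index] >> bit_index) & 1:
            nulls += 1
    return nulls


# Five rows where rows 1 and 3 are null: bits 0, 2 and 4 are set.
mask = bytes([0b00010101])
count = null_count_from_mask(mask, size=5)
```

Here `count` is the value that would accompany this mask as null_count for a five-row column.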
- class pylibcudf.column.ListColumnView(Column col)
Accessor for methods of a Column that are specific to lists.
Methods
- child(self): The data column of the underlying list column.
- offsets(self): The offsets column of the underlying list column.
- child(self)
The data column of the underlying list column.
- offsets(self)
The offsets column of the underlying list column.
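The relationship between the offsets column and the data (child) column follows the Arrow variable-size list layout: row i consists of the child elements from offsets[i] up to, but not including, offsets[i + 1]. The sketch below shows this with plain Python lists standing in for the two columns; `rows_from_list_layout` is a hypothetical helper, not a pylibcudf function.

```python
def rows_from_list_layout(offsets, child):
    """Rebuild list rows: row i is child[offsets[i]:offsets[i + 1]]."""
    return [child[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


offsets = [0, 2, 2, 5]        # one more entry than the number of rows
child = [10, 20, 30, 40, 50]  # flattened element values
rows = rows_from_list_layout(offsets, child)
```

Two equal consecutive offsets, as in the middle of this example, describe an empty list row.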
- pylibcudf.column.is_c_contiguous(shape: Sequence[int], strides: None | Sequence[int], itemsize: int) -> bool
Determine if shape and strides are C-contiguous.
- Parameters:
- shape : Sequence[int]
Number of elements in each dimension.
- strides : None | Sequence[int]
The stride of each dimension in bytes. If None, the memory layout is C-contiguous.
- itemsize : int
Size of an element in bytes.
- Returns:
- bool
True if the layout described by shape and strides is C-contiguous, False otherwise.
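A C-contiguous (row-major) layout has a last-dimension stride equal to itemsize, with each earlier stride equal to the next stride multiplied by the next dimension's length. The sketch below implements that rule in pure Python. `is_c_contiguous_sketch` is a hypothetical name; its treatment of length-1 and zero-length dimensions follows NumPy's convention and is an assumption that may differ from pylibcudf's own function.

```python
from typing import Optional, Sequence


def is_c_contiguous_sketch(
    shape: Sequence[int], strides: Optional[Sequence[int]], itemsize: int
) -> bool:
    """Return True if byte strides match a row-major layout for shape."""
    if strides is None:
        return True  # by convention, missing strides mean C-contiguous
    if any(dim == 0 for dim in shape):
        return True  # assumption: empty arrays count as contiguous
    expected = itemsize
    # Walk from the innermost (last) dimension outwards.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 dimension is never used, so skip it.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


row_major = is_c_contiguous_sketch((2, 3), (12, 4), itemsize=4)
column_major = is_c_contiguous_sketch((2, 3), (4, 8), itemsize=4)
```

With 4-byte elements and shape (2, 3), strides of (12, 4) describe a row-major layout and (4, 8) a column-major one, so only the first call reports a C-contiguous layout.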