cudf.core.groupby.DataFrameGroupBy.head

DataFrameGroupBy.head(n: int = 5, *, preserve_order: bool = True)

Return the first n rows of each group.

Parameters:
n

If positive: number of entries to include from the start of each group.
If negative: number of entries to exclude from the end of each group.

preserve_order

If True (default), return the n rows from each group in original dataframe order (this mimics pandas behavior but is more expensive). If you don't need the rows in original dataframe order, setting preserve_order=False improves performance. In both cases the original index is preserved, so .loc-based indexing works identically.

Returns:
Series or DataFrame

Subset of the original grouped object as determined by n

See also

tail

Examples

>>> import cudf
>>> df = cudf.DataFrame(
...     {
...         "a": [1, 0, 1, 2, 2, 1, 3, 2, 3, 3, 3],
...         "b": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...     }
... )
>>> df.groupby("a").head(1)
   a  b
0  1  0
1  0  1
3  2  3
6  3  6
>>> df.groupby("a").head(-2)
   a  b
0  1  0
3  2  3
6  3  6
8  3  8
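
The way the sign of n selects rows can be modelled in plain Python. This is a minimal sketch of the selection rule only, not cuDF's implementation: the helper name `groupby_head_positions` is hypothetical, and it returns the row positions that would be kept, in original order (the `preserve_order=True` case). It relies on the fact that a Python slice `[:n]` keeps the first n items when n is positive and drops the last |n| items when n is negative.

```python
from collections import defaultdict


def groupby_head_positions(keys, n=5):
    """Hypothetical pure-Python model of groupby(...).head(n).

    Returns the positions of the rows that are kept, sorted into
    original row order. Illustrative only; not how cuDF computes it.
    """
    # Collect the row positions belonging to each group key.
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)

    kept = []
    for positions in groups.values():
        # n > 0: keep the first n rows of the group.
        # n < 0: drop the last |n| rows of the group.
        kept.extend(positions[:n])

    # Restore original row order across groups.
    return sorted(kept)


# Same key column "a" as in the example above.
a = [1, 0, 1, 2, 2, 1, 3, 2, 3, 3, 3]
first_one = groupby_head_positions(a, 1)
all_but_last_two = groupby_head_positions(a, -2)
print(first_one)         # [0, 1, 3, 6]
print(all_but_last_two)  # [0, 3, 6, 8]
```

The printed positions match the index labels in the two outputs above. With n=-2, the group a == 0 has a single row and therefore contributes nothing, while the group a == 3 has four rows and keeps its first two.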