C++#
RapidsMPF exposes a full C++ API for building high-performance distributed GPU workloads without a Python runtime. The C++ layer is the foundation on which the Python bindings are built.
The C++ API reference is available at docs.rapids.ai/api/librapidsmpf/nightly.
Coverage#
The C++ API provides access to all core RapidsMPF subsystems:
Communicator — MPI and UCXX backends for inter-process communication.
Shuffler — Out-of-core, distributed payload shuffle service.
Streaming Engine — Asynchronous multi-GPU pipeline with Channels, Actors, and Messages.
Memory — BufferResource, spilling, pinned memory, and packed data utilities.
Config — Configuration options and environment-variable parsing.
Shuffle Service#
See Shuffle Architecture for an in-depth explanation of the shuffle design.
rrun — Distributed Launcher#
RapidsMPF includes rrun, a lightweight launcher that eliminates the MPI dependency for multi-GPU workloads. See Streaming execution for more on the programming model.
Build rrun#
# From the repository root, in a build directory that CMake has already configured
cd cpp/build
cmake --build . --target rrun
Single-Node Launch#
# Launch 2 ranks on the local node
./tools/rrun -n 2 ./benchmarks/bench_comm -C ucxx -O all-to-all
# Launch 4 ranks with verbose output (-v) on specific GPUs (-g)
./tools/rrun -v -n 4 -g 0,1,2,3 ./benchmarks/bench_comm -C ucxx
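When the rank count varies between runs, the GPU list passed to -g can be generated rather than typed by hand. The following is a minimal sketch, not part of RapidsMPF: the helper names gpu_list and rrun_cmd are hypothetical, and it assumes one rank per GPU with device IDs numbered contiguously from 0. It uses only the -n, -v, and -g flags shown above, and it prints the command instead of executing it so the result can be inspected first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with RapidsMPF) that assemble an rrun
# command line. Assumes one rank per GPU and GPU IDs 0..N-1.

# Print a comma-separated list of GPU IDs: 0,1,...,n-1
gpu_list() {
  local n="$1" i list=""
  for ((i = 0; i < n; i++)); do
    # Prepend a comma only when the list already has an entry
    list+="${list:+,}${i}"
  done
  printf '%s\n' "${list}"
}

# Print (do not run) the rrun command for the given rank count and program
rrun_cmd() {
  local nranks="$1"
  shift
  printf '%s\n' "./tools/rrun -v -n ${nranks} -g $(gpu_list "${nranks}") $*"
}

# Dry run: show the command for 4 ranks
rrun_cmd 4 ./benchmarks/bench_comm -C ucxx
```

The final line prints `./tools/rrun -v -n 4 -g 0,1,2,3 ./benchmarks/bench_comm -C ucxx`, which matches the hand-written command above. Printing first and executing separately keeps the sketch safe to try on a machine without GPUs.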