Advanced Topics
Just-in-Time Compilation
cuVS uses Just-in-Time (JIT) compilation with Link-Time Optimization (LTO) to compile certain kernels. When a JIT compilation is triggered, cuVS compiles the kernel for your GPU architecture and automatically caches it both in memory and on disk. The cache remains valid as follows:
- In-memory cache: valid for the lifetime of the process.
- On-disk cache: valid until the CUDA driver is upgraded. The cache can be shared portably between machines through network or cloud storage, and we strongly recommend storing it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on JIT compilation; the environment variables of interest are CUDA_CACHE_PATH and CUDA_CACHE_MAX_SIZE.
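In most deployments these variables are simply exported in the shell or container environment. As a minimal sketch, assuming a POSIX (Linux) host, they can also be set from C++. To be safe, do this before the first CUDA call in the process, since the driver may read them at initialization. The helper name configure_cuda_jit_cache and the example path are hypothetical, not part of cuVS.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not a cuVS API): point the CUDA JIT on-disk cache at a
// persistent directory (e.g. a network or cloud-backed mount) and cap its size.
// Call it before any CUDA call in the process. Returns true on success.
bool configure_cuda_jit_cache(const std::string& cache_dir,
                              unsigned long long max_size_bytes) {
  // overwrite = 1: replace any value inherited from the parent environment.
  if (setenv("CUDA_CACHE_PATH", cache_dir.c_str(), 1) != 0) return false;
  // CUDA_CACHE_MAX_SIZE is given in bytes.
  const std::string size = std::to_string(max_size_bytes);
  return setenv("CUDA_CACHE_MAX_SIZE", size.c_str(), 1) == 0;
}
```

For example, configure_cuda_jit_cache("/mnt/shared/cuvs-jit-cache", 1ULL << 30) would request a 1 GiB cache on a shared mount. Overwriting existing values keeps the behavior deterministic; pass 0 as the last argument of setenv instead if settings from the outer environment should take precedence.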
JIT compilation is therefore a one-time cost: after the first compilation, the cached kernel is reused, so you can expect no loss in real performance. We recommend running a “warmup” call to trigger JIT compilation before actual usage.
Currently, the following capabilities trigger a JIT compilation:
- IVF Flat search APIs: cuvs::neighbors::ivf_flat::search()
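The warmup recommendation can be sketched as a small wrapper that runs one search up front and reports how long it took, so the one-time JIT cost is logged separately from steady-state timings. This is an illustrative pattern only: the name run_warmup_ms is hypothetical, and the callable stands in for whatever code issues your real search.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not a cuVS API): invoke `search` once before real work
// so that any JIT compilation, plus population of the in-memory and on-disk
// caches, happens up front. Returns the wall-clock duration of that first call
// in milliseconds. The callable should synchronize its CUDA stream before
// returning so the measured time covers the GPU work, not just the launch.
double run_warmup_ms(const std::function<void()>& search) {
  const auto start = std::chrono::steady_clock::now();
  search();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In an application, the callable would wrap a cuvs::neighbors::ivf_flat::search() call, ideally configured the same way as in production. Later searches can then reuse the cached kernel instead of compiling it again.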