Advanced Topics

Just-in-Time Compilation

cuVS uses Just-in-Time (JIT) Link-Time Optimization (LTO) to compile certain kernels. When JIT compilation is triggered, cuVS compiles the kernel for your GPU architecture and automatically caches it both in memory and on disk. Each cache remains valid as follows:

- The in-memory cache is valid for the lifetime of the process.
- The on-disk cache is valid until the CUDA driver is upgraded. The cache is portable and can be shared between machines through network or cloud storage, so we strongly recommend storing it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on JIT compilation; the relevant environment variables are CUDA_CACHE_PATH and CUDA_CACHE_MAX_SIZE.
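
As an illustration of the cache configuration described above, the following minimal Python sketch sets the two environment variables before any CUDA-backed library is loaded. The helper name `configure_cuda_jit_cache` and the example directory are hypothetical and not part of cuVS. The sketch assumes that the variables must be set before the CUDA driver is initialized in the process and that CUDA_CACHE_MAX_SIZE is given in bytes; check the CUDA documentation for the limits that apply to your driver.

```python
import os
import tempfile
from pathlib import Path


def configure_cuda_jit_cache(cache_dir, max_size_bytes):
    """Hypothetical helper: point the CUDA JIT cache at a chosen directory.

    Assumption: this runs before any library initializes the CUDA driver,
    since the driver is expected to read these variables at initialization.
    """
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    os.environ["CUDA_CACHE_PATH"] = str(path)
    os.environ["CUDA_CACHE_MAX_SIZE"] = str(int(max_size_bytes))
    return path


# A temporary directory keeps this sketch side-effect free. A real deployment
# would use a persistent location, such as a mounted network volume.
cache_path = configure_cuda_jit_cache(
    Path(tempfile.gettempdir()) / "cuvs_jit_cache_example",
    1 << 30,  # 1 GiB, expressed in bytes
)

# Import CUDA-backed libraries only after this point.
```

Setting the variables in the launching shell or in the container configuration achieves the same effect and avoids any dependence on import order.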

JIT compilation is therefore a one-time cost: once a kernel has been compiled and cached, subsequent calls incur no compilation overhead. We recommend running a “warmup” call to trigger JIT compilation before real workloads.
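
The warmup recommendation above can be sketched as a small timing wrapper. This is an illustrative pattern only: `timed_call` and `fake_search` are hypothetical names, and `fake_search` is a CPU stand-in that simulates a one-time compilation delay rather than calling cuVS. With a real GPU search call, you would also need to synchronize the device before reading the timer, because GPU work is typically launched asynchronously.

```python
import time


def timed_call(search_fn, *args, **kwargs):
    """Hypothetical helper: run search_fn once and return (result, seconds)."""
    start = time.perf_counter()
    result = search_fn(*args, **kwargs)
    return result, time.perf_counter() - start


_compiled = set()  # simulates the in-memory kernel cache


def fake_search(queries, k):
    """Stand-in for a search call whose first invocation triggers compilation."""
    if "kernel" not in _compiled:
        time.sleep(0.05)  # simulated one-time JIT compilation cost
        _compiled.add("kernel")
    # Return the k smallest values per query as a placeholder "result".
    return [sorted(q)[:k] for q in queries]


# The warmup call pays the simulated compilation cost.
_, first_seconds = timed_call(fake_search, [[3, 1, 2]], k=2)

# Later calls reuse the cached kernel and skip compilation.
result, second_seconds = timed_call(fake_search, [[3, 1, 2]], k=2)
```

In a real application, the warmup call would use a small representative batch of queries with the same search configuration as production traffic, on the assumption that the compiled kernel depends on that configuration.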

Currently, the following capabilities trigger JIT compilation:

- IVF Flat search APIs: cuvs::neighbors::ivf_flat::search()

JIT LTO (Just-In-Time Link-Time Optimization) Guide