CUDA
This tutorial provides guidelines for setting up a CUDA environment on a Linux machine, using as little sudo as possible.
Setting up a CUDA environment for C++ development can be challenging for beginners. Yet it is also rewarding, as it allows you to do lots of cool things, such as customizing PyTorch and TensorFlow.
The rest of this page is a step-by-step guide for setting up a CUDA environment on a Linux machine with an NVIDIA GPU. Note: the installation will fail if you don't have a supported compiler. Make sure gcc, g++, ninja, and cmake are installed before you install the CUDA driver.
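A quick way to confirm these prerequisites (the package names below are a Debian/Ubuntu example; adjust them for your distribution):
# Check that the required build tools are available
gcc --version
g++ --version
cmake --version
ninja --version
# If any are missing, install them, e.g. on Debian/Ubuntu:
# sudo apt-get install -y build-essential cmake ninja-build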
Install CUDA Driver (Requires Sudo)
First, download the driver run file from the CUDA Toolkit Downloads page and install the driver as shown below.
You need sudo privileges to install the driver.
# Download the installation scripts
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
# Install Driver
sudo sh cuda_12.6.2_560.35.03_linux.run --silent --driver
# Optional: install the driver and toolkit together
# sudo sh cuda_12.6.2_560.35.03_linux.run --silent --driver --toolkit
sudo reboot
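After the machine comes back up, you can verify that the driver loaded correctly:
# Should list the driver version and every visible GPU
nvidia-smi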
Install CUDA Toolkit
Option I: CUDA Toolkit
I will use the CUDA 12.6.2 version as an example here.
# Download the installation scripts
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
# Install the CUDA toolkit into a user directory (no sudo)
CUDA_HOME=~/nvidia/cuda-12.6 # change this to the directory where you want to install the toolkit
sh cuda_12.6.2_560.35.03_linux.run --silent --toolkit --toolkitpath=${CUDA_HOME}
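If the installer succeeds, the toolkit binaries end up under the path you chose; a quick check:
# nvcc should now exist under the chosen toolkit path
ls ${CUDA_HOME}/bin/nvcc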
echo "export CUDA_HOME=${CUDA_HOME}" >> ~/.bashrc
echo 'export CPATH="${CUDA_HOME}"/include:"${CPATH}"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="${CUDA_HOME}"/lib64:"${LD_LIBRARY_PATH}"' >> ~/.bashrc
echo 'export PATH="${CUDA_HOME}"/bin:"${PATH}"' >> ~/.bashrc
source ~/.bashrc
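After reloading the shell configuration, it is worth checking that the toolkit is picked up from the new location:
# Both should point at the toolkit installed above
which nvcc
nvcc --version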
Option II: HPC SDK
NVIDIA HPC SDK includes (almost) everything you need for CUDA program development. We will use it to install the CUDA Toolkit along with NCCL, cuBLAS, and other libraries.
I will use the 24.9 version as an example, and you can choose any version you want. You need to use a different URL to download the SDK if your CPU is not from Intel or AMD. Check the NVIDIA HPC SDK page for the details.
# x86
wget https://developer.download.nvidia.com/hpc-sdk/24.9/nvhpc_2024_249_Linux_x86_64_cuda_12.6.tar.gz
tar xpzf nvhpc_2024_249_Linux_x86_64_cuda_12.6.tar.gz
nvhpc_2024_249_Linux_x86_64_cuda_12.6/install
# ARM
wget https://developer.download.nvidia.com/hpc-sdk/24.9/nvhpc_2024_249_Linux_aarch64_cuda_12.6.tar.gz
tar xpzf nvhpc_2024_249_Linux_aarch64_cuda_12.6.tar.gz
nvhpc_2024_249_Linux_aarch64_cuda_12.6/install
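The installer asks for an installation directory; the rest of this tutorial assumes the default location /opt/nvidia/hpc_sdk. Once the installation finishes, you can confirm the layout (the version numbers are the ones used in this example):
# The SDK is laid out as <install dir>/<OS>_<arch>/<version>
ls /opt/nvidia/hpc_sdk/Linux_$(uname -m)/24.9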
Setting Up Environment Variables for NVHPC
Assuming you want to use CUDA Toolkit version 12.6, the following snippet contains the environment variables you need to set. Add them to your ~/.bashrc file and restart your terminal (or source the file) for the variables to take effect.
# # >>> START CUDA ENV (HPC SDK) >>>
HPC_SDK_VERSION=24.9 # Set this to the HPC SDK version you installed
CUDA_VERSION=12.6 # Set this to the CUDA Toolkit version you installed
ISA=$(uname -m) # machine architecture, e.g. x86_64 or aarch64
PLATFORM=$(uname)_${ISA} # e.g. Linux_x86_64, matching the HPC SDK directory layout
export NVHPC=/opt/nvidia/hpc_sdk # SET THIS TO YOUR INSTALLED PATH
export CUDA_COMM_LIBS=${NVHPC}/${PLATFORM}/${HPC_SDK_VERSION}/comm_libs/${CUDA_VERSION}
export CUDA_MATH_LIBS=${NVHPC}/${PLATFORM}/${HPC_SDK_VERSION}/math_libs/${CUDA_VERSION}
# CUDA TOOLKIT
export CUDA_HOME=${NVHPC}/${PLATFORM}/${HPC_SDK_VERSION}/cuda/${CUDA_VERSION}
export NVHPC_ROOT=${CUDA_HOME}
export CUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME}
export CUDA_TOOLKIT_ROOT=${CUDA_HOME}
export CUDA_PATH=${CUDA_HOME}
export PATH=${CUDA_HOME}/bin:${PATH}
export CPATH=${CUDA_HOME}/include:${CPATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/extras/CUPTI/lib64:${LD_LIBRARY_PATH}
# MATH_LIBS (cuBLAS)
export CPATH=${CUDA_MATH_LIBS}/include:$CPATH
export LD_LIBRARY_PATH=${CUDA_MATH_LIBS}/lib64/:${LD_LIBRARY_PATH}
# NCCL Configuration
export USE_NCCL=1
export USE_SYSTEM_NCCL=1
export NCCL_HOME=${CUDA_COMM_LIBS}/nccl
export NCCL_PREFIX=${NCCL_HOME}
export NCCL_ROOT=${NCCL_HOME}
export NCCL_INCLUDE_DIR=${NCCL_HOME}/include
export NCCL_LIB_DIR=${NCCL_HOME}/lib
# NVShmem Configuration
export NVSHMEM_HOME=${CUDA_COMM_LIBS}/nvshmem
export NVSHMEM_PREFIX=${NVSHMEM_HOME}
export NVSHMEM_ROOT=${NVSHMEM_HOME}
export NVSHMEM_INCLUDE_DIR=${NVSHMEM_HOME}/include
export NVSHMEM_LIB_DIR=${NVSHMEM_HOME}/lib
# Compilers (not recommended; may cause compilation errors)
# CUDA_COMPILER_DIR=${NVHPC}/${PLATFORM}/${HPC_SDK_VERSION}/compilers/
# export MANPATH=$MANPATH:${NVHPC}/${PLATFORM}/${HPC_SDK_VERSION}/compilers/man
# export PATH=$PATH:$CUDA_COMPILER_DIR/bin
# export PATH=$PATH:$CUDA_COMPILER_DIR/bin/mpi/bin
# export PATH=$PATH:$CUDA_COMPILER_DIR/extras/qd/bin
# export CPATH=$CUDA_COMPILER_DIR/include:$CPATH
# export LD_LIBRARY_PATH=$CUDA_COMPILER_DIR/lib/:${LD_LIBRARY_PATH}
# export CC=$CUDA_COMPILER_DIR/bin/nvc
# export CXX=$CUDA_COMPILER_DIR/bin/nvc++
# export FC=$CUDA_COMPILER_DIR/bin/nvfortran
# export F90=$CUDA_COMPILER_DIR/bin/nvfortran
# export F77=$CUDA_COMPILER_DIR/bin/nvfortran
# export CPP=cpp
echo "Setting NVHPC HOME=$NVHPC"
# # >>> END CUDA ENV (HPC SDK) >>>
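Once these variables are in place and your shell has been restarted (or ~/.bashrc re-sourced), a quick sanity check that the toolkit and communication libraries resolve to the HPC SDK installation:
# nvcc should come from the HPC SDK's bundled CUDA toolkit
which nvcc
nvcc --version
# NCCL and NVSHMEM headers should be where the variables point
ls ${NCCL_INCLUDE_DIR}/nccl.h
ls ${NVSHMEM_INCLUDE_DIR}/nvshmem.h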
Some libraries (e.g., Caffe2) will look for dependent CUDA libraries inside the CUDA_TOOLKIT_ROOT directory. Let's create symbolic links for these math libraries so that they can be found correctly. Note that you do not need to run these commands if you installed the toolkit through the CUDA Toolkit run file (Option I).
# cusparse, cublas, etc.
ln -s ${CUDA_MATH_LIBS}/include/* ${CUDA_TOOLKIT_ROOT}/include/
ln -s ${CUDA_MATH_LIBS}/lib64/* ${CUDA_TOOLKIT_ROOT}/lib64/
ln -s ${CUDA_MATH_LIBS}/lib64/stubs/* ${CUDA_TOOLKIT_ROOT}/lib64/stubs/
# nccl
# ln -s ${CUDA_COMM_LIBS}/nccl/include/* ${CUDA_TOOLKIT_ROOT}/include/
# ln -s ${CUDA_COMM_LIBS}/nccl/lib/* ${CUDA_TOOLKIT_ROOT}/lib64/
# nvshmem
# ln -s ${CUDA_COMM_LIBS}/nvshmem/include/* ${CUDA_TOOLKIT_ROOT}/include/
# ln -s ${CUDA_COMM_LIBS}/nvshmem/lib/* ${CUDA_TOOLKIT_ROOT}/lib64/
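You can confirm the links resolve as expected, for example for cuBLAS:
# These should now point into the HPC SDK math_libs directory
ls -l ${CUDA_TOOLKIT_ROOT}/include/cublas_v2.h
ls -l ${CUDA_TOOLKIT_ROOT}/lib64/libcublas.so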
cuDNN (Optional)
cuDNN does not come with the CUDA Toolkit or the HPC SDK, and it needs to be installed separately. You can download the library from this website.
If your GPU is Ampere or Hopper architecture, follow the instructions on this website to install the latest version.
# x86
cd /tmp # or any download directory you prefer
wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.4.0.58_cuda12-archive.tar.xz
tar -xvf cudnn-linux-x86_64-9.4.0.58_cuda12-archive.tar.xz
# Copy the files into the CUDA toolkit directory
cp cudnn-*-archive/include/cudnn*.h ${CUDA_HOME}/include
cp -P cudnn-*-archive/lib/libcudnn* ${CUDA_HOME}/lib64
chmod a+r ${CUDA_HOME}/include/cudnn*.h ${CUDA_HOME}/lib64/libcudnn*
# Arm64
wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.4.0.58_cuda12-archive.tar.xz
tar -xvf cudnn-linux-sbsa-9.4.0.58_cuda12-archive.tar.xz
# Copy the files into the CUDA toolkit directory
cp cudnn-*-archive/include/cudnn*.h ${CUDA_HOME}/include
cp -P cudnn-*-archive/lib/libcudnn* ${CUDA_HOME}/lib64
chmod a+r ${CUDA_HOME}/include/cudnn*.h ${CUDA_HOME}/lib64/libcudnn*
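Regardless of which option you used to install the toolkit, you can confirm the whole setup works end to end by compiling and running a small CUDA test program (the file name check_cuda.cu is just an example) and, if you installed cuDNN, printing its version from the header:
# Write a minimal CUDA test program
cat > /tmp/check_cuda.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        printf("No CUDA device found: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF
# Compile with the toolkit set up above and run the result
nvcc /tmp/check_cuda.cu -o /tmp/check_cuda && /tmp/check_cuda
# If cuDNN was installed, this prints its major version
grep -m1 CUDNN_MAJOR ${CUDA_HOME}/include/cudnn_version.h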