Environment Setup#
Below are setup instructions for different operating systems and development environments. My own development environment is:
- OS: Windows 11 24H2
- IDE/Editor: Visual Studio Community 2022 17.14.15 (September 2025), VSCode 1.104.1
- Compiler: MSVC 19.44.35217
- CMake: 4.1.1
- CUDA Toolkit: 13.0.1
- Clangd: 21.1.1
- Docker: Docker version 28.4.0, build d8eb465
And the hardware:
- GPU: NVIDIA GeForce RTX 5060 Ti 16GB (driver version 581.29, CUDA 13.0 compatible, compute capability 12.0)
Windows + Visual Studio#
On Windows, you first need a C++ compiler. I recommend installing Visual Studio Community 2022 and selecting the "Desktop development with C++" workload during installation. Then install the CUDA Toolkit, picking a CUDA version that is compatible with your GPU driver. Once installed, run nvcc --version in a terminal to verify that CUDA is set up correctly.
Next, open Visual Studio and create a new project from a template whose name contains "CUDA" (in my case, "CUDA 13.0 Runtime"). This generates a preconfigured CUDA project containing a kernel.cu file. You can write your CUDA code in this file and use Visual Studio's build system (MSBuild) to compile and run the program.
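If you just want a quick smoke test of the whole toolchain, something along the following lines will do. This is a minimal sketch of my own, not the exact code the template generates:

```cuda
// Minimal CUDA smoke test (my own sketch, not the template's kernel.cu).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello_kernel() {
    // Runs once per thread; printf works in device code on all modern GPUs.
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello_kernel<<<2, 4>>>();                    // launch 2 blocks of 4 threads
    cudaError_t err = cudaDeviceSynchronize();   // wait and surface launch errors
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the program prints eight greetings, the compiler, runtime, and driver are all working together.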
If you want to add CUDA support to an existing project instead, that takes considerably more work; refer to the official NVIDIA documentation to configure the project properties by hand.
Windows + CMake + VSCode + IntelliSense#
Setting up CUDA with CMake on Windows is fairly simple. First, make sure CMake and the CUDA Toolkit are installed. To enable CUDA support, add CUDA to the LANGUAGES list in your CMakeLists.txt.
```cmake
cmake_minimum_required(VERSION 3.24)
set(CMAKE_CUDA_ARCHITECTURES native)
project(hello_cuda LANGUAGES CXX CUDA)
add_executable(hello_cuda main.cpp kernel.cu)
```
In add_executable, main.cpp is your main program file and kernel.cu is your CUDA file (a minimal sketch of both files follows the note below). Then run the following commands in a terminal to generate the build files:
```sh
mkdir build
cd build
cmake ..
cmake --build .
```
This produces a hello_cuda.exe executable that you can run to test your CUDA code.
Note:
- Setting CUDA_ARCHITECTURES to native requires CMake 3.24 or later. With an older CMake you have to set the CUDA architectures by hand, e.g. set(CMAKE_CUDA_ARCHITECTURES 61); the right value depends on your GPU's compute capability.
- You may see find_package(CUDA REQUIRED) in some tutorials, but that is the legacy approach and is no longer recommended.
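For completeness, here is one way the two source files could look. This is my own sketch using the file names from the CMakeLists above; scale_kernel and scale_on_gpu are made-up names, not part of any template:

```cuda
// kernel.cu -- device code plus a host-callable launcher (hypothetical example)
#include <cuda_runtime.h>

__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // guard against the partial last block
}

// Called from main.cpp; hides all CUDA types behind a plain C++ signature.
void scale_on_gpu(float* host_data, float factor, int n) {
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, host_data, n * sizeof(float), cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(d, factor, n);   // 256 threads/block
    cudaMemcpy(host_data, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

// main.cpp -- ordinary C++, compiled by the host compiler (MSVC):
//
//     #include <cstdio>
//     void scale_on_gpu(float* host_data, float factor, int n);
//     int main() {
//         float v[4] = {1, 2, 3, 4};
//         scale_on_gpu(v, 2.0f, 4);
//         std::printf("%f %f %f %f\n", v[0], v[1], v[2], v[3]);
//     }
```

Keeping the kernel launch inside kernel.cu means main.cpp needs no CUDA headers, so the host compiler can build it unmodified.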
In Visual Studio Code, you can use the CMake Tools extension to manage your CMake project. Make sure the extension is installed, then open the folder containing CMakeLists.txt. The extension detects and configures the project automatically, and you can use the CMake commands in the command palette to configure and build it.
The configure and build log appears in the Output panel, where you can inspect the compiler invocations and any errors. If everything goes well, you should see output similar to:
```
[main] Configuring project: Tensor
[proc] Executing command: "C:\Program Files\CMake\bin\cmake.EXE" -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE --no-warn-unused-cli -S C:/Users/lvhao/Projects/ai-programming/hw2/Tensor -B c:/Users/lvhao/Projects/ai-programming/hw2/Tensor/build -G "Visual Studio 17 2022" -T host=x64 -A x64
[cmake] Not searching for unused variables given on the command line.
[cmake] -- The CXX compiler identification is MSVC 19.44.35217.0
[cmake] -- The CUDA compiler identification is NVIDIA 13.0.48 with host compiler MSVC 19.44.35217.0
[cmake] -- Detecting CXX compiler ABI info
[cmake] -- Detecting CXX compiler ABI info - done
[cmake] -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.44.35207/bin/Hostx64/x64/cl.exe - skipped
[cmake] -- Detecting CXX compile features
[cmake] -- Detecting CXX compile features - done
[cmake] -- Detecting CUDA compiler ABI info
[cmake] -- Detecting CUDA compiler ABI info - done
[cmake] -- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/bin/nvcc.exe - skipped
[cmake] -- Detecting CUDA compile features
[cmake] -- Detecting CUDA compile features - done
[cmake] -- Configuring done (10.0s)
[cmake] -- Generating done (0.0s)
[cmake] -- Build files have been written to: C:/Users/lvhao/Projects/ai-programming/hw2/Tensor/build
```
By installing the NVIDIA Nsight Visual Studio Code Edition extension, you get a better CUDA development experience, including:
- Declarative language configuration for CUDA syntax highlighting, bracket matching, code folding, auto-indentation, etc.
- C++ language server extensions to support CUDA-specific language features.
- Debugger adapter to provide CUDA debugging.
- Debugger views to provide CUDA-specific debugging information.
- IDE extensions to add productivity enhancements to the VS Code environment.
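If you want something concrete to try the debugger on, a tiny kernel like the following gives you per-thread state to step through: set a breakpoint inside square and inspect i in each thread. This is my own example, not one from the Nsight documentation:

```cuda
// A small target for Nsight / cuda-gdb breakpoints (hypothetical example).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(int* v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] = v[i] * v[i];   // break here; each thread has its own i and v[i]
    }
}

int main() {
    int h[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    square<<<1, 8>>>(d, 8);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    for (int x : h) std::printf("%d ", x);   // expect: 0 1 4 9 16 25 36 49
    std::printf("\n");
    return 0;
}
```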
Windows + WSL2#
See the CUDA on WSL User Guide — CUDA on WSL 13.0 documentation.
Windows + WSL2 + Docker Desktop + VSCode + Dev Containers#
This is a very convenient option: you need neither the CUDA Toolkit nor Docker CE and the NVIDIA Container Toolkit inside WSL2. All you need is Docker Desktop on Windows with WSL 2 integration enabled; follow Get started with Docker containers on WSL for the installation steps.
Once installed, run a CUDA sample from a WSL terminal to verify that Docker is set up correctly and can access the GPU:
```sh
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```
The output should look something like:
```
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance)
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation)
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 12.0 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 12.0 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 12.0

> Compute 12.0 CUDA device: [NVIDIA GeForce RTX 5060 Ti]
36864 bodies, total time for 10 iterations: 19.088 ms
= 711.939 billion interactions per second
= 14238.788 single-precision GFLOP/s at 20 flops per interaction
```
For CUDA development we need a variant of the nvidia/cuda image whose tag contains devel. Pick a flavor with or without cuDNN as needed (the cuDNN variants are typically several hundred MB larger); I use nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04. All images and tags can be browsed on NVIDIA NGC.
Three flavors of images are provided:
- base: Includes the CUDA runtime (cudart)
- runtime: Builds on the base and includes the CUDA math libraries, and NCCL. A runtime image that also includes cuDNN is available. Some images may also include TensorRT.
- devel: Builds on the runtime and includes headers, development tools for building CUDA images. These images are particularly useful for multi-stage builds.
Pull the image in a terminal:
```sh
docker pull nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04
```
or, from NGC:
```sh
docker pull nvcr.io/nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04
```
Then create a .devcontainer folder in the project directory, containing three files: devcontainer.json, Dockerfile, and reinstall-cmake.sh.
.devcontainer/devcontainer.json:
```json
// modified from https://github.com/mirzaim/cuda-devcontainer/blob/main/.devcontainer/devcontainer.json
{
    "name": "cuda-dev",
    "build": { "dockerfile": "Dockerfile" },
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-vscode.cpptools",
                "ms-vscode.cmake-tools",
                "nvidia.nsight-vscode-edition"
            ]
        }
    },
    // expose the GPU and allow ptrace so debuggers work inside the container
    "runArgs": [
        "--gpus=all",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined"
    ],
    "containerEnv": { "TZ": "Asia/Shanghai" },
    "hostRequirements": { "gpu": true },
    "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/${localWorkspaceFolderBasename},type=bind",
    "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}"
}
```
.devcontainer/Dockerfile:
```dockerfile
FROM nvidia/cuda:13.0.1-cudnn-devel-ubuntu24.04

ARG CMAKE_VERSION="4.1.1"
ENV TZ=Asia/Shanghai

COPY ./reinstall-cmake.sh /tmp/reinstall-cmake.sh

# Base tools + time zone
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && apt-get install -y --no-install-recommends \
        wget curl git ninja-build tzdata ca-certificates gnupg \
    && ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone \
    && rm -rf /var/lib/apt/lists/*

# Reinstall CMake (optional)
RUN if [ "${CMAKE_VERSION}" != "none" ]; then \
        chmod +x /tmp/reinstall-cmake.sh && /tmp/reinstall-cmake.sh "${CMAKE_VERSION}"; \
    fi \
    && rm -f /tmp/reinstall-cmake.sh
```
.devcontainer/reinstall-cmake.sh:
```bash
#!/usr/bin/env bash
set -e

CMAKE_VERSION=${1:-"none"}
if [ "${CMAKE_VERSION}" = "none" ]; then
    echo "No CMake version specified, skipping CMake reinstallation"
    exit 0
fi

# Clean up the temporary download directory on exit
cleanup() {
    EXIT_CODE=$?
    set +e
    [[ -n "${TMP_DIR}" ]] && rm -rf "${TMP_DIR}"
    exit $EXIT_CODE
}
trap cleanup EXIT

echo "Installing CMake ${CMAKE_VERSION} ..."
apt-get -y purge --auto-remove cmake || true
mkdir -p /opt/cmake

# Map Debian architecture names to CMake release names
arch=$(dpkg --print-architecture)
case "$arch" in
    amd64) ARCH=x86_64 ;;
    arm64) ARCH=aarch64 ;;
    *) echo "Unsupported architecture: $arch"; exit 1 ;;
esac

TMP_DIR=$(mktemp -d -t cmake-XXXXXXXXXX)
cd "$TMP_DIR"

BIN="cmake-${CMAKE_VERSION}-linux-${ARCH}.sh"
SUM="cmake-${CMAKE_VERSION}-SHA-256.txt"

# Download the installer and its checksum, then verify
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${BIN}" -O
curl -sSL "https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/${SUM}" -O
sha256sum -c --ignore-missing "$SUM"

sh "$BIN" --prefix=/opt/cmake --skip-license

ln -sf /opt/cmake/bin/cmake  /usr/local/bin/cmake
ln -sf /opt/cmake/bin/ctest  /usr/local/bin/ctest
ln -sf /opt/cmake/bin/cpack  /usr/local/bin/cpack
ln -sf /opt/cmake/bin/ccmake /usr/local/bin/ccmake
```
On Linux you may run into permission problems with the mounted files; see Add a non-root user to a container for the fix.
Install the Dev Containers extension (formerly Remote - Containers) in VSCode. Then open your project folder, press F1, and choose Dev Containers: Open Folder in Container.... VSCode detects the configuration, builds the image, and starts the container. You then get a ready-made development environment inside the container, with all code shared between the container and the host.
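Once inside the container, a quick device query makes a simple end-to-end check. This is a minimal sketch of my own, not one of the CUDA samples:

```cuda
// List the GPUs visible inside the dev container.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, compute capability %d.%d, %.1f GiB\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

Compile it with nvcc inside the container; if it reports your GPU, the --gpus=all run argument is doing its job.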
See also: