CUDA by Example: Chapter 01-05

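Posted: 2024-02-06 Updated: 2024-04-08 Category: cuda-by-example

The source code is available at https://github.com/yottaawesome/cuda-by-example/ ; the download link on the official site is dead.

Some of the book's code needs OpenGL to run. I installed freeglut3-dev and mesa-utils (I am not sure whether libgl1-mesa-dev is also required). The CMake rules then have to link the corresponding libraries: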

cmake_minimum_required(VERSION 3.20.1)
project(chapter3 LANGUAGES CXX CUDA)
set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_EXTENSIONS OFF)
add_executable(ray ray_global.cu)
target_link_libraries(ray GL glut)
#                         ^^^^^^^

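Chapter 3: CUDA source files

When compiling with nvcc, no extra headers are needed for the CUDA built-in functions. Those headers are only needed on the host side, for example in host code that is built by an ordinary C++ compiler.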

https://stackoverflow.com/questions/6302695/difference-between-cuda-h-cuda-runtime-h-cuda-runtime-api-h

CUDA by Example: Chapter 06-08

Posted: 2024-02-06 Updated: 2024-04-08 Category: cuda-by-example

Chapter 6: Constant Memory and Events

Constant memory

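Constant memory is declared at global (file) scope. If the __constant__ qualifier is left out, the array is no longer placed in constant memory: a plain file-scope array is an ordinary host variable, and it only becomes a statically allocated global-memory variable when it is marked __device__. Even then, it differs from memory obtained with cudaMalloc in how it is stored and when it is allocated.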

__constant__ Sphere s[SPHERES];

Copying to constant memory uses a dedicated function:

HANDLE_ERROR( cudaMemcpyToSymbol( s, temp_s,
                                  sizeof(Sphere) * SPHERES) );

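CUDA threads can only read constant memory, so only the host can write to it. Moving data that is read repeatedly from global memory into constant memory can speed a kernel up. Note, however, that constant memory is very small (64 KB). The book's example only ray traces 20 spheres.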

find

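Posted: 2024-02-05

In find's -exec action, terminating the command with ; (mind the shell escaping) runs it once per file, whereas terminating it with + passes as many file names as possible to each invocation, in the style of xargs. The following example shows the difference: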

$ find . -name '*.cu' -exec echo {} +
./basic_interop.cu ./ripple.cu ./heat.cu
$ find . -name '*.cu' -exec echo {} \;
./basic_interop.cu
./ripple.cu
./heat.cu
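The difference can also be checked by counting invocations. The sketch below is only an illustration: the temporary directory and the three empty .cu files are made up for the demo, and the inner sh -c command prints one line each time find starts it.

```shell
# Scratch directory with three hypothetical source files.
tmp=$(mktemp -d)
touch "$tmp/a.cu" "$tmp/b.cu" "$tmp/c.cu"

# The inner command prints "run" once per invocation, regardless of
# how many file names find hands to it.
per_file=$(find "$tmp" -name '*.cu' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$tmp" -name '*.cu' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "with \\; : $per_file invocations"
echo "with +  : $batched invocation(s)"
```

With only three short file names, the + form fits everything into a single invocation; with a very long file list, find may split it into several batches.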

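The -depth option makes find process a directory's contents before the directory itself, so a directory is always visited after the files and subdirectories it contains. This matters a lot for tasks such as deletion.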
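A small sketch of why the ordering matters for deletion. The directory layout (top/sub/file.txt) is invented for the demo; rmdir only succeeds on empty directories, so it relies on the children having been handled first.

```shell
# Scratch tree: top/sub/file.txt (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/top/sub"
touch "$tmp/top/sub/file.txt"

# Default order: a directory is listed before its contents.
pre_order=$(cd "$tmp" && find top | tr '\n' ' ')
# With -depth: contents first, the directory itself last.
post_order=$(cd "$tmp" && find top -depth | tr '\n' ' ')
echo "default: $pre_order"
echo "-depth : $post_order"

# Remove files, then directories bottom-up; every rmdir sees an
# already-empty directory because its children were visited first.
(cd "$tmp" && find top -depth -type f -exec rm {} \; \
           && find top -depth -type d -exec rmdir {} \;)
```

Note that GNU find's -delete action implies -depth for the same reason.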

find has some global options, which must be placed in the right position; when in doubt, check man find. For example, -mindepth, -maxdepth and -depth are all global options.

To prevent confusion, global options should be specified on the command-line after the list of start points, just before the first test, positional option or action.
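A minimal sketch of the recommended placement, using an invented directory tree: the global option -maxdepth comes right after the start point, and the -name test follows it.

```shell
# Scratch tree with .txt files at three different depths (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/one.txt" "$tmp/a/two.txt" "$tmp/a/b/three.txt"

# Start point, then the global option, then the test.
shallow=$(find "$tmp" -maxdepth 1 -name '*.txt' | wc -l | tr -d ' ')
deeper=$(find "$tmp" -maxdepth 2 -name '*.txt' | wc -l | tr -d ' ')

echo "maxdepth 1: $shallow file(s), maxdepth 2: $deeper file(s)"
```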

bash line editing

Posted: 2024-02-02 Updated: 2024-08-18

The default is emacs-mode line editing.

ctrl _          undo
ctrl y          yank
ctrl /          undo
alt  f          word-level forward
alt  b          word-level backward
alt  d          word-level delete      probably the handier one, since the d key is closer
alt  backspace  word-level backspace

CUDA by Example: Chapter 09-12

Posted: 2024-02-02 Updated: 2025-04-26 Category: cuda-by-example

Chapter 9: Atomic operations

You should know that atomic operations on global memory are supported only on GPUs of compute capability 1.1 or higher. Furthermore, atomic operations on shared memory require a GPU of compute capability 1.2 or higher.

Specifying the compute capability:

nvcc -arch=sm_11

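This sets the compute capability to 1.1. When some instructions are only available from compute capability 1.1 onward, adding this flag makes sure they compile. In addition, with a more precise target, nvcc can apply hardware-specific optimizations that may not be available on earlier architectures.

However, a particular GPU does not necessarily support the requested compute capability. nvcc -arch-ls (short for --list-gpu-arch) lists the virtual architectures that the installed nvcc can target.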

git-blame-ignore-revs

Posted: 2024-02-02 Updated: 2024-08-18

The LLVM project contains a file named .git-blame-ignore-revs

Its header comment says:

# Since version 2.23 (released in August 2019), git-blame has a feature
# to ignore or bypass certain commits.
#
# This file contains a list of commits that are not likely what you
# are looking for in a blame, such as mass reformatting or renaming.
# You can set this file as a default ignore file for blame by running
# the following command.
#
# $ git config blame.ignoreRevsFile .git-blame-ignore-revs

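This file lists the commits that blame should skip (for example, a reformatting of the whole project). Because git configuration is local to each clone, the command has to be run by hand once.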
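The workflow can be tried out in a throwaway repository. Everything in this sketch is invented for the demo (the file main.c, the two commits, the author identity); it only shows how the ignore file and the config entry fit together, and it assumes git 2.23 or newer for the ignore feature itself.

```shell
# Throwaway repository with a real change followed by a pure reformat.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"              # hypothetical identity, local to this repo
git config user.email "example@example.com"

printf 'int main(){return 0;}\n' > main.c
git add main.c
git commit -q -m "add main"

printf 'int main() { return 0; }\n' > main.c
git commit -q -am "reformat"

# Record the full hash of the reformatting commit, one hash per line.
git rev-parse HEAD > .git-blame-ignore-revs

# Per-clone setting: blame now consults the file by default.
git config blame.ignoreRevsFile .git-blame-ignore-revs
configured=$(git config --get blame.ignoreRevsFile)

blame_out=$(git blame -s main.c)
echo "$blame_out"
```

In a real project the ignore file is committed, so each contributor only has to run the git config line once per clone.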

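VS Code error: Remote Extension host terminated unexpectedly 3 times within the last 5 minutes.

Posted: 2024-02-02

Was it caused by compacting the WSL2 virtual disk last time? Or by an upgrade?

I tried the following:

Deleting everything in ~/.vscode and ~/.vscode-server and downloading it again. No luck.
Disabling all extensions. No luck.

I then noticed that opening a folder with code from the VS Code integrated terminal triggered the problem, while running code from a terminal outside VS Code did not. It turned out that the code command in the integrated terminal was still the one from the Windows file system, not the one provided by vscode-server.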

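The cause turned out to be my ~/.bashrc, where I set PATH from CLEAN_PATH (a preset PATH of mine that contains most of the important directories). The intent was to stop PATH from growing every time ~/.bashrc is sourced again, but it also threw away the PATH that vscode-server had set up. On top of that, I had put the Windows-side code command at the front of PATH instead of appending it at the end, which was a second mistake.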
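One way to avoid the repeated-append problem without resetting PATH is to add each directory only when it is missing. This is a sketch, not what my ~/.bashrc actually contains: path_append is a hypothetical helper name and /opt/example/bin is a placeholder directory.

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already present, so sourcing ~/.bashrc again changes nothing.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already in PATH, leave it alone
        *) PATH="$PATH:$1" ;;      # append, keeping earlier entries first
    esac
}

# Simulate sourcing the file twice.
path_append "/opt/example/bin"
path_append "/opt/example/bin"
export PATH

# Count how many times the directory appears in PATH.
count=$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -c -x '/opt/example/bin')
echo "occurrences: $count"
```

Because the helper appends rather than prepends, entries that were set up earlier (such as the ones from vscode-server) keep their priority.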

chapter01 - basic

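Posted: 2024-02-01 Updated: 2024-05-16 Category: modern-cmake-for-cpp

Three stages: configuration, generation, and building.

Configuration produces the CMakeCache. Generation then uses it to produce the rest of the build tree.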

Generating a Build System

Generating the build tree:

cmake -B build/ -S source/

It is best not to run cmake without explicitly specifying the source and build directories.

chapter02 - cmake language

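Posted: 2024-02-01 Updated: 2024-08-18 Category: modern-cmake-for-cpp

Comments

Use # for a single-line comment.
Use #[=[ and #]=] for a block comment (block comments can be nested by giving each level a different number of = signs).

If #[=[ is preceded by another #, the block-comment opener is itself commented out. The closing marker that follows is then treated as an ordinary single-line comment.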

##[=[
message("I'm not commented")
#]=]

Command

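Command names are case-insensitive, but lowercase is the usual convention.

Commands are not expressions and have no return value.

chapter02.a - summary of commonly used CMake variables

Posted: 2024-02-01 Updated: 2024-08-18 Category: modern-cmake-for-cpp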

`PROJECT_*`

PROJECT_SOURCE_DIR
PROJECT_BINARY_DIR

There is no PROJECT_LIST_DIR.

The PROJECT_* variables are updated each time a project command is encountered.

`CMAKE_*`

CMAKE_SOURCE_DIR
CMAKE_BINARY_DIR

There is no CMAKE_LIST_DIR either.