32 Thread Cancellation

Posted: 2024-07-12 Updated: 2024-09-07 Category: the-linux-programming-interface

API

#include <pthread.h>

int pthread_cancel(pthread_t thread);

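Cancelability state and cancelability type

These are set with pthread_setcancelstate() and pthread_setcanceltype() respectively. Both calls affect only the calling thread, and both return the previous value through their second argument.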

#include <pthread.h>

int pthread_setcancelstate(int state, int *oldstate);
int pthread_setcanceltype(int type, int *oldtype);

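Thread behavior across `fork()` and `exec()`

When a thread calls fork(), the child inherits the calling thread's cancelability type and state. When a thread calls exec(), the cancelability type and state of the new program's main thread are reset to the defaults, PTHREAD_CANCEL_DEFERRED and PTHREAD_CANCEL_ENABLE. Explanation:

After a multithreaded program calls fork(), only one thread is left in the new process, and that thread presumably inherits some of its attributes from the thread that called fork(). According to the man page: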

The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.

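POSIX specifies this behavior. It also fits how Linux implements threads as a special kind of process: fork() duplicates only the calling one.

exec(), by contrast, replaces the process image, which terminates every existing thread.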
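The inheritance across fork() can be checked with a small experiment. The sketch below is only an illustration: the function name child_inherits_cancel_state() is made up for this post, and it assumes the caller is the only thread in the process. The child reads its state by setting a new one and looking at the old value that pthread_setcancelstate() hands back.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the child saw the parent's
 * PTHREAD_CANCEL_DISABLE state, 1 if it did not, -1 on error. */
int child_inherits_cancel_state(void) {
    int old, status;
    pid_t pid;

    /* Disable cancellation in the calling thread before forking. */
    if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        int seen;
        /* There is no "get" call: set a state and inspect the previous one. */
        pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &seen);
        _exit(seen == PTHREAD_CANCEL_DISABLE ? 0 : 1);
    }

    pthread_setcancelstate(old, NULL);   /* restore the parent's state */
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling this from main() (built with -pthread) should return 0 if the child really starts with cancellation disabled.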

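Cancelability state

There are two cancelability states: enabled (the default, PTHREAD_CANCEL_ENABLE) and disabled (PTHREAD_CANCEL_DISABLE). If a thread has cancellation disabled, a cancellation request sent to it stays pending until the thread enables cancellation again. The thread that called pthread_cancel() is not blocked; the call returns immediately. Cancelability state belongs to each thread and can be changed only by that thread itself, through pthread_setcancelstate().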
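To make the "pending request" behavior concrete, here is a minimal sketch. The names worker(), cancel_while_disabled() and the two semaphores are invented for this illustration. The worker disables cancellation, blocks in sem_wait() (normally a cancellation point) while the request arrives, finishes its protected step, and is canceled only after it restores the old state.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;        /* worker -> caller: cancellation is now disabled */
static sem_t cancel_sent;  /* caller -> worker: pthread_cancel() has returned */
static int finished_critical;

static void *worker(void *arg) {
    int old_state;
    (void)arg;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    sem_post(&ready);
    sem_wait(&cancel_sent);    /* a cancellation point, but the request stays pending */
    finished_critical = 1;     /* the protected step completes */
    pthread_setcancelstate(old_state, NULL);

    pthread_testcancel();      /* the pending request is acted on here */
    return NULL;               /* not reached */
}

/* Hypothetical helper: returns 0 if the worker finished its protected
 * step and was then canceled, 1 otherwise, -1 on setup failure. */
int cancel_while_disabled(void) {
    pthread_t t;
    void *res;

    finished_critical = 0;
    if (sem_init(&ready, 0, 0) != 0 || sem_init(&cancel_sent, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    sem_wait(&ready);
    pthread_cancel(t);         /* returns at once; the caller is not suspended */
    sem_post(&cancel_sent);

    pthread_join(t, &res);
    sem_destroy(&ready);
    sem_destroy(&cancel_sent);
    return (finished_critical && res == PTHREAD_CANCELED) ? 0 : 1;
}
```

The save-and-restore pattern around the protected step (rather than unconditionally re-enabling) keeps the code correct if the caller had already disabled cancellation.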

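Cancelability type

If a thread is cancelable, its cancelability type determines how it responds to a cancellation request. There are two types:

PTHREAD_CANCEL_ASYNCHRONOUS: the thread may be canceled at any time (perhaps, but not necessarily, immediately). This type is rarely used.
PTHREAD_CANCEL_DEFERRED: the request stays pending until the thread reaches a cancellation point. This is the default.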

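Cancellation points

Many system calls and library functions are cancellation points; most of them are functions that may block.

SUSv3 specifies that:

Some functions, if implemented, must be cancellation points. Examples are system calls that wait, read, or write, plus part of the pthreads API. The full list is omitted here.
Some functions, if implemented, may or may not be cancellation points. Examples are the stdio read and write functions, because they may make a system call (when the buffer is full or exhausted) or may not. The full list is omitted here.
All other functions, if implemented, must not be cancellation points.

SUSv4 adds openat() to the list of required cancellation points and removes sigpause() (moved to the "may be" list) and usleep() (dropped from the standard).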

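Whether a thread terminates by calling pthread_exit() or by being canceled, if it has not been detached it must be joined with pthread_join(); otherwise it becomes a zombie thread. pthread_cancel() does not reap the target. It only delivers the request and returns immediately, without waiting for the target to terminate (under PTHREAD_CANCEL_DEFERRED that happens only once the target reaches a cancellation point). Joining a canceled thread yields PTHREAD_CANCELED as its return value.

For this reason, a call to pthread_cancel() is very often followed by pthread_join().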
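The cancel-then-join pattern looks roughly like this. This is a sketch, not code from the book; sleeper() and cancel_and_join() are names chosen for the example, and the target thread simply loops on sleep(), which is one of the required cancellation points.

```c
#include <pthread.h>
#include <unistd.h>

static void *sleeper(void *arg) {
    (void)arg;
    for (;;)
        sleep(1);      /* sleep() is a cancellation point */
    return NULL;       /* not reached */
}

/* Hypothetical helper: returns 0 if the joined thread reports
 * PTHREAD_CANCELED, 1 if it reports something else, -1 on error. */
int cancel_and_join(void) {
    pthread_t t;
    void *res;

    if (pthread_create(&t, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)          /* only sends the request */
        return -1;
    if (pthread_join(t, &res) != 0)      /* waits for termination and reaps the thread */
        return -1;
    return res == PTHREAD_CANCELED ? 0 : 1;
}
```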

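Creating a cancellation point with `pthread_testcancel()`

What if the code a thread runs contains no cancellation points (a compute-bound loop, for example)? In that case we can call pthread_testcancel(), a function whose only purpose is to be a cancellation point. When it runs, if cancellation is enabled and a request is pending (this is what "canceled at the next cancellation point" means), the thread terminates.

So if code has no cancellation points but should still respond to cancellation requests from other threads, it can call pthread_testcancel() from time to time.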
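A compute loop with a periodic pthread_testcancel() might look like the sketch below. The names spin() and cancel_compute_loop() are invented for the example, and the check interval (every 65536 iterations) is an arbitrary choice, not a recommendation.

```c
#include <pthread.h>

static void *spin(void *arg) {
    unsigned long n = 0;
    (void)arg;
    for (;;) {
        n++;                        /* stand-in for pure computation */
        if ((n & 0xFFFF) == 0)
            pthread_testcancel();   /* the loop's only cancellation point */
    }
    return NULL;                    /* not reached */
}

/* Hypothetical helper: returns 0 if the loop was canceled, 1 otherwise,
 * -1 if the thread could not be created. */
int cancel_compute_loop(void) {
    pthread_t t;
    void *res;

    if (pthread_create(&t, NULL, spin, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &res);          /* returns once spin() hits pthread_testcancel() */
    return res == PTHREAD_CANCELED ? 0 : 1;
}
```

Without the pthread_testcancel() call, the loop would never reach a cancellation point and pthread_join() would wait forever.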

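Cleanup handlers

A thread can be ended by a cancellation request from another thread (rather than by calling pthread_exit() itself or returning from its start function), so at that moment it may still hold resources that need releasing, a pthreads mutex for instance. Custom cleanup handlers exist for this purpose. (By analogy, RAII in C++ solves a similar problem, such as control flow being cut short by a thrown exception.)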

#include <pthread.h>

void pthread_cleanup_push(void (*routine)(void *), void *arg);
void pthread_cleanup_pop(int execute);

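pthread_cleanup_push() adds a record to the calling thread's stack of cleanup handlers; when the thread terminates, the handlers on the stack run one by one, most recently pushed first. pthread_cleanup_pop() removes the top record from the stack, and its execute argument decides whether the handler is run (the parameter is an int probably because C had no bool type before C99).

More precisely, cleanup handlers run automatically only when the thread is canceled or calls pthread_exit(). If the thread terminates normally by returning from its start function, the handlers on the stack are not run automatically.

In code, pthread_cleanup_push() and pthread_cleanup_pop() are used as a matched pair: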

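#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

// Cleanup handler: print the int that arg points to, then free it
void print_and_free_int(void *arg) {
    int *a = arg;
    printf("%d\n", *a);
    free(a);
}

// Allocate an int on the heap; give up if the allocation fails
int *new_int(int a) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        exit(EXIT_FAILURE);
    *p = a;
    return p;
}

int main(void) {
    pthread_cleanup_push(print_and_free_int, new_int(6));
    pthread_cleanup_pop(1); // nonzero argument: run the handler now. Without this line the code does not compile, because the push is left syntactically unclosed
    return 0;
}
// The program prints the number 6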

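Implementations are allowed to provide these two functions as macros. Linux (glibc) does so, and the macro expansions guarantee that a push without a matching pop fails to compile, which forces the two to be used as a pair within the same lexical scope.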
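A more typical use is releasing a mutex when the holder is canceled. The following is a rough sketch: holder(), unlock_mutex() and mutex_released_on_cancel() are hypothetical names, and the semaphore exists only so the caller knows the mutex is held before it sends the request.

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static sem_t locked;   /* holder -> caller: the mutex is now held */

/* Cleanup handler: runs in the canceled thread, which owns the mutex. */
static void unlock_mutex(void *arg) {
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mtx);
    pthread_cleanup_push(unlock_mutex, &mtx);
    sem_post(&locked);
    for (;;)
        sleep(1);              /* cancellation point: the handler runs if canceled here */
    pthread_cleanup_pop(1);    /* not reached, but needed to close the push */
    return NULL;
}

/* Hypothetical helper: returns 0 if the thread was canceled and the
 * mutex is free afterwards, 1 otherwise, -1 on setup failure. */
int mutex_released_on_cancel(void) {
    pthread_t t;
    void *res;
    int rc;

    if (sem_init(&locked, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;

    sem_wait(&locked);         /* the mutex is held from here on */
    pthread_cancel(t);
    pthread_join(t, &res);

    rc = pthread_mutex_trylock(&mtx);   /* succeeds only if the handler unlocked it */
    if (rc == 0)
        pthread_mutex_unlock(&mtx);
    sem_destroy(&locked);
    return (res == PTHREAD_CANCELED && rc == 0) ? 0 : 1;
}
```

If the push/pop pair is removed, the canceled thread dies while holding the mutex and the trylock in the caller fails.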

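Consulting the man pages

To add cleanup actions with the pthread_cleanup_* API, you need to know which functions are cancellation points. man pthreads has this information. Here are some of the function categories listed on the pthreads(7) manual page: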

Thread-safe functions
Async-cancel-safe functions
Cancellation points

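Oddly, both spellings, cancellation and cancelation, appear in the man pages. The copies I found online use cancelation throughout, while my local copy uses cancellation in the heading and cancelation everywhere else. Perhaps my local copy of the manual is newer.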

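Asynchronous cancellation (rarely used)

Asynchronous cancelability is rarely useful. Cleanup handlers still run, but we cannot know where the thread will be canceled, so protecting resources (as completely as possible) makes the code very complicated, much like wrapping every function call in try-catch in C++. As a rule, an asynchronously cancelable thread should not allocate any resources and may call only async-cancel-safe functions, so the work it can do is very limited, for example pure computation.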
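For completeness, here is roughly what the pure-computation case looks like. This is a sketch under the assumption that the loop owns no resources at all; busy() and cancel_async_loop() are names made up for the example. The loop contains no cancellation point, so only the asynchronous type lets the request take effect.

```c
#include <pthread.h>

static void *busy(void *arg) {
    volatile unsigned long n = 0;   /* volatile keeps the loop body from being optimized away */
    int old_type;
    (void)arg;

    /* pthread_setcanceltype() is itself async-cancel-safe. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    for (;;)
        n++;                        /* no cancellation point, no resources held */
    return NULL;                    /* not reached */
}

/* Hypothetical helper: returns 0 if the loop was canceled, 1 otherwise,
 * -1 if the thread could not be created. */
int cancel_async_loop(void) {
    pthread_t t;
    void *res;

    if (pthread_create(&t, NULL, busy, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &res);
    return res == PTHREAD_CANCELED ? 0 : 1;
}
```

Compared with sprinkling pthread_testcancel() through the loop, this keeps the loop body untouched, at the price of ruling out malloc(), locks, stdio and almost everything else inside it.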

API