Initialization of Non-Automatic Variables

Posted: 2023-08-12 Updated: 2025-03-18 Category: inside-the-cpp-object-model


Differences between C++ and C `thread_local`

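C++ allows global and static variables to be initialized with non-constant expressions ¹. For static local ² and thread_local variables, this means every access must first check whether the variable has already been initialized. Initializing a thread_local needs no synchronization between threads, whereas the initialization of a static local variable must be synchronized between threads (__cxa_guard_acquire and __cxa_guard_release), and every access reads its guard first.
In C++, a thread_local variable at block scope is implicitly static ³; in C you have to add static yourself. In C, a thread_local declared inside a function must be combined with either extern or static; see https://godbolt.org/z/eKz71xh7a for an example.
Before C23, C's thread_local is a macro (for _Thread_local).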

The cost of the various kinds of variable initialization in C++

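First, variables that need no function call to initialize are handled entirely at compile time, at no cost, so the rest of this discussion only concerns variables initialized by a function or a constructor.
Second, a block-scope (static) thread_local variable only needs a check before use; neither its construction nor its use needs synchronization, so the cost is small. A block-scope static variable, on the other hand, needs synchronization between threads for its construction, and every use first reads its guard.
Ordinary variables defined outside functions are initialized before main without any synchronization, and using them needs no check. Namespace-scope thread_local variables need no synchronization either, but with a non-constant initializer every access goes through a wrapper function that checks whether the current thread has already initialized the variable (see the thread_local section below).
An inline variable (C++17) defined outside a function needs no synchronization when used, but its initialization checks whether it has already been done, using a guard variable; see https://godbolt.org/z/hYMjdbsxj. This is most likely because an inline variable may be defined in several translation units, each of which emits its own initialization code and must guard against initializing it twice.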

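Global (and static) variables

Variables defined at the outermost (namespace or file) scope, whether or not they are declared static.

No non-constant initializers (the C model)

The compiler complains: initializer element is not constant.

Non-constant initializers supported, but only for class objects (the Cfront 1.0 model)

A block of static-initialization code is inserted that runs before the user program proper starts.

Early Cfront supported non-constant static initialization for class objects, but for built-in types (integers, pointers, and so on) it was as restricted as C.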

Consider the following code:

struct Point {
    double x;
    double y;
};

struct Point point = {2.4, 0.1};

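This is valid C and valid C++. Its values are written directly into the data section, so no static-initialization code is needed (the variable is not initialized with a non-constant expression).

After the following change, the code no longer compiles as C, but it still compiles as C++: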

struct Point {
    double x;
    double y;
};

struct Point point = {2.4, 0.1};
struct Point another_point = point;

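Initializing another_point reads point, which creates a dependency between two global variables. Because point is not a constant, the compiler does not treat this initializer as a constant expression, so the copy is performed in the static-initialization code (an optimizing compiler is still allowed to fold it into the data section).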

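All types support non-constant initializers (the current C++ model)

Later, C++ extended non-constant static initialization to all types. This was probably a by-product of supporting virtual base classes: converting a pointer between a derived class and its virtual base requires an offset, and that offset cannot be fixed at compile time because it depends on the most-derived type of the object.

For example: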

class Base {};
class VirtualBase: public virtual Base {};

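// No static-initialization code needed:
// Base *pbase = (Base *)0x1000;

// Converting VirtualBase* to Base* needs static-initialization code.
// With ordinary (non-virtual) inheritance it would not.
// Only compile this to inspect the assembly: running it would crash, because
// the conversion reads the vptr stored at address 0x1000.
Base *pbase2 = (VirtualBase *)0x1000;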

GPT's explanation of the book's claim

Consider the following example:

class Base {};
class VirtualBase: public virtual Base {};

class Derived1: public VirtualBase {};
class Derived2: public VirtualBase {};

In this example, Base is a virtual base of VirtualBase, and Derived1 and Derived2 both inherit from VirtualBase.

When the compiler translates a conversion from VirtualBase* to Base*, it does not know whether the pointer refers to a plain VirtualBase object, to the VirtualBase part of a Derived1 or Derived2, or to a class defined in some other translation unit. Because a virtual base subobject is shared and placed by the most-derived class, its distance from the VirtualBase subobject depends on that most-derived type, so the offset cannot be determined at compile time.

To handle this, the compiler emits code that looks up the offset at run time, typically through the object's vtable (the Itanium C++ ABI stores a virtual base offset there). When such a conversion appears in the initializer of a global, as in Base *pbase2 = (VirtualBase *)0x1000;, the initializer is no longer a constant expression, so it has to go into the static-initialization code that runs before main.

Supporting this case means the compiler must be able to run initialization code for variables of any type, including plain pointers, not just class objects.

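Local static variables

Only C++ is discussed here, since C does not allow non-constant initializers.

C++ guarantees that a static object inside a function is initialized the first time control passes through its declaration, and only once. With a constant initializer, the object can be initialized at compile time and the function needs no synchronization; with a non-constant initializer, there is a synchronization cost.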

#include <cstdio>

struct Echo {
    Echo(const char *s) {
        puts(s);
    }
    static Echo *static_member;
};

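// The definition has to go here: inside the class body Echo is still incomplete, so no Echo can be created there.
// The in-class declaration cannot be marked inline: inline would make it a definition, and this line would become a redefinition.
// This definition cannot repeat static: static may not appear on an out-of-class member definition.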
inline Echo *Echo::static_member = new Echo {"Echo::static_member"};

void test_echo() {
    static Echo hidden("local static");
}

Echo global_echo("global");
static Echo static_echo("static echo");

struct Foo {
    static inline Echo *static_member = new Echo {"Foo::static_member"};
};

int main() {
    // Try calling it 0, 1 or 2 times
    test_echo();
    test_echo();
}

Output:

Echo::static_member
global
static echo
Foo::static_member
local static

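If both calls in main are commented out, hidden("local static") is never initialized. The unoptimized x86-64 assembly for test_echo() shows how much more complicated the function becomes with the synchronization: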

test_echo():
        push    rbp
        mov     rbp, rsp
        push    r12
        push    rbx
        movzx   eax, BYTE PTR guard variable for test_echo()::hidden[rip]
        test    al, al
        sete    al
        test    al, al
        je      .L7
        mov     edi, OFFSET FLAT:guard variable for test_echo()::hidden
        call    __cxa_guard_acquire
        test    eax, eax
        setne   al
        test    al, al
        je      .L7
        mov     r12d, 0
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZZ9test_echovE6hidden
        call    Echo::Echo(char const*) [complete object constructor]
        mov     edi, OFFSET FLAT:guard variable for test_echo()::hidden
        call    __cxa_guard_release
        jmp     .L7
        mov     rbx, rax
        test    r12b, r12b
        jne     .L5
        mov     edi, OFFSET FLAT:guard variable for test_echo()::hidden
        call    __cxa_guard_abort
.L5:
        mov     rax, rbx
        mov     rdi, rax
        call    _Unwind_Resume
.L7:
        nop
        pop     rbx
        pop     r12
        pop     rbp
        ret

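Global / local thread_local variables

C++11 thread_local

The address of a thread_local variable is computed from a per-thread segment register (fs on x86-64 Linux, as in the assembly below) plus a fixed offset, so each thread sees the variable at a different address.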

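For a global thread_local variable i, every access first calls the TLS wrapper function for i, which calls the TLS init function for i (shown as __tls_init in Compiler Explorer's filtered assembly) to make sure the variable is initialized only once per thread, and then returns the variable's address.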

https://godbolt.org/z/Y8a7s4Msz

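#include <ctime>
#include <iostream>
#include <thread>

thread_local int i = time(0);  // dynamic initializer: run once per thread via the TLS init function

int main() {
    i = 7;
    std::thread t{[&](){ i = 9; }};  // the new thread writes to its own copy of i
    t.join();
    std::cout << i << "\n";          // prints 7: main's copy is unaffected
}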

For a local thread_local variable:

https://godbolt.org/z/xdePfv3eM

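#include <ctime>
#include <iostream>
#include <thread>

// main has only one entry point, so put the thread_local into another function
// and call it from both threads; in practice this behaves the same as
// declaring the thread_local inside main.
int &get_i() {
    thread_local int i = time(0);  // implicitly static; initialized once per thread
    return i;
}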

int main() {
    get_i() = 7;
    std::thread t{[&](){ get_i() = 9; }};
    t.join();
    std::cout << get_i() << "\n";
}

get_i() likewise contains the logic that ensures i is initialized only once per thread:

get_i()::i:
        .zero   4
guard variable for get_i()::i:
        .zero   8
get_i():
        push    rbp
        mov     rbp, rsp
        mov     rax, QWORD PTR fs:0
        add     rax, OFFSET FLAT:guard variable for get_i()::i@tpoff
        movzx   eax, BYTE PTR [rax]
        test    al, al
        jne     .L11
        mov     edi, 0
        call    time
        mov     DWORD PTR fs:get_i()::i@tpoff, eax
        mov     rax, QWORD PTR fs:0
        add     rax, OFFSET FLAT:guard variable for get_i()::i@tpoff
        mov     BYTE PTR [rax], 1
.L11:
        mov     rax, QWORD PTR fs:0
        add     rax, OFFSET FLAT:get_i()::i@tpoff
        pop     rbp
        ret

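So accessing a global thread_local costs the same as accessing a local one: every access checks a per-thread guard. The only difference is where the check lives: for a local thread_local it is placed directly in the function, and for a global one it is in a compiler-generated function. Compared with a static local variable, a thread_local static needs no __cxa_guard_acquire / __cxa_guard_release, because its guard belongs to the current thread and no other thread can race on it.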

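Next, replace an initializer like time(0) with a constant such as 2. Accessing the thread_local then involves no initialization check at all: the initial value is part of the TLS initialization image and is simply copied for each thread when the thread is created.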

C11 _Thread_local

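C11's header threads.h contains the following definition:

#define thread_local _Thread_local

This works like bool in C99, which was a macro that required including stdbool.h. Since C23, thread_local is a keyword in its own right.

C11's thread_local does not imply static: inside a function you have to write thread_local static (or thread_local extern). In C++, a block-scope thread_local is implicitly static, so adding static is optional.

From the C++ standard: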

When thread_local is applied to a variable of block scope the storage-class-specifier static is implied if it does not appear explicitly.

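June 15, 2023: C11's threads.h was not implemented on Windows, and although it does provide mtx_t and cnd_t, its synchronization facilities are much more basic than those of C++. For threads in C, pthread.h or the Win32 API is the better choice.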

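¹ Initializing with a constant expression, as in C, incurs no access overhead.
² Static local refers to a local variable declared static at function (block) scope.
³ For global variables, in both C++ and C, static and thread_local are orthogonal: static determines whether the name has external linkage, and thread_local determines whether the variable is per-thread.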
