type_index

发表于： 2023-08-12 更新于： 2025-01-05

字数： 2588 阅读：≈ 6分钟

例子：bustub lab 0

可以和 📌vtable 联系起来看。

dynamic_cast 何时可以使用？

使用要求：① 目标是类的指针或引用 ② 目标类含虚表

我最初在 bustub lab 0 的实现中使用了向下转型的操作：在 Get 操作中，当一个 TrieNode 有值的时候，就尝试向下转型为 TrieNodeWithValue<T>，这一步仍然正确就说明类型也正确，可以取出返回。

在 Compiler Explorer 中实现相似操作：

struct A {};

template <class T>
struct B: A { T t; };

int main() {
    auto *p = new A;
    if (auto pb = dynamic_cast<B<int> *>(p)) {}
}

结果说 A 没有多态性：

<source>: In function 'int main()':
<source>:12:19:error:cannot 'dynamic_cast' 'p' (of type 'struct A*') to type 'struct B<int>*' (source type is not polymorphic)
   12 |     if (auto pb =dynamic_cast<B<int> *>(p)) {}
      |^~~~~~~~~~~~~~~~~~~~~~~~~
ASM generation compiler returned: 1
<source>: In function 'int main()':
<source>:12:19:error:cannot 'dynamic_cast' 'p' (of type 'struct A*') to type 'struct B<int>*' (source type is not polymorphic)
   12 |     if (auto pb =dynamic_cast<B<int> *>(p)) {}
      |^~~~~~~~~~~~~~~~~~~~~~~~~
Execution build compiler returned: 1

这是因为 A 没有虚表。想要使用 dynamic_cast 功能，需要给 A 加虚表，也就是说只要一个虚函数就可以了：

struct A {
    virtual ~A() = default;
};

template <class T>
struct B: A { T t; };

int main() {
    auto *p = new A;
    if (auto pb = dynamic_cast<B<int> *>(p)) {}
}

上面的代码就能够正常编译。

尝试用 std::type_index 代替 dynamic_cast？

可以用 std::type_index 来唯一标记数据的类型，该结构占用 8 个字节，且能够作为 key 放入容器中。std::type_index 能够判断一个类是不是目标类型，而 dynamic_cast 能够判断一个类是不是目标类型或其子类，所以前者的判断更加精确。

在 bustub 的例子中，由于 TrieNodeWithValue<T> 在不同的 T 之间没有继承关系，所以 dynamic_cast 成功后可以确定是相同的类型，能满足功能的要求。

💡 理论上，如果 B 继承于 A，那么从一个包含 B 对象的结点获取 A 类型的指针也应该是可能的。但是 TrieNodeWithValue<B> 并不继承于 TrieNodeWithValue<A>，这使得涉及向上转型的 Get 操作无法在给定的代码中实现，从而本质上是完成了“精确获取”。

💡 std::type_index 需要结合 typeid 来使用，std::type_index 和 typeid 都有唯一性，但是 typeid(expr_or_type).name() 不一定有。详见 cppreference。

typeid 对类型的判断也有动态性：

可以看到 typeid 对 *p2 的类型判断是真实的。

另外注意到动态类型转换的 dynamic_cast 实现第 5 步对类型做了相等判断，对于类型完全一致的转换情况，能够直接返回结果。所以在 bustub 的场景下 dynamic_cast 只有转换失败时才比 typeid + type_index + dynamic_cast 的判断慢。

我简单测了一下用 dynamic_cast 和 type_index 方法的时间花销：

benchmark 代码：本地 GCC 10.2 -O0

#include <iostream>
#include <typeindex>
#include <chrono>
#include <vector>

struct A { virtual ~A() = default; };

template <class T>
struct B: A { T t; };

int main() {
    const int loop_count = 10000;
    std::vector<decltype(std::chrono::steady_clock::now())> time_points;
    time_points.reserve(16); // do not resize later
    auto tick = [&]() { time_points.push_back(std::chrono::steady_clock::now()); };

    auto *p1 = new B<int>;
    A *p2 = p1;

    tick();
    for (int i = 0; i < loop_count; ++i) {
        if (auto p3 = dynamic_cast<B<int> *>(p2)) {
            ; // nop
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        if (auto p3 = dynamic_cast<B<double> *>(p2)) {
            ; // nop
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<int> *p3;
        if (std::type_index(typeid(B<int>)) == std::type_index(typeid(*p2))
                && (p3 = dynamic_cast<B<int> *>(p2))) {
            ; // nop
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<double> *p3;
        if (std::type_index(typeid(B<double>)) == std::type_index(typeid(*p2))
                && (p3 = dynamic_cast<B<double> *>(p2))) {
            ; // nop
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<int> *p3;
        if (std::type_index(typeid(B<int>)) == std::type_index(typeid(*p2))
                && (p3 = reinterpret_cast<B<int> *>(p2))) {
            ; // nop
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<double> *p3;
        if (std::type_index(typeid(B<double>)) == std::type_index(typeid(*p2))
                && (p3 = reinterpret_cast<B<double> *>(p2))) {
            ; // nop
        }
    }
    tick();
    auto print_time = [&](int i, const char *tag) {
        auto d = time_points[i] - time_points[i-1];
        std::cout << tag << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(d).count()
                  << " μs\n";
    };

    print_time(1, "[success] dynamic_cast");
    print_time(2, "[failure] dynamic_cast");
    print_time(3, "[success] type_index + dynamic_cast");
    print_time(4, "[failure] type_index + dynamic_cast");
    print_time(5, "[success] type_index + reinterpret_cast");
    print_time(6, "[failure] type_index + reinterpret_cast");
}

结果：

xxx@yyy:/tmp/v$ g++ main.cpp -O0 && ./a.out
[success] dynamic_cast: 73 μs
[failure] dynamic_cast: 128 μs
[success] type_index + dynamic_cast: 136 μs
[failure] type_index + dynamic_cast: 99 μs
[success] type_index + reinterpret_cast: 89 μs
[failure] type_index + reinterpret_cast: 97 μs

事实证明在 bustub + O0 优化的场景中：

reinterpret_cast 能够比 dynamic_cast 节约一点时间。
失败的场景下先检查 type_index 再转换会比 dynamic_cast 更快。
成功的场景下 dynamic_cast 快于 type_index 检查后转换。

由于怀疑 dynamic_cast 是用了优化选项编译的，而我的 type_index 检查没有用优化选项而吃亏，接下来看 -O2 优化的结果（这里不用 O3 优化是因为 O3 优化已经看出来了失败的分支是一定失败的，查类型的操作被优化删了）

benchmark 代码：本地 GCC 10.2 -O2

#include <iostream>
#include <typeindex>
#include <chrono>
#include <optional>
#include <vector>

struct A { virtual ~A() = default; };

template <class T>
struct B: A { T t; };

void doNotOptimize() {
    static volatile int a;
    a = 0;
}

int main() {
    const int loop_count = 10000;
    std::vector<decltype(std::chrono::steady_clock::now())> time_points;
    time_points.reserve(16); // do not resize later
    auto tick = [&]() { time_points.push_back(std::chrono::steady_clock::now()); };

    auto *p1 = new B<int>;
    A *p2 = p1;

    tick();
    for (int i = 0; i < loop_count; ++i) {
        if (auto p3 = dynamic_cast<B<int> *>(p2)) {
            doNotOptimize();
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        if (auto p3 = dynamic_cast<B<double> *>(p2)) {
            doNotOptimize();
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<int> *p3;
        if (std::type_index(typeid(B<int>)) == std::type_index(typeid(*p2))
                && (p3 = dynamic_cast<B<int> *>(p2))) {
            doNotOptimize();
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<double> *p3;
        if (std::type_index(typeid(B<double>)) == std::type_index(typeid(*p2))
                && (p3 = dynamic_cast<B<double> *>(p2))) {
            doNotOptimize();
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<int> *p3;
        if (std::type_index(typeid(B<int>)) == std::type_index(typeid(*p2))
                && (p3 = reinterpret_cast<B<int> *>(p2))) {
            doNotOptimize();
        }
    }
    tick();
    for (int i = 0; i < loop_count; ++i) {
        B<double> *p3;
        if (std::type_index(typeid(B<double>)) == std::type_index(typeid(*p2))
                && (p3 = reinterpret_cast<B<double> *>(p2))) {
            doNotOptimize();
        }
    }
    tick();
    auto print_time = [&](int i, const char *tag) {
        auto d = time_points[i] - time_points[i-1];
        std::cout << tag << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(d).count()
                  << " μs\n";
    };

    print_time(1, "[success] dynamic_cast");
    print_time(2, "[failure] dynamic_cast");
    print_time(3, "[success] type_index + dynamic_cast");
    print_time(4, "[failure] type_index + dynamic_cast");
    print_time(5, "[success] type_index + reinterpret_cast");
    print_time(6, "[failure] type_index + reinterpret_cast");
}

结果：

xxx@yyy:/tmp/v$ g++ main.cpp -O2 && ./a.out
[success] dynamic_cast: 60 μs
[failure] dynamic_cast: 124 μs
[success] type_index + dynamic_cast: 67 μs
[failure] type_index + dynamic_cast: 23 μs
[success] type_index + reinterpret_cast: 7 μs
[failure] type_index + reinterpret_cast: 22 μs

现在 dynamic_cast 已经完全没有优势了。

`reinterpret_cast` 安全吗？

可以参考 p1839r5.pdf，截至2023年4月30日，用非字节手段访问对象表示的确是 UB。但是根据 cppreference，如果指针指向的确实是该同类型的对象，那么访问不算是 UB。对于指针之间的转换，由于 bustub 的 TrieNode 和 TrieNodeWithValue<T> 确有继承关系，所以也不是 UB。

对于 Lab 2 中遇到的通过零大小数组越界访问实现 B 树的情况，没有涉及到指针转换，所以不是 UB。alignment 也没有问题，因为零大小数组也是数组，其 alignment 和元素一致。

[2023年6月3日] 这里使用 static_cast 比较好，因为它可以修正指针偏移。reinterpret_cast 则只能适用于单继承的情况。下面就懒得提这个问题了。

type_index 的使用正确吗？

考虑到不同动态链接库中同一种类型可能有不同的 type_index，这一点又要斟酌。但是，dynamic_cast 也有同样的问题！所以如果 dynamic_cast 适用，type_index 当然也适用。
另一方面，这里借助了 typeid 来访问 type_index，能够正确处理指针动态转换后地址偏移的问题。

还想更快？存储 `type_index` 空间换时间

如果愿意多占用一些内存，可以在 TrieNodeWithValue 中加一个类型域，存储 std::type_index，就能通过比较 type_index 再 interpret_cast 来避免运行时类型查找得开销，见虚表和类型表的布局；而直接存储下来只需要访存一次）。对于空结点，使用 std::type_index(typeid(void)) 来初始化 type_index。当类型本身没必要加虚表的时候，这个 std::type_index 对象刚好就能替代掉虚表指针占用的 8 个字节的空间。

std::cout << typeid(void).name() << '\n'; 在 GCC 13.1 的输出是 "v\n"，这说明 void 也是有 type_info （typeid 表达式的类型）的。

另外，如果不需要抹去类型（毕竟存 index 则不必记住原来的类型），可以直接比较 typeid() 的返回值（写起来比创建 std::type_index 方便），因为 std::type_info 重载了比较操作符。

读 `std::type_info` 的开销

BaseObject *p = new DerivedObject;
// vtable_prefix = { whole_object, whole_type, origin }
// 以下每行读取内存一次
char *vptr = *(char **)p;
long offset = *(long *)(vptr - sizeof(long) * 2);
char *realvptr = *(char **)(p + offset);
// __class_type_info 继承于 std::type_info
__class_type_info *info = *(__class_type_info **)(realvptr - sizeof(long));

比较 `std::type_info` 的开销

删掉了一些条件编译宏，下面是一个版本的比较方式（也就是比较 std::type_info::name() 字符串，注意 _name 开头为 * 的话有特殊含义），可以看到动态比较 std::type_info 是比较字符串。

_GLIBCXX23_CONSTEXPR inline bool
type_info::operator==(const type_info& __arg) const _GLIBCXX_NOEXCEPT
{
    if (std::__is_constant_evaluated())
      return this == &__arg;

    if (__name == __arg.__name)
      return true;

    return __name[0] != '*' && __builtin_strcmp (__name, __arg.name()) == 0;
}

比较 `std::type_index` 的开销

std::type_index 包含了一个 std::type_info 的指针，比较运算符会转发到 std::type_info 的比较上，内联之后代价一样。

dynamic_cast 何时可以使用？

尝试用 std::type_index 代替 dynamic_cast？

reinterpret_cast 安全吗？

type_index 的使用正确吗？

还想更快？存储 type_index 空间换时间

读 std::type_info 的开销

比较 std::type_info 的开销

比较 std::type_index 的开销

`reinterpret_cast` 安全吗？

还想更快？存储 `type_index` 空间换时间

读 `std::type_info` 的开销

比较 `std::type_info` 的开销

比较 `std::type_index` 的开销