Python pycache

Published: 2023-08-12 Updated: 2024-04-08

How it works

https://peps.python.org/pep-3147/#python-behavior

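On every import, the interpreter looks in the __pycache__ folder next to the source file, picks the cache file tagged for the running Python version, and reads the source modification time recorded inside it (the cache stores the mtime of the source file; this is not the mtime of the cache file itself). It then compares the recorded value with the current mtime of the source. If they match, the .pyc is loaded and the source does not have to be compiled; otherwise the source is recompiled and the cache is rewritten.

Question: when testing on Windows and WSL, the interpreter always picked up the latest source, as if it never read the bytecode at all. The reason is that, besides the time, the cache also stores the length of the source file. I verified this: if the code is modified without changing either the file's mtime or its length, the cache is reused.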

5.4.7. Cached bytecode invalidation: https://docs.python.org/3/reference/import.html#cached-bytecode-invalidation
Before Python loads cached bytecode from a .pyc file, it checks whether the cache is up-to-date with the source .py file. By default, Python does this by storing the source’s last-modified timestamp and size in the cache file when writing it. At runtime, the import system then validates the cache file by checking the stored metadata in the cache file against the source’s metadata.
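Python 3.2 introduced __pycache__. Before that, the .pyc file was generated right next to the source file (the same name with a c appended) instead of being collected in a folder.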
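To see the stored metadata directly, here is a minimal sketch that compiles a throwaway module and reads the header of the resulting .pyc. It assumes Python 3.7 or later, where (per PEP 552) the header is 16 bytes: magic number, flags, source mtime, source size. The file name demo_mod.py and the helper read_pyc_header are made up for illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, size) from the 16-byte .pyc header.

    Hypothetical helper; layout assumed from PEP 552 (Python 3.7+).
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    # Three little-endian unsigned 32-bit integers follow the magic number.
    flags, mtime, size = struct.unpack("<III", header[4:16])
    return magic, flags, mtime, size


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")
    with open(src, "w") as f:
        f.write("x = 1\n")

    # Force timestamp-based invalidation so the header holds mtime and size.
    pyc = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP
    )
    magic, flags, mtime, size = read_pyc_header(pyc)
    st = os.stat(src)

    print("cache file:", os.path.relpath(pyc, tmp))
    print("magic matches interpreter:", magic == importlib.util.MAGIC_NUMBER)
    print("stored mtime == source mtime:", mtime == int(st.st_mtime) & 0xFFFFFFFF)
    print("stored size  == source size: ", size == st.st_size & 0xFFFFFFFF)
```

If either the stored mtime or the stored size differs from what os.stat reports for the source, the import system treats the cache as stale, which is consistent with the observation above.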

Python slots

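Published: 2023-08-12

Instances of a class that defines the __slots__ class attribute have no __dict__. Only attributes named in the slots can be assigned (you may set fewer of them, but not extra ones).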

class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.__dict__)  # raises AttributeError: 'Point' object has no attribute '__dict__'

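The main point of slots is saving space: __slots__ is shared by the class, and each object no longer carries its own __dict__, so the memory for the dictionary is saved. The original note suspected that slot access may get slower for objects with many attributes; in CPython slot access is generally not slower than a __dict__ lookup, so measure if speed matters.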

>>> import sys
>>> class Foo:
...    def __init__(self, x): self.x = x
...
>>> sys.getsizeof(Foo(1))
56
>>> sys.getsizeof(Foo(1)) + sys.getsizeof(Foo(1).__dict__)
336
>>> class Bar:
...    __slots__ = 'x', # the trailing comma makes this a tuple
...    def __init__(self, x): self.x = x
...
>>> sys.getsizeof(Bar(1))
40

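Some attributes exist neither in __slots__ nor in the instance __dict__. These are methods, which are defined on the class and show up in dir(obj). A method decorated with @property can be accessed directly, just like any other attribute.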
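A small sketch of the point above: the class Circle and its members are invented for illustration. The method and the property live in the class namespace, so they are visible through dir() even though the instance has no __dict__, while assigning a name outside the slots is rejected.

```python
class Circle:
    __slots__ = ("radius",)  # the only per-instance attribute

    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        # Read like a plain attribute: c.diameter
        return 2 * self.radius

    def scale(self, factor):
        self.radius *= factor


c = Circle(2)

has_dict = hasattr(c, "__dict__")                       # no per-instance dict
listed = {"radius", "diameter", "scale"} <= set(dir(c))  # all visible via dir()

try:
    c.color = "red"  # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, listed, rejected, c.diameter)
```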

Python sys vs os

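Published: 2023-08-12

sys gives access to the interpreter (runtime features) and to system-specific features (platform-dependent). os provides a uniform abstraction over the operating system (platform-independent features).

Interpreter features: for example, sys.argv holds the command-line arguments, sys.exit() terminates the program, sys.path is the module search path, and sys.stdin/sys.stdout/sys.stderr are the three standard streams.
System-specific features: for example, sys.getwindowsversion returns the major and minor Windows version numbers, sys.platform identifies the operating system platform, and sys.version describes the Python build in use.
System-independent features (os): file-system operations, the path separator, environment variables, and so on.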
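The split described above can be sketched in a few lines. The dictionary names and the sample path components are placeholders, and the HOME/USERPROFILE lookup is just one assumed way to find a home directory on both Unix-like systems and Windows.

```python
import os
import sys

# Interpreter and platform-specific information comes from sys.
interpreter_info = {
    "argv": sys.argv,                        # command-line arguments
    "platform": sys.platform,                # e.g. "linux", "win32", "darwin"
    "version": sys.version.split()[0],       # e.g. "3.12.1"
    "search_path_entries": len(sys.path),    # module search path length
}

# Portable operating-system services come from os.
os_info = {
    "sep": os.sep,                                    # path separator
    "joined": os.path.join("pkg", "module.py"),       # built with os.sep
    "cwd": os.getcwd(),                               # current directory
    "home": os.environ.get("HOME") or os.environ.get("USERPROFILE"),
}

for name, value in {**interpreter_info, **os_info}.items():
    print(f"{name}: {value}")
```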

Transposing a matrix in Python

Published: 2023-08-12 Updated: 2024-09-07

transpose = lambda listA: list(list(t) for t in zip(*listA))
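A quick usage sketch of the same idea, written as a named function for readability. zip(*rows) pairs up the i-th element of every row, so each tuple it yields is one column of the input. Note that zip stops at the shortest row, so ragged input is silently truncated.

```python
def transpose(rows):
    """Return the transpose of a list of lists (same approach as the lambda)."""
    return [list(column) for column in zip(*rows)]


matrix = [
    [1, 2, 3],
    [4, 5, 6],
]
transposed = transpose(matrix)
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# Ragged input: the third element of the first row is dropped.
ragged = transpose([[1, 2, 3], [4, 5]])
print(ragged)  # [[1, 4], [2, 5]]
```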

range-for

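Published: 2023-08-12

Range-based for loops used to be prone to dangling references. Newer language standards improve on this.

In C++11 the loop is plain syntactic sugar. The only guarantee is that the value of the range expression is bound to a hidden variable and kept alive until the loop ends. Other temporaries created while evaluating that expression, such as the result of f() in f()[5] below, are destroyed at the end of the statement marked (*): (https://stackoverflow.com/a/51440883)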

{
    auto&& __range = f()[5]; // (*)
    auto __begin = __range.begin(); // not exactly, but close enough
    auto __end = __range.end();     // in C++17, these types can be different
    for (; __begin != __end; ++__begin) {
        auto e = *__begin;
        // rest of body
    }
}

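C++20 keeps the same lifetime rules for range-for, but the statement now accepts an init-statement, so an object can be created and named inside it, which avoids dangling references during the iteration: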

#include <initializer_list>
#include <iostream>

int main() {
    for (auto lst = {1, 2, 5, 1, 71}; auto i : lst) {
        std::cout << i << '\n';
    }
}

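First impressions of the Rust programming language

Published: 2023-08-12 Updated: 2024-04-08

First, look at this:

Since C++14, C++ can use ' as a digit separator in numeric literals. Java (since Java 7) and Rust (from early in its design) use _ to separate digits.

Personal impressions: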

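Mandatory move semantics are everywhere. Related to this is the Drop trait, while primitive types also implement the Copy trait. (Ownership addresses a problem similar to the one C++ RAII tries to solve.)
Bindings are immutable by default, and references are immutable (shared) references by default. (The opposite of C++.)
Shadowing within the same scope is allowed, which treats variable names as real labels (similar to Python).
Error handling can use expect, which panics with a message on failure and is more concise than try-catch.
Tuples and ranges are built in, and if conditions need no parentheses.
Many practical packages have to come from crates because the standard library does not provide them...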

sizeof

Published: 2023-08-12

The following expressions are all equal:

sizeof(T&)
sizeof(T&&)
sizeof(T)

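In other words, sizeof strips the reference and reports the size of the referenced type. I forget the source, but someone on Twitter posed a quiz: for which type T do struct {T x;} and T give different sizeof results? A reference type is one answer, since a struct with a reference member typically needs pointer-sized storage, while sizeof(T&) is just the size of the referred-to type.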

std::lower_bound/upper_bound

Published: 2023-08-12 Updated: 2024-05-05

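#include <functional>
#include <vector>
using std::vector;

class Solution {
   public:
    vector<int> searchRange(vector<int>& nums, int target) {
        int n = (int)nums.size();
        // A possible implementation of lower_bound: returns the first index i
        // for which less(nums[i], target) is false.
        auto lowerbound = [&](auto&& less) {
            int lo = 0, hi = n;
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
                if (less(nums[mid], target)) lo = mid + 1;
                else                         hi = mid;
            }
            return lo;
        };
        // Same result as std::lower_bound: first element >= target
        int left = lowerbound(std::less<>{});
        // Same result as std::upper_bound: first element > target
        int right = lowerbound(std::less_equal<>{});
        return (left < n && nums[left] == target) ?
                vector<int>{left, right - 1} :
                vector<int>{-1, -1};
    }
};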

Strict Aliasing

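Published: 2023-08-12 Updated: 2024-04-08

Article: What is Strict Aliasing and Why do we Care? (github.com)

Strict aliasing means placing conditions on when aliasing is permitted, so that in most situations the compiler can assume the code contains no aliasing and optimize it aggressively.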

#include <iostream>

int foo( float *f, int *i ) {
    *i = 1;
    *f = 0.f;

    return *i;
}

int main() {
    int x = 0;

    std::cout << x << "\n";   // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n";   // Expect 0?
}

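Compiled with gcc 13.1 at -O2, the code above prints 1 on the second line. The compiler assumes the pointers f and i can never overlap, so it returns 1 directly. Writing to an int through a float* violates the aliasing rules, so the program has undefined behavior.

When is aliasing allowed?

The rules are fairly complicated, and C and C++ differ in their requirements. Where aliasing is permitted, the compiler does not optimize as aggressively as above. Only part of the rules is covered below.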

Three-way Comparison <=>

Published: 2023-08-12 Updated: 2024-04-08

Meaning

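C++20 added the <=> operator. A defaulted operator<=> compares base class subobjects and then members in declaration order, applying <=> recursively to each. For class types, even when a default <=> would be usable, it must be explicitly declared as = default; otherwise it is not available.

If class B contains a member of class A and B declares a defaulted <=> so that B objects can be compared, A must also provide operator<=>, for example by defaulting it as well.

The result of the three-way comparison operator can be a partial, weak, or strong ordering. In C++ these correspond to std::partial_ordering, std::weak_ordering, and std::strong_ordering:

Partial ordering: like a weak ordering, except that two values may also be incomparable (unordered).
Weak ordering: the relation between a and b is exactly one of less, equivalent, or greater. Equivalence is commonly tested as !(a < b) && !(b < a). Equivalent values may still be distinguishable.
Strong ordering: the relation between a and b is exactly one of less, equal, or greater. Equality can be tested directly with ==, and equal values are identical in every respect and can substitute for one another.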