什么是string interning(字符串驻留)以及python中字符串的intern机制

Incomputer science, string interning is a method of storing only onecopy of each distinct string value, which must be immutable.
Interning strings makes some stringprocessing tasks more time- or space-efficient at the cost of requiring moretime when the string is created or interned. The distinct values are stored ina string intern pool. --引自维基百科

也就是说，值相同的字符串对象只会保存一份，是共用的，这也决定了字符串必须是不可变对象。想一想，就跟数值类型一样，相同的数值只要保存一份就行了，没必要用不同对象来区分。

python中的字符串采用了intern机制，会自动intern。

>>a = ‘kzc‘

>>b = ‘k‘+‘zc‘

>>id(a)

55704656

>>id(b)

55704656

可以看到，它们是同一个对象。

intern机制的好处是，需要值相同的字符串的时候（比如标识符），直接从池里拿来用，避免频繁的创建和销毁，提升效率，节约内存。缺点是，拼接字符串、对字符串修改之类的影响性能。因为是不可变的，所以对字符串修改不是inplace操作，要新建对象。这也是为什么拼接多字符串的时候不建议用+而用join()。join()是先计算出所有字符串的长度，然后一一拷贝，只new一次对象。

需要小心的坑，并不是所有的字符串都会采用intern机制。只包含下划线、数字、字母的字符串才会被intern。

>>a = ‘hello world‘

>>b = ‘hello world‘

>>id(a)

56400384

>>id(b)

56398336

这里因为有空格，所有没被intern。

但是为什么这么做呢？既然python内置函数intern()能显式对任意字符串进行intern。说明不是实现难度的问题。

答案在源码stringobject.h中的注释可以找到，

/* ... ... This is generally restricted tostrings that "looklike" Python identifiers, although the intern() builtincan be used to force interning of any string ... ... */

也就是说，只对那些看起来像是python标识符的进行intern。

下面看另外一个坑，

例1.

>>‘kz‘+‘c‘ is ‘kzc‘

True

例2.

>>s1 = ‘kz‘

>>s2 = ‘kzc‘

>>s1+‘c‘ is ‘kzc‘

False

为什么第二个栗子是False,只包含字母啊，不是应该被自动intern的么？

这是因为第一个栗子中，‘kz‘+‘c‘是在compile time求值的，被替换成了‘kzc‘.

而第二个栗子，s1+‘c‘是在run-time拼接的，导致没有被自动intern.

时间： 2024-10-06 20:46:18

什么是string interning(字符串驻留)以及python中字符串的intern机制

什么是string interning(字符串驻留)以及python中字符串的intern机制的相关文章

python中字符串链接的七种方式

Python中字符串查找效率比较

Python中字符串的使用

Python中字符串的操作

python中字符串的操作方法

python中字符串的几种表达方式（用什么方式表示字符串）

Python中字符串的有趣玩法

python中字符串拼接

Python中字符串的表示