PHP Extension Development: Hash Tables

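What is a hash table? A hash table is a data structure that passes a key through a hash function and maps the result to a position in a table, so that a record can be accessed directly instead of being searched for. In the ideal case, hash table operations run in O(1) time: computing hash(key) and locating the bucket (one position in the table) take time that does not depend on the size of the table, so the main cost is the hash computation and the bucket lookup.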

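Before analyzing how HashTable is implemented in PHP, here are the basic concepts involved:

For example, suppose we want to look up a record by a person's name: the key is passed through the hash function, which yields a pointer to a bucket, and the actual bucket is then accessed.

Key: the identifier of the data before it is transformed by the hash function.

Bucket: the container in the hash table that actually holds the data.

Hash function: maps a key to a pointer to a bucket. MD5 and SHA-1 are hash functions commonly used in application code; they are cryptographic hashes, while hash tables normally use much cheaper functions.

Hash collision: two different keys that the hash function maps to the same bucket pointer.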
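To make the key, hash function and bucket relationship concrete, here is a minimal standalone sketch. It uses the DJBX33A ("times 33") hash, and the names djbx33a and bucket_index are made up for this illustration; they are not functions from the PHP source. Casting each byte to unsigned char is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* DJBX33A: start from 5381, then for every byte do hash = hash * 33 + byte. */
unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;
    size_t i;

    assert(key != NULL);
    for (i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Map a hash value to a slot. table_size must be a power of two,
 * so that (table_size - 1) works as a bit mask. */
unsigned long bucket_index(unsigned long hash, unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return hash & (table_size - 1);
}
```

Two different keys whose hashes share the same low bits end up in the same slot, which is exactly the hash collision described above.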

Now let's look at the hash table structures in PHP (these are the PHP 5 definitions):

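//Zend/zend_hash.h

typedef struct _hashtable {
        uint nTableSize;              //number of slots in the table, not the number of elements
        uint nTableMask;              //mask used to compute the slot index, set to nTableSize-1
        uint nNumOfElements;          //number of elements actually stored
        ulong nNextFreeElement;       //next free integer index, used when appending
        Bucket *pInternalPointer;     //internal pointer used to iterate over the table
        Bucket *pListHead;            //head of the list of all elements
        Bucket *pListTail;            //tail of the list of all elements
        Bucket **arBuckets;           //array of slots that hold the stored elements
        dtor_func_t pDestructor;      //pointer to the element destructor function
        zend_bool persistent;         //whether storage is persistent; passed to pemalloc so the table can stay in memory
        unsigned char nApplyCount;    //nesting depth of zend_hash_apply, used to limit nested traversal to 3 levels
        zend_bool bApplyProtection;   //whether nested-traversal protection is enabled
#if ZEND_DEBUG
        int inconsistent;             //debug field for checking the state of operations on the table
#endif
} HashTable;

typedef struct bucket {
        ulong h;                      //hash value of the key (the index itself for integer keys)
        uint nKeyLength;              //0 for integer indexes, the key length for string keys
        void *pData;                  //pointer to the element data
        void *pDataPtr;               //pointer-sized data is stored directly here, and pData points to pDataPtr
        struct bucket *pListNext;     //next element in the list of all elements
        struct bucket *pListLast;     //previous element in the list of all elements
        struct bucket *pNext;         //collision chain (doubly linked): next element
        struct bucket *pLast;         //collision chain (doubly linked): previous element
        const char *arKey;            //the key string; declared as the last member of the struct
} Bucket;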
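The role of arBuckets, h and pNext is easier to see in a stripped-down model. The sketch below is not the Zend implementation: MiniTable, MiniBucket, mini_add and mini_find are hypothetical names, the table has a fixed size, and keys are not copied. It only shows how a slot is picked with the mask and how a collision chain is walked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MINI_TABLE_SIZE 8   /* a power of two, so the mask is size - 1 */

typedef struct mini_bucket {
    unsigned long h;            /* full hash of the key */
    const char *key;            /* not copied: the caller keeps the string alive */
    int value;
    struct mini_bucket *pNext;  /* next bucket in the same slot (collision chain) */
} MiniBucket;

typedef struct {
    MiniBucket *arBuckets[MINI_TABLE_SIZE];
    unsigned int nNumOfElements;
} MiniTable;

static unsigned long mini_hash(const char *key)
{
    unsigned long h = 5381;
    while (*key) {
        h = h * 33 + (unsigned char)*key++;
    }
    return h;
}

/* Walk the chain of the slot selected by (h & mask) until the key matches. */
MiniBucket *mini_find(MiniTable *ht, const char *key)
{
    unsigned long h = mini_hash(key);
    MiniBucket *p = ht->arBuckets[h & (MINI_TABLE_SIZE - 1)];

    while (p != NULL) {
        if (p->h == h && strcmp(p->key, key) == 0) {
            return p;
        }
        p = p->pNext;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the key already exists or allocation fails. */
int mini_add(MiniTable *ht, const char *key, int value)
{
    unsigned long h;
    MiniBucket *p;

    assert(ht != NULL && key != NULL);
    if (mini_find(ht, key) != NULL) {
        return -1;
    }
    p = malloc(sizeof(*p));
    if (p == NULL) {
        return -1;
    }
    h = mini_hash(key);
    p->h = h;
    p->key = key;
    p->value = value;
    /* Push the new bucket at the front of the slot's chain. */
    p->pNext = ht->arBuckets[h & (MINI_TABLE_SIZE - 1)];
    ht->arBuckets[h & (MINI_TABLE_SIZE - 1)] = p;
    ht->nNumOfElements++;
    return 0;
}
```

A table must be zero-initialized before use, for example with memset. The real structure additionally links every bucket into the pListHead/pListTail list so that elements can be traversed in insertion order.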

Common hash table operations. The kernel defines them as macros to make them convenient to use:

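//Initialize a hash table
#define zend_hash_init(ht, nSize, pHashFunction, pDestructor, persistent)  _zend_hash_init((ht), (nSize), (pDestructor), (persistent) ZEND_FILE_LINE_CC)

ht: pointer to the hash table. A hash table is usually defined like this: HashTable *ht; ALLOC_HASHTABLE(ht);
nSize: the requested number of slots. The table size is always a power of two, so the actual size may be larger than the value you pass.
pHashFunction: an early parameter that used to select a hash function. Hash values are now always computed with the default DJBX33A algorithm, and the parameter is kept only for compatibility, so pass NULL.
pDestructor: a callback that is invoked whenever an element of the table is deleted or overwritten.
persistent: a flag that says whether the table pointed to by ht is kept in memory persistently. It takes 1 or 0, where 1 means persistent.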
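The rounding of nSize can be sketched as follows. table_size_for and table_mask_for are hypothetical helper names, and the minimum of 8 slots is an assumption of this sketch rather than something stated in this article.

```c
#include <assert.h>

/* Round a requested size up to the next power of two,
 * with an assumed floor of 8 slots (1 << 3). */
unsigned int table_size_for(unsigned int nSize)
{
    unsigned int shift = 3;

    assert(nSize <= 0x80000000u);   /* keeps the shift below 32 */
    while ((1u << shift) < nSize) {
        shift++;
    }
    return 1u << shift;
}

/* The mask stored next to the size: always size - 1. */
unsigned int table_mask_for(unsigned int nSize)
{
    return table_size_for(nSize) - 1;
}
```

With nSize = 100, as in the example function later in this article, the sketch yields 128 slots and a mask of 127.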

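//Update a string-keyed (associative) entry
#define zend_hash_update(ht, arKey, nKeyLength, pData, nDataSize, pDest) \
_zend_hash_add_or_update(ht, arKey, nKeyLength, pData, nDataSize, pDest, HASH_UPDATE ZEND_FILE_LINE_CC)

ht: same as above.
arKey: the string key.
nKeyLength: length of the key (by convention it includes the terminating NUL byte).
pData: pointer to the value to store.
nDataSize: size in bytes of the data that pData points to, for example sizeof(zval*) when storing a zval pointer.
pDest: if not NULL, *pDest is set to point at the copy of the data stored in the table.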

//Insert a string-keyed (associative) entry
#define zend_hash_add(ht, arKey, nKeyLength, pData, nDataSize, pDest) \
_zend_hash_add_or_update(ht, arKey, nKeyLength, pData, nDataSize, pDest, HASH_ADD ZEND_FILE_LINE_CC)

Parameters are the same as above.

//Update an integer-indexed entry
#define zend_hash_index_update(ht, h, pData, nDataSize, pDest) \
_zend_hash_index_update_or_next_insert(ht, h, pData, nDataSize, pDest, HASH_UPDATE ZEND_FILE_LINE_CC)

h: the integer index. The other parameters are the same as above.

//Append an entry at the next free integer index
#define zend_hash_next_index_insert(ht, pData, nDataSize, pDest) \
_zend_hash_index_update_or_next_insert(ht, 0, pData, nDataSize, pDest, HASH_NEXT_INSERT ZEND_FILE_LINE_CC)

Parameters are the same as above.

Hash table API

int zend_hash_init(HashTable* ht, uint size, hash_func_t hash, dtor_func_t destructor, zend_bool persistent)
int zend_hash_add(HashTable* ht, const char* key, uint klen, void* data, uint dlen, void** dest)
int zend_hash_update(HashTable* ht, const char* key, uint klen, void* data, uint dlen, void** dest)
int zend_hash_find(HashTable* ht, const char* key, uint klen, void** data)
zend_bool zend_hash_exists(HashTable* ht, const char* key, uint klen)
int zend_hash_del(HashTable* ht, const char* key, uint klen)
int zend_hash_index_update(HashTable* ht, ulong index, void* data, uint dsize, void** dest)
int zend_hash_index_del(HashTable* ht, ulong index)
int zend_hash_index_find(HashTable* ht, ulong index, void** data)
int zend_hash_index_exists(HashTable* ht, ulong index)
ulong zend_hash_next_free_element(HashTable* ht)

Hash table traversal API

void zend_hash_internal_pointer_reset(HashTable* ht)
resets the internal pointer of ht to the start
void zend_hash_internal_pointer_reset_ex(HashTable* ht, HashPosition* position)
sets position to the start of ht
int zend_hash_get_current_data(HashTable* ht, void** data)
gets the data at the current position in ht; pass the address of your pointer cast to void**, ie: (void**) &data
int zend_hash_get_current_data_ex(HashTable* ht, void** data, HashPosition* position)
sets data to the data at position in ht
int zend_hash_get_current_key(HashTable* ht, char** key, ulong* index, zend_bool duplicate)
sets key and index from the key information at the current position. The possible return values HASH_KEY_IS_STRING and HASH_KEY_IS_LONG indicate the kind of key found at the current position.
int zend_hash_get_current_key_ex(HashTable* ht, char** key, uint* klen, ulong* index, zend_bool duplicate, HashPosition* position)
sets key, klen, and index from the key information at position. The possible return values HASH_KEY_IS_STRING and HASH_KEY_IS_LONG indicate the kind of key found at position.
int zend_hash_move_forward(HashTable* ht)
moves the internal pointer of ht to the next entry in ht
int zend_hash_move_forward_ex(HashTable* ht, HashPosition* position)
moves position to the next entry in ht
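The difference between the plain functions and their _ex variants is who owns the cursor: the table itself or the caller. The standalone sketch below models that with a plain linked list. MiniList, MiniPosition and the mini_* functions are hypothetical stand-ins, not Zend functions, and the NULL-position fallback to the internal pointer is how this sketch chooses to tie the two variants together.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *pListNext;   /* next element in insertion order */
} Node;

typedef struct {
    Node *pListHead;          /* first inserted element */
    Node *pInternalPointer;   /* cursor owned by the table itself */
} MiniList;

typedef Node *MiniPosition;   /* plays the role of HashPosition */

/* Reset the external position, or the internal pointer when pos is NULL. */
void mini_reset_ex(MiniList *l, MiniPosition *pos)
{
    assert(l != NULL);
    if (pos) {
        *pos = l->pListHead;
    } else {
        l->pInternalPointer = l->pListHead;
    }
}

/* Returns 0 and sets *data while the cursor is valid, -1 at the end. */
int mini_current_ex(MiniList *l, int **data, MiniPosition *pos)
{
    Node *p = pos ? *pos : l->pInternalPointer;

    if (p == NULL) {
        return -1;
    }
    *data = &p->value;
    return 0;
}

/* Advance the chosen cursor by one element. */
int mini_forward_ex(MiniList *l, MiniPosition *pos)
{
    Node **cur = pos ? pos : &l->pInternalPointer;

    if (*cur == NULL) {
        return -1;
    }
    *cur = (*cur)->pListNext;
    return 0;
}

/* Typical loop shape: reset, test current, move forward. */
int mini_sum(MiniList *l)
{
    MiniPosition pos;
    int *data;
    int sum = 0;

    for (mini_reset_ex(l, &pos);
         mini_current_ex(l, &data, &pos) == 0;
         mini_forward_ex(l, &pos)) {
        sum += *data;
    }
    return sum;
}
```

Because mini_sum keeps its own position, it leaves the table's internal pointer untouched, which is why an external position is the safer choice when a traversal may be nested inside another one.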

Here is an example that uses the API functions above:

PHP_FUNCTION(myext_example_hashtable);//declaration in php_myext.h

PHP_FE(myext_example_hashtable, NULL)//function registration

PHP_FUNCTION(myext_example_hashtable){
    php_printf("init\n");
    HashTable *myht;
    ALLOC_HASHTABLE(myht);
    int nSize = 100;
    zend_hash_init(myht, nSize, NULL, NULL, 0);//hash function and destructor are both NULL
    char *key1 = "key1";
    int nKeyLength = strlen(key1);//sizeof(key1) would give the pointer size, not the key length
    zval * value1;
    MAKE_STD_ZVAL(value1);
    ZVAL_STRING(value1,"value1",0);
    zval * value2;
    MAKE_STD_ZVAL(value2);
    ZVAL_STRING(value2,"value2",0);
    int ret = zend_hash_add(myht, key1, nKeyLength+1, &value1, sizeof(zval*),NULL);
    php_printf("zend_hash_add,ret=>%d\n",ret);
    ret = zend_hash_add(myht, key1, nKeyLength+1, &value2, sizeof(zval*),NULL);
    php_printf("add exist key , zend_hash_add,ret=>%d\n",ret);
    ret = zend_hash_update(myht, key1, nKeyLength+1, &value2, sizeof(zval*),NULL);
    php_printf("update exist key , zend_hash_update,ret=>%d\n",ret);
    ret = zend_hash_index_update(myht,0,&value2,sizeof(zval*),NULL);
    php_printf("zend_hash_index_update,ret=>%d\n",ret);
    ret = zend_hash_next_index_insert(myht,&value2,sizeof(zval*),NULL);
    php_printf("zend_hash_next_index_insert,ret=>%d\n",ret);

    HashPosition position;
    zval **data = NULL;

    php_printf("\n");
    for (zend_hash_internal_pointer_reset_ex(myht, &position);
            zend_hash_get_current_data_ex(myht, (void**) &data, &position) == SUCCESS;
            zend_hash_move_forward_ex(myht, &position)) {

        /* by now we have data set and can use Z_ macros for accessing type and variable data */

        char *key = NULL;
        uint  klen;
        ulong index;

        if (zend_hash_get_current_key_ex(myht, &key, &klen, &index, 0, &position) == HASH_KEY_IS_STRING) {
            /* the key is a string, key and klen will be set */
            php_printf("string key %s =>",key);
        } else {
            /* we assume the key to be long, index will be set */
            php_printf("index key %lu =>",index);
        }
        if (Z_TYPE_PP(data) != IS_STRING) {
            convert_to_string(*data);
        }
        PHPWRITE(Z_STRVAL_PP(data), Z_STRLEN_PP(data));
        php_printf("\n");
    }

    FREE_ZVAL(value1);
    FREE_ZVAL(value2);
    zend_hash_destroy(myht);
    FREE_HASHTABLE(myht);
    RETURN_NULL();
}
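The string-key calls in the example pass the key length plus one, so the length counts the terminating NUL byte. That length has to come from strlen, because sizeof applied to a char pointer reports the size of the pointer. A small standalone sketch of the difference, where key_length_with_nul and pointer_size are hypothetical helpers written only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Key length including the terminating NUL byte. */
size_t key_length_with_nul(const char *key)
{
    assert(key != NULL);
    return strlen(key) + 1;
}

/* What sizeof reports for a char pointer: the pointer size,
 * which has nothing to do with the length of the string. */
size_t pointer_size(void)
{
    const char *key = "key1";
    return sizeof(key);
}
```

For "key1" the first helper returns 5, while the second returns whatever the platform's pointer size is.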

You can also use the array wrapper that the kernel builds on top of the hash table, which is the IS_ARRAY type of a zval:

array_init(arrval);

add_assoc_long(zval *arrval, char *key, long lval);
add_index_long(zval *arrval, ulong idx, long lval);
add_next_index_long(zval *arrval, long lval);

//the add_assoc_* family:
add_assoc_null(zval *aval, char *key);
add_assoc_bool(zval *aval, char *key, zend_bool bval);
add_assoc_long(zval *aval, char *key, long lval);
add_assoc_double(zval *aval, char *key, double dval);
add_assoc_string(zval *aval, char *key, char *strval, int dup);
add_assoc_stringl(zval *aval, char *key,char *strval, uint strlen, int dup);
add_assoc_zval(zval *aval, char *key, zval *value);

//Note: these functions are actually macros that wrap the add_assoc_*_ex functions.

//the add_index_* family:
ZEND_API int add_index_long     (zval *arg, ulong idx, long n);
ZEND_API int add_index_null     (zval *arg, ulong idx           );
ZEND_API int add_index_bool     (zval *arg, ulong idx, int b    );
ZEND_API int add_index_resource (zval *arg, ulong idx, int r    );
ZEND_API int add_index_double   (zval *arg, ulong idx, double d);
ZEND_API int add_index_string   (zval *arg, ulong idx, const char *str, int duplicate);
ZEND_API int add_index_stringl  (zval *arg, ulong idx, const char *str, uint length, int duplicate);
ZEND_API int add_index_zval     (zval *arg, ulong index, zval *value);

//the add_next_index_* family:
ZEND_API int add_next_index_long        (zval *arg, long n  );
ZEND_API int add_next_index_null        (zval *arg          );
ZEND_API int add_next_index_bool        (zval *arg, int b   );
ZEND_API int add_next_index_resource    (zval *arg, int r   );
ZEND_API int add_next_index_double      (zval *arg, double d);
ZEND_API int add_next_index_string      (zval *arg, const char *str, int duplicate);
ZEND_API int add_next_index_stringl     (zval *arg, const char *str, uint length, int duplicate);
ZEND_API int add_next_index_zval        (zval *arg, zval *value);

Date: 2024-10-18 01:49:36
