Cuckoo Hashing

Description

One of the most fundamental data structure problems is the dictionary problem: given a set D of words, you want to be able to quickly determine whether any given query string q is present in the dictionary D. Hashing is a well-known solution to this problem. The idea is to create a function h : Σ* → [0..n-1] from all strings to the integer range 0, 1, ..., n-1, i.e., you describe a fast deterministic program which takes a string as input and outputs an integer between 0 and n-1. Next you allocate an empty hash table T of size n, and for each word w in D you set T[h(w)] = w. Thus, given a query string q, you only need to calculate h(q) and check whether T[h(q)] equals q to determine if q is in the dictionary. Seems simple enough, but aren't we forgetting something? Of course: what if two words in D map to the same location in the table? This phenomenon, called a collision, happens fairly often (remember the birthday paradox: in a class of 24 pupils there is more than a 50% chance that two of them share a birthday). In general, a collision already becomes likely once the dictionary holds roughly √n words, which is quite poor space usage!
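As a quick check of the birthday-paradox figure, the following sketch computes the probability that k values drawn uniformly from d possibilities are not all distinct. It assumes 365 equally likely birthdays and, for the hashing reading, a hash function whose values are uniform and independent; the function name `sharedBirthdayProbability` is illustrative only.

```cpp
// Probability that at least two of k items share a value when each item
// independently takes one of `days` equally likely values.
// Birthday reading: days = 365, k = class size.
// Hashing reading (assumption: uniform, independent hash values):
// days = table size n, k = number of words, result = chance of a collision.
double sharedBirthdayProbability(int k, int days = 365) {
    double allDistinct = 1.0;
    for (int i = 0; i < k; ++i)
        allDistinct *= static_cast<double>(days - i) / days;  // i-th item avoids the first i
    return 1.0 - allDistinct;
}
```

Under these assumptions `sharedBirthdayProbability(24)` is about 0.54, and `sharedBirthdayProbability(100, 10000)` (√n = 100 words in a 10000-slot table) is already about 0.39.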

A stronger variant is Cuckoo Hashing. The idea is to use two hash functions h1 and h2, so each string maps to two positions in the table. A query string q is now handled as follows: you compute both h1(q) and h2(q), and if T[h1(q)] = q or T[h2(q)] = q, you conclude that q is in D. The name "Cuckoo Hashing" stems from the process of creating the table. Initially you have an empty table. You iterate over the words d in D and insert them one by one. If T[h1(d)] is free, you set T[h1(d)] = d. Otherwise, if T[h2(d)] is free, you set T[h2(d)] = d. If both are occupied, however, then just like the cuckoo with other birds' eggs, you evict the word r in T[h1(d)] and set T[h1(d)] = d. Next you put r back into the table in its alternative place (and if that entry was already occupied, you evict that word and move it to its alternative place, and so on). Of course, we may end up in an infinite loop here, in which case we need to rebuild the table with other choices of hash functions. The good news is that with high probability this will not happen, even if D contains up to n/2 words!
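The insertion procedure just described can be sketched as follows. This is a minimal illustration, assuming each word is identified by its index and its two positions h1 and h2 are already known (as in the input format of this problem). The class name `CuckooTable` and the eviction cap `maxKicks` are hypothetical; the cap is one common way to turn the otherwise infinite loop into a reported failure.

```cpp
#include <utility>
#include <vector>

// Sketch of cuckoo insertion. Words are identified by index w; h1[w] and h2[w]
// are their precomputed positions. maxKicks (hypothetical parameter) caps the
// eviction chain so an endless loop is reported as failure.
class CuckooTable {
public:
    CuckooTable(int n, std::vector<int> first, std::vector<int> second, int kicks)
        : slot(n, -1), h1(std::move(first)), h2(std::move(second)), maxKicks(kicks) {}

    // Lookup: w is present iff it sits at one of its two positions.
    bool contains(int w) const { return slot[h1[w]] == w || slot[h2[w]] == w; }

    // Returns true on success. On failure the word still being moved is left
    // out of the table, so the caller must rebuild ("rehash").
    bool insert(int w) {
        if (slot[h1[w]] == -1) { slot[h1[w]] = w; return true; }
        if (slot[h2[w]] == -1) { slot[h2[w]] = w; return true; }
        int pos = h1[w];                            // both taken: evict from h1
        for (int kick = 0; kick < maxKicks; ++kick) {
            std::swap(w, slot[pos]);                // store w, pick up the old occupant
            pos = (pos == h1[w]) ? h2[w] : h1[w];   // the evicted word's other position
            if (slot[pos] == -1) { slot[pos] = w; return true; }
        }
        return false;
    }

private:
    std::vector<int> slot;   // -1 marks an empty entry
    std::vector<int> h1, h2;
    int maxKicks;
};
```

For example, inserting the five words of the second sample test case (table size 6, h1 = {2, 3, 1, 5, 2}, h2 = {3, 1, 2, 1, 5}) with a cap of 100 kicks succeeds for the first four words and fails for the fifth, because the evictions cycle forever. A cap that is too small may also report failure for an insertion that would eventually have succeeded.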

Input

On the first line of input is a single positive integer 1 ≤ t ≤ 50 specifying the number of test cases to follow. Each test case begins with two positive integers 1 ≤ m ≤ n ≤ 10000 on a line by itself, m being the number of words in the dictionary and n the size of the hash table in the test case. Next follow m lines, of which the ith describes the ith word di in the dictionary D by two non-negative integers h1(di) and h2(di), both less than n, giving the two hash function values of the word di. The two values may be identical.

Output

For each test case there should be exactly one line of output either containing the string "successful hashing" if it is possible to insert all words in the given order into the table, or the string "rehash necessary" if it is impossible.

Sample Input

2
3 3
0 1
1 2
2 0
5 6
2 3
3 1
1 2
5 1
2 5

Sample Output

successful hashing
rehash necessary
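The question asked here, whether every word can be given its own slot, also has a graph characterization: with table positions as vertices and each word as an edge between h1 and h2 (a self-loop when they are equal), a placement exists exactly when every connected component has no more edges than vertices. The sketch below checks this incrementally with union-find. It is a swapped-in technique, not the 2-SAT method this post uses, and it assumes, as the 2-SAT solution does, that "possible to insert" means such a placement exists; `SlotGraph` and `canPlaceAll` are hypothetical names.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Union-find over table slots that tracks, per component, how many slots
// (vertices) and words (edges) it contains.
class SlotGraph {
public:
    explicit SlotGraph(int n) : parent(n), vertices(n, 1), edges(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // every slot starts alone
    }
    // Adds the word with positions a and b; returns false once its component
    // has more words than slots (no placement can exist any more).
    bool addWord(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra != rb) {                              // merge, smaller into larger
            if (vertices[ra] < vertices[rb]) std::swap(ra, rb);
            parent[rb] = ra;
            vertices[ra] += vertices[rb];
            edges[ra] += edges[rb];
        }
        edges[ra] += 1;
        return edges[ra] <= vertices[ra];
    }

private:
    std::vector<int> parent, vertices, edges;
    int find(int x) {                                // path halving
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }
};

// true corresponds to "successful hashing", false to "rehash necessary".
bool canPlaceAll(int n, const std::vector<int>& h1, const std::vector<int>& h2) {
    SlotGraph g(n);
    for (std::size_t i = 0; i < h1.size(); ++i)
        if (!g.addWord(h1[i], h2[i])) return false;
    return true;
}
```

On the samples: in the first case 3 words form a component of 3 slots, so the answer is "successful hashing"; in the second, 5 words land in the component {1, 2, 3, 5} of only 4 slots, so the answer is "rehash necessary". Each word costs a near-constant union-find step, whereas pairwise 2-SAT clauses can grow quadratically when many words share one slot.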

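Solution: a direct ("bare") 2-SAT problem.

Treat each word as a boolean variable that selects h1 or h2. Whenever two choices A and B belonging to different words map to the same table position, at most one of them can hold, which is exactly the clause !A or !B.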
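For reference, this is the standard equivalence that turns each clause into two edges of the implication graph. Write x_i for "word i is stored at h1(i)", so ¬x_i means "word i is stored at h2(i)"; in the code that follows, x_i is node 2*i, ¬x_i is node 2*i+1, and negation is `^ 1`. The example line shows the case h1(i) = h2(j):

```latex
(\lnot A \lor \lnot B) \;\equiv\; (A \Rightarrow \lnot B) \land (B \Rightarrow \lnot A)

h_1(i) = h_2(j),\ i \ne j:\quad A = x_i,\ B = \lnot x_j
\;\Rightarrow\; (\lnot x_i \lor x_j),\quad x_i \Rightarrow x_j,\quad \lnot x_j \Rightarrow \lnot x_i

\text{satisfiable} \iff \text{for every } i,\ x_i \text{ and } \lnot x_i \text{ lie in different strongly connected components}
```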

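#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

// 2-SAT model: literal 2*i = "word i is stored at h1(i)",
// literal 2*i+1 = "word i is stored at h2(i)"; lit ^ 1 is the negation.
const int MAXN = 10005;      // at most 10000 words and 10000 table slots
const int MAXV = 2 * MAXN;   // two literals per word

int words, tableSize;        // m and n from the input
vector<int> adj[MAXV];       // implication graph (no fixed cap on the edge count)
vector<int> atSlot[MAXN];    // literals of earlier words that use each slot
int dfn[MAXV], low[MAXV], comp[MAXV], stk[MAXV];
bool onStack[MAXV];
int stkTop, dfsClock, compCount;

void addEdge(int from, int to) { adj[from].push_back(to); }

// Tarjan's algorithm: labels each literal with its strongly connected component.
void tarjan(int u) {
    dfn[u] = low[u] = ++dfsClock;
    stk[++stkTop] = u;
    onStack[u] = true;
    for (int v : adj[u]) {
        if (!dfn[v]) {
            tarjan(v);
            low[u] = min(low[u], low[v]);
        } else if (onStack[v]) {
            low[u] = min(low[u], dfn[v]);
        }
    }
    if (dfn[u] == low[u]) {  // u is the root of an SCC: pop the whole component
        ++compCount;
        int v;
        do {
            v = stk[stkTop--];
            onStack[v] = false;
            comp[v] = compCount;
        } while (v != u);
    }
}

// The formula is satisfiable iff no word has both literals in the same SCC.
bool satisfiable() {
    dfsClock = stkTop = compCount = 0;
    fill(dfn, dfn + 2 * words, 0);
    fill(onStack, onStack + 2 * words, false);
    for (int i = 0; i < 2 * words; i++)
        if (!dfn[i]) tarjan(i);
    for (int i = 0; i < words; i++)
        if (comp[2 * i] == comp[2 * i + 1]) return false;
    return true;
}

int main() {
    int tt;
    if (scanf("%d", &tt) != 1) return 0;
    while (tt--) {
        scanf("%d%d", &words, &tableSize);  // input order: m (words), then n (slots)
        for (int s = 0; s < tableSize; s++) atSlot[s].clear();
        for (int i = 0; i < 2 * words; i++) adj[i].clear();
        for (int i = 0; i < words; i++) {
            int x, y;
            scanf("%d%d", &x, &y);
            // Every earlier literal at slot x conflicts with "word i at h1":
            // clause (!lit or !(2i)) gives lit -> 2i+1 and 2i -> lit^1.
            for (int lit : atSlot[x]) {
                addEdge(lit, 2 * i + 1);
                addEdge(2 * i, lit ^ 1);
            }
            // Same for slot y and "word i at h2": lit -> 2i and 2i+1 -> lit^1.
            // The clause count grows quadratically with the number of words
            // sharing one slot.
            for (int lit : atSlot[y]) {
                addEdge(lit, 2 * i);
                addEdge(2 * i + 1, lit ^ 1);
            }
            // If x == y both literals sit at the same slot, but they are
            // complementary, so exactly one of them is ever true.
            atSlot[x].push_back(2 * i);
            atSlot[y].push_back(2 * i + 1);
        }
        puts(satisfiable() ? "successful hashing" : "rehash necessary");
    }
    return 0;
}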

Date: 2024-10-07 21:55:59
