redis bitcount variable-precision swar算法

花了不到一周的时间看完了一本reids设计与实现的书,感觉整体的设计有些地方的确很巧妙,各个结构之间联系的非常紧密,但是很简单,逻辑性的没有太多,但是学到了一个bitcount计数1的方法比较巧妙,记录下来

看了一个老外的介绍的很详细

转载过来

OK, let‘s go through the code line by line:

Line 1:

i = i - ((i >> 1) & 0x55555555);

First of all, the significance of the constant 0x55555555 is that, written using the Java / GCC style binary literal notation),

0x55555555 = 0b01010101010101010101010101010101

That is, all its odd-numbered bits (counting the lowest bit as bit 1 = odd) are 1, and all the even-numbered bits are 0.

The expression ((i >> 1) & 0x55555555) thus shifts the bits of i right by one, and then sets all the even-numbered bits to zero. (Equivalently, we could‘ve first set all the odd-numbered bits of i to zero with & 0xAAAAAAAA and then shifted the result right by one bit.) For convenience, let‘s call this intermediate value j.

What happens when we subtract this j from the original i? Well, let‘s see what would happen if ihad only two bits:

    i           j         i - j
----------------------------------
0 = 0b00    0 = 0b00    0 = 0b00
1 = 0b01    0 = 0b00    1 = 0b01
2 = 0b10    1 = 0b01    1 = 0b01
3 = 0b11    1 = 0b01    2 = 0b10

Hey! We‘ve managed to count the bits of our two-bit number!

OK, but what if i has more than two bits set? In fact, it‘s pretty easy to check that the lowest two bits of i - j will still be given by the table above, and so will the third and fourth bits, and the fifth and sixth bits, and so and. In particular:

  • despite the >> 1, the lowest two bits of i - j are not affected by the third or higher bits of i, since they‘ll be masked out of j by the & 0x55555555; and
  • since the lowest two bits of j can never have a greater numerical value than those of i, the subtraction will never borrow from the third bit of i: thus, the lowest two bits of i also cannot affect the third or higher bits of i - j.

In fact, by repeating the same argument, we can see that the calculation on this line, in effect, applies the table above to each of the 16 two-bit blocks in i in parallel. That is, after executing this line, the lowest two bits of the new value of i will now contain the number of bits set among the corresponding bits in the original value of i, and so will the next two bits, and so on.

Line 2:

i = (i & 0x33333333) + ((i >> 2) & 0x33333333);

Compared to the first line, this one‘s quite simple. First, note that

0x33333333 = 0b00110011001100110011001100110011

Thus, i & 0x33333333 takes the two-bit counts calculated above and throws away every second one of them, while (i >> 2) & 0x33333333 does the same after shifting i right by two bits. Then we add the results together.

Thus, in effect, what this line does is take the bitcounts of the lowest two and the second-lowest two bits of the original input, computed on the previous line, and add them together to give the bitcount of the lowest four bits of the input. And, again, it does this in parallel for all the 8 four-bit blocks (= hex digits) of the input.

Line 3:

return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;

OK, what‘s going on here?

Well, first of all, (i + (i >> 4)) & 0x0F0F0F0F does exactly the same as the previous line, except it adds the adjacent four-bit bitcounts together to give the bitcounts of each eight-bit block (i.e. byte) of the input. (Here, unlike on the previous line, we can get away with moving the & outside the addition, since we know that the eight-bit bitcount can never exceed 8, and therefore will fit inside four bits without overflowing.)

Now we have a 32-bit number consisting of four 8-bit bytes, each byte holding the number of 1-bit in that byte of the original input. (Let‘s call these bytes ABC and D.) So what happens when we multiply this value (let‘s call it k) by 0x01010101?

Well, since 0x01010101 = (1 << 24) + (1 << 16) + (1 << 8) + 1, we have:

k * 0x01010101 = (k << 24) + (k << 16) + (k << 8) + k

Thus, the highest byte of the result ends up being the sum of:

  • its original value, due to the k term, plus
  • the value of the next lower byte, due to the k << 8 term, plus
  • the value of the second lower byte, due to the k << 16 term, plus
  • the value of the fourth and lowest byte, due to the k << 24 term.

(In general, there could also be carries from lower bytes, but since we know the value of each byte is at most 8, we know the addition will never overflow and create a carry.)

That is, the highest byte of k * 0x01010101 ends up being the sum of the bitcounts of all the bytes of the input, i.e. the total bitcount of the 32-bit input number. The final >> 24 then simply shifts this value down from the highest byte to the lowest.

Ps. This code could easily be extended to 64-bit integers, simply by changing the 0x01010101 to 0x0101010101010101 and the >> 24 to >> 56. Indeed, the same method would even work for 128-bit integers; 256 bits would require adding one extra shift / add / mask step, however, since the number 256 no longer quite fits into an 8-bit byte.

其重要就是对于32位的数据看做一个整体,首先以其中2位为一组计算出1的个数,再以4位为一组,计算出1的个数,再以8位为一组,计算出1的个数

其实到这里1的个数就是4个8位的之和,可以通过计算求解

比如0xbbbbbbbb *0x01010101= 0xbbbbbbbb *(1<<24+1<<16+1<<8+1)

最后32位中1的个数的总数就被保存在了高8位中,此时只需要>>24就可以求出来了

当然在实际中,我们可以一次计算出多个32位,比如计算出4*32 那样的时间效率相对于遍历就节约了128倍,相对于查表法也快了4*4倍

时间: 2024-10-18 09:01:53

redis bitcount variable-precision swar算法的相关文章

Redis rdb文件CRC64校验算法 Java实现

查看RDB文件结构,发现最后的8字节是CRC64校验算得,从文件头开始直到8字节校验码前的FF结束码(含),经过CRC64校验计算发现,貌似最后的8字节是小端模式实现的. 参考redis的crc64实现的代码,点击查看 Java代码如下: 1 package com.jadic.utils; 2 3 /** 4 * @author Jadic 5 * @created 2014-5-15 6 */ 7 public class CRC64 { 8 private static final lon

探究redis和memcached的 LRU算法--------redis的LRU的实现

一直对这redis和memcached的两个开源缓存系统的LRU算法感兴趣.今天就打算总结一下这两个LRU算法的实现和区别. 首先要知道什么是LRU算法:LRU是Least Recently Used 近期最少使用算法.相关的资料网上一大堆.http://en.wikipedia.org/wiki/Cache_algorithms#LRU   redis的六种策略 rewriteConfigEnumOption(state,"maxmemory-policy",server.maxme

7.redis 集群模式的工作原理能说一下么?在集群模式下,redis 的 key 是如何寻址的?分布式寻址都有哪些算法?了解一致性 hash 算法吗?

作者:中华石杉 面试题 redis 集群模式的工作原理能说一下么?在集群模式下,redis 的 key 是如何寻址的?分布式寻址都有哪些算法?了解一致性 hash 算法吗? 面试官心理分析 在前几年,redis 如果要搞几个节点,每个节点存储一部分的数据,得借助一些中间件来实现,比如说有 codis,或者 twemproxy,都有.有一些 redis 中间件,你读写 redis 中间件,redis 中间件负责将你的数据分布式存储在多台机器上的 redis 实例中. 这两年,redis 不断在发展

Redis的逐出算法

Redis使用内存存储数据,在执行每一个命令前,会调用freeMemoryIfNeeded()检测内存是否充足.如果内存不满足新加入数据的最低存储要求, redis要临时删除一些数据为当前指令清理存储空间.清理数据的策略称为逐出算法.注意:逐出数据的过程不是100%能够清理出足够的可使用的内存空间,如果不成功则反复执行.当对所有数据尝试完毕后,如果不能达到内存清理的要求,将出现错误信息. (error) OOM command not allowed when used memory >'max

redis命令详解与使用场景举例——String

APPEND key value 如果 key 已经存在并且是一个字符串, APPEND 命令将 value 追加到 key 原来的值的末尾. 如果 key 不存在, APPEND 就简单地将给定 key 设为 value ,就像执行 SET key value 一样. 可用版本: 2.0.0+ 时间复杂度: 平摊O(1) 返回值: 追加 value 之后, key 中字符串的长度. 对不存在的 key 执行 APPEND redis> EXISTS myphone # 确保 myphone 不

Redis的conf文件以及内存调优

Redis的配置文件详解: 原文参考:https://www.jianshu.com/p/16013dbb137e https://juejin.im/post/5d674ac2e51d4557ca7fdd70 # Redis configuration file example. # # Note that in order to read the configuration file, Redis must be # started with the file path as first a

Redis常用命令详细介绍

一.字符串 字符串键是Redis最基本的键值对类型,将一个单独的键和一个单独的值关联起来.通过字符串键,不仅可以存储和读取字符串,如果输入能被解释为整数和浮点数,还能执行自增或自减操作. 1.SET:设置字符串键的值 命令 SET key value [EX seconds|PX milliseconds] [NX|XX] 效果 为字符串键设置值,如果字符串键不存在,创建这个字符串键:如果已经存在,直接更新值.EX和PX选项设置键的生存时间(以秒或毫秒为单位).当生存时间消耗殆尽后,这个键就会被

Redis 基础数据结构与对象

Redis用到的底层数据结构有:简单动态字符串.双端链表.字典.压缩列表.整数集合.跳跃表等,Redis并没有直接使用这些数据结构来实现键值对数据库,而是基于这些数据结构创建了一个对象系统,这个系统包括字符串对象.列表对象.哈希对象.集合对象和有序结合对象共5种类型的对象. 1 简单动态字符串 redis自定义了简单动态字符串数据结构(sds),并将其作为默认字符串表示. struct sdshdr { unsigned int len; unsigned int free; char buf[

redis集群原理

redis是单线程,但是一般的作为缓存使用的话,redis足够了,因为它的读写速度太快了.   官方的一个简单测试: 测试完成了50个并发执行100000个请求. 设置和获取的值是一个256字节字符串. 结果:读的速度是110000次/s,写的速度是81000次/s 在这么快的读写速度下,对于一般程序来说足够用了,但是对于访问量特别大的网站来说,还是稍有不足.那么,如何提升redis的性能呢?看标题就知道了,搭建集群. 3.0版本之前 3.0版本之前的redis是不支持集群的,我们的徐子睿老师说