Akka cluster gossip protocol

General background:

Type of gossip:

  • Push gossip: A node that has new information initiates gossip message to some other random node: Message usually contains full state.Propagation fast when less than half of the nodes are infected
  • Pull gossip:A node that does not have new information initiate gossip and request information from some other random node: Request message contains digest only and some other node respond with requestor’s outdated value. Propagation fast when more than half of the nodes are infected
  • Push pull gossip: After pull gossip, requestor send back values for which it has later version.


Message size VS precision:

  • Precise reconciliation: Each data has its own version and part of the data is sent in gossip message. (Random chosen, Data with oldest version, Data with latest version)
  • ScuttlebuttReconciliation:Maintain a global version on each node and compare global version first. Fall back to precise reconciliation only if global version is different

Akka Gossip data structure: contains a set of members, a vector clock, and a gossip overview


Members:

Each member has the following possible status: Joining, Weaklyup, Up, Leaving, Down, Exiting, Removed and itsunique address, unique address is different from host/port address in that it has a unique uuid, so that a host/port joining the cluster twice can be distinguished

Members are changed when ClusetCoreDaemon received joining or down message or convergence happens


Vector clock:

Keep a treeMap of node to version mapping.

Vector clock comparison: can have 4 results: SAME, BEFORE, AFTER, CONCURRENT. If node is the same, compare version.

Vector clock merge: take the latestversion of each node

Vector clock version increment:  whenever current node detects any cluster member change (join, leaving, downing, quarantined, converges,unreachable etc) and gossip is updated (ClusterCoreDaemon. updateLatestGossip),the vector clock will add current node, which makes current node’s vector clock version increase

Gossip overview:

Contains a “seen” set, which contains unique address that have seen the gossip message and replied

Akka ClusterCoreDaemon interacts with Gossip:

Choose gossip target: 

  • Must be in the member list.
  • Not down or exiting
  • Reachable

If a node in the above set is not seen by the current node,it takes priority (probability of 0.8). Otherwise, just a random node isselected from the set

Probability of sending message to unseen node reduced graduallywith larger cluster size (>400) to avoid sending too many messages to asingle node

Gossip message is sent to a group of at fixed interval(gossip tick), to speed up convergence, if less than  of the members have seenthe gossip message, the interval is reduced

Gossip message contains the from address, to address and aserialized gossip object includes seen set, vector clock etc

Process gossip message:

Received gossip messages are ignored, if

  • To address does not match current node.
  • The sender is notreachable from current node.
  • The sender is not a local member.
  • Remote gossip member does not contain current node.

Compare local and remote vector clock

  • If same:  merge remote/current seen nodes.  send response back if remote node has not seenthis node
  • If current vector clock is newer:  take the current gossip, send response back
  • If remote vector clock is newer:  take the remote gossip, send response back ifremote node has not seen this node
  • If conflicting:  prune conflict of both current/remote gossipand merge them, send response back

Always add current node to the seen set of the latest gossip

Received gossip messages are dropped if they are enqueuedtoo long in mailbox to prevent overwhelming the current node.

Gossip convergence:

Conditions:

  • No members are unreachable exceptthose with exiting or down status
  • No members are not in the seen set

After convergence, only gossip status containing the version vector clock is sent. Unless there is a change to the member status (ClusterCoreDaemon.gossipStatusTo ClusterCoreDaemon.receiveGossipStatus),it falls back to normal gossip. (ClusterCoreDaemon.gossipTo  ClusterCoreDaemon.receiveGossip)

Also, if a node is in the seen set, it will send gossip status only.

Reference:

http://blog.sina.com.cn/s/blog_912389e50100z0dt.html

时间: 2024-10-12 17:15:06

Akka cluster gossip protocol的相关文章

Akka Cluster简介与基本环境搭建

??akka集群是高容错.去中心化.不存在单点故障以及不存在单点瓶颈的集群.它使用gossip协议通信以及具备故障自动检测功能. Gossip收敛 ??集群中每一个节点被其他节点监督(默认的最大数量为5).集群中的节点互相监督着,某节点所监督的状态也正在被其他监督着.通过gossip协议,节点向其他节点传递自己所见节点的最新状态(Up.Joining等等),同时节点也在接收来自其他节点的信息,这些信息包括哪些节点以及这些节点对应的状态,并这些节点加入到自己的seen表里去,表示自己已经看见了这些

akka cluster make node as unreachable 问题

如果确认cluster 端没有问题,就应该检查一下akka的 application.conf. 在akka cluster中,主要通过故障检测机制(Failure Detector)来看集群中各个节点是否处理可用状态. 集群中每个节点都会监控(最大5个)集群中其余的节点是否处于正常状态,当其中一个节点A检测到另一个节点B处理不可到达(unreachable)状态,它会通过gossip协议通知其余节点,其余节点将B节点标记为unreachable.如果你在application.conf中配置了

Serf:Gossip Protocol

Serf使用Gossip Protocol来广播消息到集群中.本文介绍这个内部协议的细节.gossip协议基于“SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol”,有一写小的适配,很大程度上增加了传播速度和收敛速率. SWIM Protocol Overview Serf以加入一个已存在的集群或者启动一个新集群开始.如果启动一个新集群,其他节点则会加入它.为了加入现有集群,新节点必

akka cluster sharding source code 学习 (1/5) 替身模式

为了使一个项目支持集群,自己学习使用了 akka cluster 并在项目中实施了,从此,生活就变得有些痛苦.再配上 apache 做反向代理和负载均衡,debug 起来不要太酸爽.直到现在,我还对 akka cluster 输出的 log 不是很熟悉,目前网络上 akka cluster 的信息还比较少,想深入了解这东西的话,还是要自己读 source code.前几天,雪球那帮人说 akka 不推荐使用,有很多坑,这给我提了个醒,目前我对 akka 的理解是远远不够的,需要深入学习. akk

akka cluster 初体验

cluster 配置 akka { actor { provider = "akka.cluster.ClusterActorRefProvider" } remote { log-remote-lifecycle-events = off enabled-transports = ["akka.remote.netty.tcp"] netty.tcp { hostname = "127.0.0.1" port = 0 } } cluster {

akka cluster sharding

cluster sharding 的目的在于提供一个框架,方便实现 DDD,虽然我至今也没搞明白 DDD 到底适用于是什么场合,但是 cluster sharding 却是我目前在做的一个 project 扩展到集群上非常需要的工具. sharding 要做这么几件事 1. 对于每一个 entity,创建一个 actor.该 entity 有 Id 作为唯一标示.该 entity 的所有消息都由此 actor 来处理 2. 该 actor 在一段时间内不工作时,会超时并 kill self 3.

akka cluster sharding source code 学习 (1/5) handle off

一旦 shard coordinator(相当于分布式系统的 zookeeper) 启动,它就会启动一个定时器,每隔一定的时间尝试平衡一下集群中各个节点的负载,平衡的办法是把那些负载较重的 actor 移动到负载较轻的节点上.在这一点上,我以前的理解有误,我以为 shardRegion 是移动的最小单位. val rebalanceTask = context.system.scheduler.schedule(rebalanceInterval, rebalanceInterval, self

Akka源码分析-Cluster-Distributed Publish Subscribe in Cluster

在ClusterClient源码分析中,我们知道,他是依托于"Distributed Publish Subscribe in Cluster"来实现消息的转发的,那本文就来分析一下Pub/Sub是如何实现的. 还记得之前分析Cluster源码的文章吗?其实Cluster只是把集群内各个节点的信息通过gossip协议公布出来,并把节点的信息分发出来.但各个actor的地址还是需要开发者自行获取或设计的,比如我要跟worker通信,那就需要知道这个actor在哪个节点,通过actorPa

Akka(10): 分布式运算:集群-Cluster

Akka-Cluster可以在一部物理机或一组网络连接的服务器上搭建部署.用Akka开发同一版本的分布式程序可以在任何硬件环境中运行,这样我们就可以确定以Akka分布式程序作为标准的编程方式了. 在上面两篇讨论里我们介绍了Akka-Remoting.Akka-Remoting其实是一种ActorSystem之间Actor对Actor点对点的沟通协议.通过Akka-Remoting来实现一个ActorSystem中的一个Actor与另一个Actorsystem中的另一个Actor之间的沟通.在Re