(转载)Zab vs. Paxos

原创链接:https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

Is Zab just a special implementation of Paxos?

No, Zab is a different protocol than Paxos, although it shares with it some key aspects, as for example:

  • A leader proposes values to the followers
  • Leaders wait for acknowledgements from a quorum of followers before considering a proposal committed (learned)
  • Proposals include epoch numbers, which are similar to ballot numbers in Paxos

The main conceptual difference between Zab and Paxos is that it is primarily designed for primary-backup systems, like Zookeeper, rather than for state machine replication.

What is the difference between primary-backup and state machine replication?

A state machine is a software component that processes a sequence of requests. For every processed request, it can modify its internal state and produce a reply. A state machine is deterministic in the sense that, given two runs where it receives the same sequence of requests, it always makes the same internal state transitions and produces the same replies.

A state machine replication system is a client-sever system ensuring that each state machine replica executes the same sequence of client requests, even if these requests are submitted concurrently by clients and received in different orders by the replicas. Replicas agree on the execution order of client requests using a consensus algorithm like Paxos. Client requests that are sent concurrently and overlap in time can be executed in any order. If a leader fails, a new leader that executes recovery is free to arbitrarily reorder any uncommitted request since it is not yet completed.

In the case of primary-backup systems, such as Zookeeper, replicas agree on the application order of incremental (delta) state updates, which are generated by a primary replica and sent to its followers. Unlike client requests, state updates must be applied in the exact original generation order of the primary, starting from the original initial state of the primary. If a primary fails, a new primary that executes recovery cannot arbitrarily reorder uncommitted state updates, or apply them starting from a different initial state.

In conclusion, agreement on state updates (for primary-backup systems) requires stricter ordering guarantees than agreement on client requests (for state machine replication systems).

What are the implications for agreement algorithms?

Paxos can be used for primary-backup replication by letting the primary be the leader. The problem with Paxos is that, if a primary concurrently proposes multiple state updates and fails, the new primary may apply uncommitted updates in an incorrect order. An example is presented in our DSN 2011 paper(Figure 1). In the example, a replica should only apply the state update B after applying A. The example shows that, using Paxos, a new primary and its follows may apply B after C, reaching an incorrect state that has not been reached by any of the previous primaries.

A workaround to this problem using Paxos is to sequentially agree on state updates: a primary proposes a state update only after it commits all previous state updates. Since there is at most one uncommitted update at a time, a new primary cannot incorrectly reorder updates. This approach, however, results in poor performance.

Zab does not need this workaround. Zab replicas can concurrently agree on the order of multiple state updates without harming correctness. This is achieved by adding one more synchronization phase during recovery compared to Paxos, and by using a different numbering of instances based on zxids.

Want to know more?

Have a look at our DSN 2011 paper, or contact us!

时间: 2024-12-10 01:14:44

(转载)Zab vs. Paxos的相关文章

paxos(chubby) vs zab(Zookeeper)

参考: Zookeeper的一致性协议:Zab Chubby&Zookeeper原理及在分布式环境中的应用 Paxos vs. Viewstamped Replication vs. Zab Zab vs. Paxos Zab: High-performance broadcast for primary-backup systems Chubby:面向松散耦合的分布式系统的锁服务 Chubby 和Zookeeper 的理解 zookeeper 的选主过程使用paxos,数据复制使用Zab(zo

<从PAXOS到ZOOKEEPER分布式一致性原理与实践>读书笔记-ZAB协议

本文属于分布式系统学习笔记系列,上一篇笔记整理了paxos算法,本文属于原书第四章,梳理zookeeper的目标特性及ZAB协议. 1.介绍zookeeper 1.1ZooKeeper保证一致性特性 ZooKeeper是一个典型的分布式数据一致性的解决方案,分布式程序可以基于它实现诸如数据发布/订阅.负载均衡.命名服务.分布式协调通知.集群管理.master选举.分布式锁.分布式队列等功能.ZooKeeper可以保证如下分布式一致性特性. 1.顺序一致性: 从同一个客户端发起的事务请求,最终将严

ZAB协议与Paxos算法

ZooKeeper并没有直接采用Paxos算法,而是采用一种被称为ZAB(ZooKeeper Atomic Broadcast)的一致性协议 ZooKeeper是一个典型的分布式数据一致性的解决方案,分布式应用程序可以基于它实现诸如数据发布/订阅.负载均衡.命名服务.分布式协调/通知.集群管理.Master选举.分布式锁和分布式队列等功能 ZooKeeper致力于提供一个高性能.高可用,具有严格的顺序访问控制能力(主要是写操作的严格顺序性)的分布式协调服务 ZooKeeper会将全量数据保存在内

Zookeeper的一致性协议:Zab(转)

Zookeeper使用了一种称为Zab(Zookeeper Atomic Broadcast)的协议作为其一致性复制的核心,据其作者说这是一种新发算法,其特点是充分考虑了Yahoo的具体情况:高吞吐量.低延迟.健壮.简单,但不过分要求其扩展性.下面将展示一些该协议的核心内容: 另,本文仅讨论Zookeeper使用的一致性协议而非讨论其源码实现 Zookeeper的实现是有Client.Server构成,Server端提供了一个一致性复制.存储服务,Client端会提供一些具体的语义,比如分布式锁

《从PAXOS到ZOOKEEPER分布式一致性原理与实践》pdf

下载地址:网盘下载 内容简介  · · · · · · <Paxos到Zookeeper:分布式一致性原理与实践>从分布式一致性的理论出发,向读者简要介绍几种典型的分布式一致性协议,以及解决分布式一致性问题的思路,其中重点讲解了Paxos和ZAB协议.同时,本书深入介绍了分布式一致性问题的工业解决方案--ZooKeeper,并着重向读者展示这一分布式协调框架的使用方法.内部实现及运维技巧,旨在帮助读者全面了解ZooKeeper,并更好地使用和运维ZooKeeper.全书共8章,分为五部分:第一

分布式系统理论进阶 - Raft、Zab

引言 <分布式系统理论进阶 - Paxos>介绍了一致性协议Paxos,今天我们来学习另外两个常见的一致性协议——Raft和Zab.通过与Paxos对比,了解Raft和Zab的核心思想.加深对一致性协议的认识. Raft Paxos偏向于理论.对如何应用到工程实践提及较少.理解的难度加上现实的骨感,在生产环境中基于Paxos实现一个正确的分布式系统非常难[1]: There are significant gaps between the description of the Paxos al

2PC,3PC,Paxos,Raft,ISR

本文主要讲述2PC及3PC,以及Paxos以及Raft协议. 两类一致性(操作原子性与副本一致性) 2PC协议用于保证属于多个数据分片上的操作的原子性.这些数据分片可能分布在不同的服务器上,2PC协议保证多台服务器上的操作要么全部成功,要么全部失败. Paxos协议用于保证同一个数据分片的多个副本之间的数据一致性.当这些副本分布到不同的数据中心时,这个需求尤其强烈. 一.2PC(阻塞.数据不一致问题.单点问题) Two-Phase Commit,两阶段提交 1.阶段一:提交事务请求(投票阶段)

ZAB协议(转)

转自:http://www.cnblogs.com/sunddenly/articles/4073157.html Zab协议 一.ZooKeeper概述 ZooKeeper内部有一个in-memory DB,表示为一个树形结构.每个树节点称为Znode(代码在DataTree.java和DataNode.java中). 客户端可以连接到zookeeper集群中的任意一台. 对于读请求,直接返回本地znode数据.写操作则转换为一个事务,并转发到集群的Leader处理.Zookeeper提交事务

Hadoop学习笔记(三)——zookeeper的一致性协议:ZAB

ZAB:ZooKeeper的Atomic Broadcast协议,能够保证发给各副本的消息顺序相同. Zookeeper使用了一种称为Zab(ZookeeperAtomic Broadcast)的协议作为其一致性复制的核心,其特点为高吞吐量.低延迟.健壮.简单,但不过分要求其扩展性. Zookeeper的实现是有Client.Server构成,Server端提供了一个一致性复制.存储服务,Client端会提供一些具体的语义,比如分布式锁.选举算法.分布式互斥等.从存储内容来说,Server端更多