In Search of a Consensus Algorithm (Extended Version) 7

5.5 Follower and candidate crashes

Until this point we have focused on leader failures. Follower and candidate crashes are much simpler to handle than leader crashes, and they are both handled in the same way. If a follower or candidate crashes, then future RequestVote and AppendEntries RPCs sent to it will fail. Raft handles these failures by retrying indefinitely; if the crashed server restarts, then the RPC will complete successfully. If a server crashes after completing an RPC but before responding, then it will receive the same RPC again after it restarts. Raft RPCs are idempotent (executing an idempotent operation any number of times has the same effect as executing it once), so this causes no harm. For example, if a follower receives an AppendEntries request that includes log entries already present in its log, it ignores those entries in the new request.
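The retry-and-ignore behavior can be sketched in Python. This is a minimal, hypothetical model. `Entry`, `FollowerLog`, `append_entries` and `send_until_acknowledged` are illustrative names, not part of any Raft library. A crash after processing a request but before replying is modeled as a dropped response. The sketch also assumes the usual Raft consistency check on the entry that precedes the new ones (prevLogIndex/prevLogTerm, Section 5.3), which this section does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class FollowerLog:
    # Log indices are 1-based as in the Raft paper: entries[0] holds index 1.
    entries: list = field(default_factory=list)

    def append_entries(self, prev_index, prev_term, new_entries):
        """Apply the log part of an AppendEntries request.

        Returns False if the entry at prev_index does not match prev_term.
        Entries already present (same index and term) are skipped, so
        receiving the same request again leaves the log unchanged.
        """
        if prev_index > len(self.entries):
            return False
        if prev_index > 0 and self.entries[prev_index - 1].term != prev_term:
            return False
        for offset, entry in enumerate(new_entries):
            index = prev_index + 1 + offset  # 1-based index of this entry
            if index <= len(self.entries):
                if self.entries[index - 1].term == entry.term:
                    continue  # already in the log: ignore it
                # Conflicting entry: drop it and everything that follows.
                del self.entries[index - 1:]
            self.entries.append(entry)
        return True


def send_until_acknowledged(follower, request, lost_responses):
    """Resend `request` until a reply gets through, counting attempts.

    `lost_responses` is how many replies are dropped after the follower has
    already processed the request (a stand-in for crashing before responding).
    """
    attempts = 0
    while True:
        attempts += 1
        ok = follower.append_entries(*request)
        if lost_responses > 0:
            lost_responses -= 1
            continue  # no reply arrived, so the leader retries
        return ok, attempts


follower = FollowerLog()
request = (0, 0, [Entry(1, "x=1"), Entry(1, "y=2")])
ok, attempts = send_until_acknowledged(follower, request, lost_responses=2)
print(ok, attempts, [e.command for e in follower.entries])
# prints: True 3 ['x=1', 'y=2']
```

The follower handles the same request three times, yet its log ends up with two entries, not six. It truncates only when an existing entry conflicts with a new one, so a stale duplicate can never erase entries it already holds.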


5.6 Timing and availability

One of our requirements for Raft is that safety must not depend on timing: the system must not produce incorrect results just because some event happens more quickly or slowly than expected. However, availability (the ability of the system to respond to clients in a timely manner) must inevitably depend on timing. For example, if message exchanges take longer than the typical time between server crashes, candidates will not stay up long enough to win an election; without a steady leader, Raft cannot make progress.

Leader election is the aspect of Raft where timing is most critical. Raft will be able to elect and maintain a steady leader as long as the system satisfies the following timing requirement:

$$\mathit{broadcastTime} \ll \mathit{electionTimeout} \ll \mathit{MTBF}$$

In this inequality broadcastTime is the average time it takes a server to send RPCs in parallel to every server in the cluster and receive their responses; electionTimeout is the election timeout described in Section 5.2; and MTBF is the average time between failures for a single server. The broadcast time should be an order of magnitude less than the election timeout so that leaders can reliably send the heartbeat messages required to keep followers from starting elections; given the randomized approach used for election timeouts, this inequality also makes split votes unlikely. The election timeout should be a few orders of magnitude less than MTBF so that the system makes steady progress. When the leader crashes, the system will be unavailable for roughly the election timeout; we would like this to represent only a small fraction of overall time.
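The inequality can be turned into a quick sanity check. The sketch below rests on assumptions that read numbers into the prose. It takes "an order of magnitude" as a factor of 10 and "a few orders of magnitude" as a factor of 1000. It estimates unavailability as one election timeout of downtime per leader crash, with leader crashes arriving about once per MTBF. `satisfies_timing` and `unavailable_fraction` are hypothetical helper names, and all three quantities must use the same unit.

```python
# Assumed thresholds: "an order of magnitude" is read as a factor of 10 and
# "a few orders of magnitude" as a factor of 1000. Raft does not fix these.
ORDER_OF_MAGNITUDE = 10
FEW_ORDERS_OF_MAGNITUDE = 1000


def satisfies_timing(broadcast_time, election_timeout, mtbf):
    """Check broadcastTime << electionTimeout << MTBF (all in the same unit)."""
    return (broadcast_time * ORDER_OF_MAGNITUDE <= election_timeout
            and election_timeout * FEW_ORDERS_OF_MAGNITUDE <= mtbf)


def unavailable_fraction(election_timeout, mtbf):
    """Rough share of time without a leader.

    Assumes about one leader crash per MTBF, each costing roughly one
    election timeout of unavailability.
    """
    return election_timeout / mtbf


month = 30 * 24 * 60 * 60  # seconds
print(satisfies_timing(0.005, 0.150, month))  # True
print(satisfies_timing(0.020, 0.150, month))  # False: heartbeats too slow
```

The second call fails only the left-hand condition. A 20 ms broadcast leaves too little headroom under a 150 ms timeout, so heartbeats may not arrive before followers start elections.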

The broadcast time and MTBF are properties of the underlying system, while the election timeout is something we must choose. Raft’s RPCs typically require the recipient to persist information to stable storage, so the broadcast time may range from 0.5ms to 20ms, depending on storage technology. As a result, the election timeout is likely to be somewhere between 10ms and 500ms. Typical server MTBFs are several months or more, which easily satisfies the timing requirement.
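Plugging the quoted ranges into this reasoning shows how wide the margins are. In the sketch below, "several months" is assumed to mean 90 days. Pairing the low broadcast time with the low timeout (and high with high) is an illustrative choice. `timing_margins` is a hypothetical helper that uses the same rough "one election timeout per leader crash" estimate as above.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def timing_margins(broadcast_ms, election_timeout_ms, mtbf_days):
    """Return (timeout / broadcast, approximate fraction of time without a leader)."""
    mtbf_ms = mtbf_days * MS_PER_DAY
    return election_timeout_ms / broadcast_ms, election_timeout_ms / mtbf_ms


# Ends of the quoted ranges; "several months" is assumed here to mean 90 days.
for broadcast_ms, timeout_ms in [(0.5, 10), (20, 500)]:
    ratio, downtime = timing_margins(broadcast_ms, timeout_ms, mtbf_days=90)
    print(f"broadcast {broadcast_ms} ms, timeout {timeout_ms} ms: "
          f"margin {ratio:.0f}x, leaderless fraction {downtime:.1e}")
```

With these inputs the timeout is 20x to 25x the broadcast time. The estimated leaderless fraction is about 1.3e-09 to 6.4e-08, which is why the right-hand side of the inequality is easy to meet.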


Time: 2024-10-11 13:14:08
