5.5 Follower and candidate crashes
Until this point we have focused on leader failures. Follower and candidate crashes are much simpler to handle than leader crashes, and they are both handled in the same way. If a follower or candidate crashes, then future RequestVote and AppendEntries RPCs sent to it will fail. Raft handles these failures by retrying indefinitely; if the crashed server restarts, then the RPC will complete successfully. If a server crashes after completing an RPC but before responding, then it will receive the same RPC again after it restarts. Raft RPCs are idempotent (executing one any number of times has the same effect as executing it once), so this causes no harm. For example, if a follower receives an AppendEntries request that includes log entries already present in its log, it ignores those entries in the new request.
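The follower-side rule that makes a redelivered AppendEntries harmless can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: `LogEntry` and `append_entries` are hypothetical names, the log is an in-memory list rather than stable storage, and the truncate-on-conflict step is included as an assumption drawn from Raft's general log-matching rules rather than from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    term: int
    command: str


def append_entries(log, prev_log_index, prev_log_term, entries):
    """Apply an AppendEntries request to a follower's log in place.

    `log` is a 0-based list; Raft log index i (1-based) lives at log[i - 1].
    Returns True on success, False if the consistency check fails.
    """
    # Consistency check: the follower must already hold an entry at
    # prev_log_index with term prev_log_term (index 0 = empty prefix).
    if prev_log_index > len(log):
        return False
    if prev_log_index > 0 and log[prev_log_index - 1].term != prev_log_term:
        return False

    for offset, entry in enumerate(entries):
        pos = prev_log_index + offset  # 0-based slot for this entry
        if pos < len(log):
            if log[pos].term == entry.term:
                # Entry already present: ignore it. This is what makes a
                # repeated (or stale, shorter) request harmless.
                continue
            # Conflict: drop the existing entry and everything after it.
            del log[pos:]
        log.append(entry)
    return True


if __name__ == "__main__":
    log = []
    request = [LogEntry(1, "x=1"), LogEntry(1, "y=2")]
    append_entries(log, 0, 0, request)
    append_entries(log, 0, 0, request)  # same RPC redelivered after a restart
    print(len(log))  # 2
```

Only a conflicting term causes truncation; a matching entry is skipped, so replaying an old request never shortens a log that has since grown.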
5.6 Timing and availability
One of our requirements for Raft is that safety must not depend on timing: the system must not produce incorrect results just because some event happens more quickly or slowly than expected. However, availability (the ability of the system to respond to clients in a timely manner) must inevitably depend on timing. For example, if message exchanges take longer than the typical time between server crashes, candidates will not stay up long enough to win an election; without a steady leader, Raft cannot make progress.
Leader election is the aspect of Raft where timing is most critical. Raft will be able to elect and maintain a steady leader as long as the system satisfies the following timing requirement:
$$\mathit{broadcastTime} \ll \mathit{electionTimeout} \ll \mathit{MTBF}$$
In this inequality broadcastTime is the average time it takes a server to send RPCs in parallel to every server in the cluster and receive their responses; electionTimeout is the election timeout described in Section 5.2; and MTBF is the average time between failures for a single server. The broadcast time should be an order of magnitude less than the election timeout so that leaders can reliably send the heartbeat messages required to keep followers from starting elections; given the randomized approach used for election timeouts, this inequality also makes split votes unlikely. The election timeout should be a few orders of magnitude less than MTBF so that the system makes steady progress. When the leader crashes, the system will be unavailable for roughly the election timeout; we would like this to represent only a small fraction of overall time.
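The inequality can be turned into a quick sanity check. This Python sketch is illustrative only: the function name `check_timing` and both margin parameters are hypothetical, with 10 standing for "an order of magnitude" and 100 an assumed lower bound for "a few orders of magnitude". It also estimates the share of time spent without a leader as electionTimeout / MTBF, the rough figure implied by the paragraph above.

```python
def check_timing(broadcast_time, election_timeout, mtbf,
                 broadcast_margin=10.0, mtbf_margin=100.0):
    """Check broadcastTime << electionTimeout << MTBF (all in seconds).

    broadcast_margin: required ratio electionTimeout / broadcastTime
        (10 = "an order of magnitude").
    mtbf_margin: required ratio MTBF / electionTimeout; 100 is an
        assumed reading of "a few orders of magnitude".
    Returns (ok, unavailable_fraction), where unavailable_fraction is the
    rough share of time without a leader if each leader crash costs
    about one election timeout.
    """
    if min(broadcast_time, election_timeout, mtbf) <= 0:
        raise ValueError("all durations must be positive")
    ok = (election_timeout >= broadcast_margin * broadcast_time
          and mtbf >= mtbf_margin * election_timeout)
    return ok, election_timeout / mtbf
```

For example, a 20 ms broadcast time, a 500 ms election timeout and a 90-day MTBF pass the check, while a 100 ms timeout with the same 20 ms broadcast time does not.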
The broadcast time and MTBF are properties of the underlying system, while the election timeout is something we must choose. Raft’s RPCs typically require the recipient to persist information to stable storage, so the broadcast time may range from 0.5ms to 20ms, depending on storage technology. As a result, the election timeout is likely to be somewhere between 10ms and 500ms. Typical server MTBFs are several months or more, which easily satisfies the timing requirement.
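As a rough check of these figures, taking "several months" to be 90 days (an assumption made only for this arithmetic): at both ends of the range the election timeout is 20 to 25 times the broadcast time, just over an order of magnitude, and even the largest timeout sits about seven orders of magnitude below the MTBF.

$$
\frac{10\ \text{ms}}{0.5\ \text{ms}} = 20, \qquad \frac{500\ \text{ms}}{20\ \text{ms}} = 25
$$

$$
\mathit{MTBF} \approx 90 \times 86\,400\ \text{s} = 7.776 \times 10^{6}\ \text{s}, \qquad \frac{\mathit{electionTimeout}}{\mathit{MTBF}} \approx \frac{0.5\ \text{s}}{7.776 \times 10^{6}\ \text{s}} \approx 6.4 \times 10^{-8}
$$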