why is agreement hard in a distributed system?

same question as:

why is Paxos necessary?

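1, what if more than one node becomes leader simultaneously?

that's why we need phase 1 (prepare) to select a leader: each would-be leader asks the nodes to promise to ignore lower-numbered ballots, and only a candidate that collects promises from a majority may proceed. a competitor with a lower ballot gets rejected.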
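a minimal sketch of the prepare phase described above, assuming ballots are plain unique integers and messages are ordinary method calls (no network, no failures). the names Acceptor, on_prepare and try_become_leader are hypothetical and only illustrate the idea; this is not a complete Paxos implementation.

```python
class Acceptor:
    """A node that answers prepare requests (hypothetical helper)."""

    def __init__(self):
        self.promised = None  # highest ballot this node has promised so far

    def on_prepare(self, ballot):
        # Promise only if the ballot is higher than anything promised before.
        if self.promised is None or ballot > self.promised:
            self.promised = ballot
            return True
        return False


def try_become_leader(ballot, acceptors):
    """Return True if a majority of acceptors promised this ballot."""
    promises = sum(1 for a in acceptors if a.on_prepare(ballot))
    return promises > len(acceptors) // 2


acceptors = [Acceptor() for _ in range(3)]
first = try_become_leader(5, acceptors)   # nobody has promised anything yet
stale = try_become_leader(3, acceptors)   # lower ballot, every acceptor refuses
newer = try_become_leader(7, acceptors)   # higher ballot takes over
```

here the candidate with ballot 3 is refused by every acceptor because they already promised ballot 5, while ballot 7 succeeds and would preempt ballot 5 in the later accept phase.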

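2, what if there is a network partition?

that's why we need a majority for any node to make progress. any two majorities share at least one node, so the two sides of a partition can never decide different values. a side holding less than a majority simply fails to decide.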
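the majority rule can be sketched as a small check, assuming a fixed, known cluster size and a partition that splits the nodes into exactly two sides. has_quorum and partition_progress are hypothetical helper names used only for illustration.

```python
def has_quorum(reachable, cluster_size):
    """True if `reachable` nodes form a strict majority of the cluster."""
    return reachable > cluster_size // 2


def partition_progress(cluster_size, side_a):
    """For a two-way split, report whether each side can make progress."""
    side_b = cluster_size - side_a
    return has_quorum(side_a, cluster_size), has_quorum(side_b, cluster_size)


five_nodes = partition_progress(5, 3)  # split 3 / 2
four_nodes = partition_progress(4, 2)  # split 2 / 2
```

at most one side can ever hold a strict majority. with an even split such as 2 / 2, neither side has one, so the whole system stalls until the partition heals.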

3, what if a leader crashes in the middle of solicitation?

4, what if a leader crashes after deciding but before announcing results?

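5, what if the new leader proposes a value different from the already decided value?

that's why we need the ballot number (made up of an increasing number and the processor id, so every ballot is unique and ballots are totally ordered). any processor can become the new leader by starting a higher ballot, and in phase 1 it learns of any value accepted under an earlier ballot and must propose that value rather than its own.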
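a sketch of the ballot number and of the value a new leader has to propose, assuming a ballot is a (counter, node_id) pair compared like a tuple and that each phase 1 reply carries the last accepted ballot and value (or None). Ballot, next_ballot and choose_value are hypothetical names, not taken from any particular Paxos library.

```python
from typing import NamedTuple


class Ballot(NamedTuple):
    counter: int   # increasing number
    node_id: int   # processor id, breaks ties between equal counters


def next_ballot(highest_seen, node_id):
    """Pick a ballot higher than any ballot this node has seen."""
    return Ballot(highest_seen.counter + 1, node_id)


def choose_value(promises, own_value):
    """promises: list of (accepted_ballot or None, accepted_value).

    If any acceptor already accepted a value, re-propose the one with
    the highest ballot; otherwise the leader is free to use its own value.
    """
    accepted = [(b, v) for b, v in promises if b is not None]
    if not accepted:
        return own_value
    return max(accepted, key=lambda bv: bv[0])[1]


replies = [(None, None), (Ballot(1, 0), "x"), (Ballot(2, 1), "y")]
proposed = choose_value(replies, "z")
fresh = choose_value([(None, None), (None, None)], "z")
```

because the new leader re-proposes the value accepted under the highest earlier ballot, a value that was already decided before the old leader crashed cannot be replaced by a different one.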

time: 2024-08-24 05:53:52
