HybridTime - Accessible Global Consistency with High Clock Uncertainty

Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model,and offer only eventual consistency.

Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the database as a whole.

Dynamo和Cassandra都是基于Vector Clocks和Gossip算法,强调高可用性,可以达到eventual consistency

而HBase,可以提供单分区的强一致性,但对于跨分区,或跨数据库,就无法保证;

for example, in a geo-replicated service, a user may first be routed to a local datacenter and perform some action such as sending an email message.
If they then reload their browser and are routed to another datacenter backend, they would still expect to see their sent message in their email outbox.

这个例子比较典型,要达到异地的一致性

 

一般要解决分布式问题的一致性问题的最简单的办法就是用synchronized clocks across all servers,但这基本是达不到的

the traditional approach to global consistency has been to use logical clocks.
Logical clocks, such as Lamport Clocks [15] or Vector Clocks [10, 20].

所以传统的做法就是用逻辑时钟

https://www.zhihu.com/question/19994133

贴个回答,写的不错

当然可以,但是不适合在分布式数据库里面。为什么呢? 首先,需要先确定你这个时间戳取自哪里。1、所有的时间戳都取自一个中心点节点,这样让时间戳得到一个全序关系,这是最简单的实现,如果你的分布式数据库只是部署在一个数据中心的话,但是这样就增加了一个中心节点,影响数据库的扩展性,而且任何一个取时间戳的动作都要增加上网络延迟的开销。如果分布式数据库部署需要跨数据中心,这种方案变的不可行,跨数据中心网络时延太大。从而影响整体性能2、所有的时间戳取自已所在的节点。这就涉及到时钟同步的问题,就像Dongdong说的,如果取的物理时间那么这些节点并没有一个一致的时间,总会有一点误差,这种解决方案就有两种思路:
一、就是现在以Leslie Lamport 提出的逻辑时钟的方法,重新定义了一种分布式的全序关系。vector clock就是以逻辑时钟为基础上发展而来的。
二、使用物理时钟,就是google spanner使用gps 加原子钟进行时间同步,不同节点之间的时间是不能精确同步的,只能把不同节点之间的时间同步到一个范围之内,spanner一个节点上获得一个时间戳,不是一个时间值,而是一个时间值加上一个误差范围,也就是一个时间窗,如果两个时间窗有重合,就无法比较大小,从而无法比较出来时间窗代表的事件的先后关系。而时间同步算法就是让分布式各节点之间的时间的误差尽量的小,这样取时间戳这个动作效率是最高的,只在本节点取一下物理时间即可,但是为了同步时间,各个节点肯定在周期与其它节点进行通讯来同步物理时间(google时间同步的算法好像没有开源)。时间同步算法只能让时间尽量的精确。分布式数据库里面是不能直接使用NTP来同步时间的,因为NTP时间同步协议里面允许节点的时间能够回退,而数据库里面要求本地的时钟必须是递增的。
三、还有一种混合逻辑时钟(Logical Physical Clocks),也是由逻辑时钟演变而来,但是时钟的值比较接近于物理时钟,而且不依赖于下面物理时钟的同步的算法,它的时间戳都可以在本地取,而且取出来是一个时间值,而不像spanner一样,取出来是一个时间窗。详细可以参见Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases这篇论文

关于Lamport或向量时钟,

Lamport时钟就是逻辑时钟,即不依赖于物理时钟,而依赖event之间的因果关系来定义的一种偏序

参考,全序, 分布式一致性的本质

而Vector时钟是,逻辑时钟的一种实现,具体的逻辑看下文,

Why Vector Clock are Easy or Hard?

摘自上面的知乎回答

“dynamo paper是七、八年前写的了,现在的amazon dynamo早已摒弃了version vector,而采用了synchronous replication(类似paxos的protocol),每一个partition会有一个leader负责写入。其实version vector并不scale,因为对于一个key来说,随着writer数量的增加,version vector数量成指数级的增长”

逻辑时钟的缺点,

first, they require that all clients must propagate clock data to achieve consistent views,
and second, the assigned timestamps have no relation to physical time.

 

Spanner introduced commit-wait, a way of ensuring physical-time based consistent global state by forcing operations to wait long enough so that all participants agree that the operation’s timestamp has passed based on worst case synchronization error bounds.

While innovative, the system performance becomes highly dependent on the quality of the time synchronization infrastructure, and thus may have unacceptable performance absent specialized hardware such as atomic clocks and GPS receivers.

Spanner利用原子钟和GPS接收器,实现了一个较为精确的时钟,这个时钟叫做TrueTime,每次调用TrueTime API返回的是一个时间区间,而不是一个具体的值,这个TrueTime保证的是真实时间(absolute time/real time)一定在这个区间内,这个区间范围通常大约14ms,甚至更小。

所以只有每个transaction的时间区间,都不重合,那么就可以比较容易的做到全局有序;

所以这里需要commit-wait,

可以看到,我在开始transaction的时候,取一次true time,[t1.earliest, t1.latest]

那么什么时候,我可以把这个transaction commit掉,即释放掉锁,即别人可以开始新的transaction

当我再取一次true time,[t2.earliest, t2.latest],当t2.earliest > t1.latest的时候,即可

因为这样两个transaction的时间区间就不可能重叠了,当然代价就是每个transaction都有等待大概2个error bound的时间

当然,spanner的实现依赖硬件的infra设备,普通用户无法复制;

 

HybridTime(HT), a hybrid between physical and logical clocks and we show how HT can be used to achieve the same semantics as Spanner, but with good performance even with commonly available time synchronization.

Like Spanner, our approach relies on physical time measurements with bounded error to assign HybridTime timestamps to events that occur in the system.
However, unlike Spanner, our approach does not usually require the error to be waited out, thus allowing for usage in common deployment scenarios where clocks are synchronized through common protocols such as the Network Time Protocol, in which clock synchronization error is often higher than with Spanner’s TrueTime.

The trade-off is that, in order to avoid commit-wait, HybridTime requires that timestamps be propagated across machines to achieve the same consistency semantics as Spanner.
Contrary to vector clocks, which can expand as the number of participants in the cluster grows, HybridTime timestamps have constant and small size.

HybridTime说是physical and logical clocks的结合,和spanner一样,也可以使用带error bound的物理时间
但由于他没有spanner的硬件设施,所以做不到低延迟的时间同步,所以他依然使用逻辑时钟来达到一致性;并且他解决了vector clocks随着client的数量增加而会size膨胀的问题

HybridTime clocks follow similar update rules to Lamport clocks, but the time values are not purely logical:
each time value has both a logical component, which helps in guaranteeing the same properties as a Lamport Clocks, and a physical component which allows the event to be associated with a physical point-in-time.

 

One of Spanner’s key benefits is that is externally consistent, which is defined as fully linearizable, even in the presence of hidden channels.

Additionally we use the term globally consistent to describe a system which provides the same linearizability semantics, provided that there are no hidden channels present.

区分为,是否有hidden channels?意思是clients间存在因果或先后关系,比如通过发message,都是这个因果关系对于系统是不可知的,比如A write,然后发送消息给B,然后B write,那么因果关系上,B write一定后于A write,但是对于数据库而言他并不知道,所以并不能保证A write先写入;

spanner是可以保证externally consistent, 由于他通过 commit wait,来保证每个transaction之间一定是全局有序的

相对的globally consistent,更简单些,因为并没有hidden channels

 

Spanner’s key innovation is that timestamps assigned by the system can be used to achieve external consistency, but also have physical meaning.

HybridTime is always globally consistent, and through selective application of commit-wait is externally consistent.

Spanner的主要创新在于实现external一致性的,同时还保留了物理时间的

而HybridTime,默认情况下是可以实现globally consistent,即偏序,因为他本身就是使用lamport时钟,并且当你选择commit-wait时,也可以保证externally consistent;

 

HybridTime Assumptions

1. HybridTime assumes that machines have a reasonably accurate physical clock, represented by the PCi(e) function, which outputs the numeric timestamp returned by the physical clock as read by process i for event e, that is able to provide absolute time measurements (usually in milli- or microseconds since 1 January 1970).

2. keeps the physical clocks across different servers synchronized with regard to a reference server, the “reference” time, represented by the PCref(e) function which outputs the numeric timestamp returned by the “reference” process for event e;

Additionally, we assume that such a substrate is able to provide an error bound along with each time measurement, denoted by the Ei(e) function, which outputs the numeric value ε error of process i at the time e occurred

3. we make no assumptions on the actual accuracy of the clocks, i.e. the physical timestamps returned by server’s clocks may have an arbitrarily large but finite error, as long as this error’s bound is known

说白了,假设

有个相对靠谱的物理时钟,一个理想的参照时钟,以及他们之间相差的,error bound

最后,我们并不假设这个error bound会很小,只要有限就可以

假设1,我们有有限的error bound

 

假设2, 进程级别的物理时钟单调递增,注意是进程级别

 

基于以上假设,提出HybridTime Clock and Protocol

 

HybridTime Clock and Protocol

HybridTime clock(HTC) is a pair (physical,logical) where the first component is a representation of the physical time at which the event occurred and the second component is a logical sequence number.

定义其实显而易见。。。

Algorithm 2 depicts the HTC algorithm.

HTC算法如上,两部分,NOW和UPDATE

now就是取当前的HybridTime clock

upate就是根据in 来更新当前的clock

Algorithm 2 implements a Lamport Clock, with the additional advantage that generated timestamps have physical meaning and are accurate representations of physical time within a bound error.

算法本身,其实就是Lamport Clock,只是增加了physical clock的部分

给出一个例子,

 

To order the events timestamped using the HybridTime Clock algorithm we use Definition 1.

Definition 1. HCT(e) < HCT( f ) is defined as the lexicographical ordering of the timestamp two-tuple (physical,logical)

Theorem 1. The HybriTime clock happened-before relation forms a total order of events

Theorem 2. For any event in a causal chain f , the physical component of a HTC timestamp approximates the “real” time the event occurred, with a error defined and bounded by

 

Implementation

No Consistency - In this mode there are no external consistency guarantees, transactions are assigned timestamps from each server’s physical clock and no guarantee is made that reads are consistent or repeatable.

直接用local的物理时间,不保证一致性

HybridTime Consistency - In this mode our implementation guarantees the global consistency as Spanner, absent hidden channels, but using HybridTime instead of commit-wait.
Clients choosing this consistency mode on writes must make sure that the timestamp that is received from the server is propagated to other servers and/or clients.
Within the same client process, timestamps are automatically propagated on behalf of the user.

这个其实和逻辑时钟,没有区别,就是保证偏序

Commit-wait Consistency - In this mode our implementation guarantees the same external consistency semantics as Spanner by also using commit-wait in the way described in the original paper.
However instead of using TrueTime, which is a proprietary and private API, we implemented commit-wait on top of the widely used Network Time Protocol (NTP). Hence, in this consistency mode we support hidden channels.

这个估计没啥用,在没有spanner硬件保证的情况下,commit-wait,性能不能忍吧

时间: 2024-10-06 01:14:48

HybridTime - Accessible Global Consistency with High Clock Uncertainty的相关文章

clock constrain

Asynchronous Clocks :两个时钟不知道他们之间的相位关系 In most cases, primary clocks can be treated as asynchronous. When associated with   their respective generated clocks, they form asynchronous clock groups. Synchronous Clocks : This is usually the case when thei

Clock Skew , Clock uncertainly 和 Period

待修正 1.Clock Skew Clock Skew = clock path delay to the destination synchronous element - clock path delay to te source synchronous element. 注意,clock skew只提到了path delay,但是实际上对于destination synchronous element 和source synchronous element而言,时钟的相位可能是不一样的.这

quartus II 全编译出现严重警告

Critical Warning: The following clock transfers have no clock uncertainty assignment. For more accurate results, apply clock uncertainty assignments or use the derive_clock_uncertainty command.Critical Warning: From Clock (Rise) to Clock (Rise) (setu

CTS的前世今生

Clock Tree Synthesis,时钟树综合,简称CTS.时钟树综合就是建立一个时钟网络,使时钟信号能够传递到各个时序器件.CTS是布局之后相当重要的一个步骤,在现如今集成了上亿个晶体管的芯片上,如何设计一个合理的时钟网络,是一件非常具有挑战性的事情.个人认为相比于place和route更依赖工具的能力,CTS是需要更多的人为干预.通常需要人工做的事情,那都是比较难弄的.在深入学习如何做好时钟树之前,以下这些概念和问题我们首先得先弄明白:What's the purpose of CTS

Timequest Timing Analyzer进行时序分析(三)

上面已经把DAC7512控制器中所有的时钟都创建好了.下面我们再额外讨论一下关于时钟属性方面的一些问题和在做时序分析时的处理方法.对于具有单一时钟的系统,设计和时序分析都相对简单.但是现在很多设计都有多个甚至几十个时钟乃至更多的时钟.比如说DAC7512控制器,在设计中用到的时钟实际上是有3个,CLK25M,CLK50M和DA_SCLK.在对多时钟设计进行时序分析的时候,我们首先要搞清楚各时钟之间的关系. 当设计中有多个时钟时,时钟之间可能存在三种关系,分别是同步,异步和互斥.如果两个或者多个时

半监督学习[转]

0  引言 机器学习(machine learning)是人工智能的核心研究领域,是智能信息处理的重要途径.监督学习(supervised learning)是机器学习中研究最多.应用最广泛的一种学习途径.在传统的监督学习中,学习系统通过对大量的有标记训练样本 (labeled examples) 进行学习,建立模型用于预测未知样本的标记(label).在此,标记对应于示例的输出,用于表征待学习的目标概念. 随着数据收集和存储技术的发展,收集大量缺少标记的数据已相当容易,而为这些数据提供标记则相

Cisco IOS basic system management command reference

absolute : to specify an absolute time for a time-range (in time-range configuration mode) no absolute buffer-length : to specify the maximum length of the data stream to be forwarded (in line configuration mode) no buffer-length buffers : to make ad

Timequest GUI

Tasks界面 使用Tasks界面可以访问常用命令,例如生成网表建立报告等. 两个常用命令位于Tasks界面中:打开工程和编写SDC文件.其他命令在下面的文件夹中: Netlist Setup Reports Macros 注释 : Tasks 界面中的每一命令都有等价的工具命令语言 (Tcl) 命令,命令运行时,显示在 Console 界面中. 打开工程,编写SDC文件 要打开 TimeQuest 分析器中的工程,双击 Open Project 任务.如果您从 Quartus II 软件 GU

流水线设计

20世纪80年代,流水线技术成为RISC处理器设计方法中最基本的技术之一,RISC的设计多以高流水为目标设计. 而后流水线技术也被应用到CISC处理器 在流水线的发展史上主要有两种流水线,算术流水线和指令流水线. 首先通过算术流水线的例子,介绍流水线理想假设. 流水线意味着将系统分割为许多段,段与段之间增加合适的缓冲,这样每隔D/K个时间单位就可以启动新的任务,而不是每隔D个时间单位. 流水线并不能被无限制的增加下去,因为时钟的限制(setup time/clock uncertainty),流