<译>Zookeeper官方文档

apache原文地址:http://zookeeper.apache.org/doc/trunk/zookeeperOver.html

ZooKeeper

ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications the responsibility of implementing coordination services from scratch.

Design Goals

ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchal namespace which is organized similarly to a standard file system. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers.

The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access. The performance aspects of ZooKeeper means it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.

ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a sets of hosts called an ensemble.

ZooKeeper Service

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with a transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.

Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heart beats. If the TCP connection to the server breaks, the client will connect to a different server.

ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives.

ZooKeeper is fast. It is especially fast in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

Data model and the hierarchical namespace

The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper‘s name space is identified by a path.

ZooKeeper‘s Hierarchical Namespace

Nodes and ephemeral nodes

Unlike is standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file-system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode‘s data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.

ZooKeeper also has the notion of ephemeral nodes. These znodes exists as long as the session that created the znode is active. When the session ends the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].

Conditional updates and watches

ZooKeeper supports the concept of watches. Clients can set a watch on a znodes. A watch will be triggered and removed when the znode changes. When a watch is triggered the client receives a packet saying that the znode has changed. And if the connection between the client and one of the Zoo Keeper servers is broken, the client will receive a local notification. These can be used to [tbd].

Guarantees

ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

  • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
  • Atomicity - Updates either succeed or fail. No partial results.
  • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
  • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
  • Timeliness - The clients view of the system is guaranteed to be up-to-date within a certain time bound.

For more information on these, and how they can be used, see [tbd]

Simple API

One of the design goals of ZooKeeper is provide a very simple programming interface. As a result, it supports only these operations:

create

creates a node at a location in the tree

delete

deletes a node

exists

tests if a node exists at a location

get data

reads the data from a node

set data

writes data to a node

get children

retrieves a list of children of a node

sync

waits for data to be propagated

For a more in-depth discussion on these, and how they can be used to implement higher level operations, please refer to [tbd]

Implementation

ZooKeeper Components shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of components.

ZooKeeper Components

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

Every ZooKeeper server services clients. Clients connect to exactly one server to submit irequests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, write requests, are processed by an agreement protocol.

As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.

Uses

The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher order operations, such as synchronizations primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]

Performance

ZooKeeper is designed to be highly performant. But is it? The results of the ZooKeeper‘s development team at Yahoo! Research indicate that it is. (See ZooKeeper Throughput as the Read-Write Ratio Varies.) It is especially high performance in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)

ZooKeeper Throughput as the Read-Write Ratio Varies

The figure ZooKeeper Throughput as the Read-Write Ratio Varies is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2Ghz Xeon and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicate the size of the ZooKeeper ensemble, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.

Note

In version 3.2 r/w performance improved by ~2x compared to the previous 3.1 release.

Benchmarks also indicate that it is reliable, too. Reliability in the Presence of Errors shows how a deployment responds to various failures. The events marked in the figure are the following:

  1. Failure and recovery of a follower
  2. Failure and recovery of a different follower
  3. Failure of the leader
  4. Failure and recovery of two followers
  5. Failure of another leader

Reliability

To show the behavior of the system over time as failures are injected we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio of our expected workloads.

Reliability in the Presence of Errors

The are a few important observations from this graph. First, if followers fail and recover quickly, then ZooKeeper is able to sustain a high throughput despite the failure. But maybe more importantly, the leader election algorithm allows for the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.

The ZooKeeper Project

ZooKeeper has been successfully usedin many industrial applications. It is used at Yahoo! as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is used by the Fetching Service for Yahoo! crawler, where it also manages failure recovery. A number of Yahoo! advertising systems also use ZooKeeper to implement reliable services.

All users and developers are encouraged to join the community and contribute their expertise. See the Zookeeper Project on Apachefor more information.

时间: 2024-12-29 01:03:58

<译>Zookeeper官方文档的相关文章

[译]Polymer官方文档-布局元素

原文链接: Layout elements(注:有时需翻墙或换hosts)翻译日期: 2014年8月2日翻译人员: 铁锚 译注: 点击下载本教程的示例代码: Polymer布局元素Demo相关的代码文件在 layout 目录下. 0. 准备- Chrome浏览器模拟移动设备Chrome浏览器(要求最新版,如34以上版本)模拟移动设备的方法: 打开一个标签页,按 F12,(或者在页面空白区域点鼠标右键,审查元素),打开调试窗口. 然后在将光标移动到调试窗口中, 按 ESC键,或者点击右上角的 dr

ZooKeeper官方文档资源

一般来说官方的文档是最权威的. 入口:http://zookeeper.apache.org/ 在右侧即可进入相应版本文档: 如果想要看主干的文章,入口如下,主干是最稳当的版本:http://zookeeper.apache.org/doc/trunk/

zookeeper 官方文档——综述

Zookeeper: 一个分布式应用的分布式协调服务 zookeeper 是一个分布式的,开源的协调服务框架,服务于分布式应用程序. 它暴露了一系列基础操作服务,因此,分布式应用能够基于这些服务构建出更高级别的服务,比如,同步,配置管理,命名和分组服务. zookeeper设计上易于编码,数据模型构建在我们熟悉的树形结构目录风格的文件系统中. zookeeper运行在java中,同时支持java和C 语言.正确的实现协调服务是公认的难干的差事. 他们及其容易出错,比如资源竞争和死锁. zooke

Zookeeper官方文档

ZooKeeper Getting Started Guide Getting Started: Coordinating Distributed Applications with      ZooKeeper Pre-requisites Download Standalone Operation Managing ZooKeeper Storage Connecting to ZooKeeper Programming to ZooKeeper Running Replicated Zoo

Unity性能优化(2)-官方文档简译

本文是Unity官方教程,性能优化系列的第二篇<Diagnosing performance problems using the Profiler window>的简单翻译. 简介 如果游戏运行缓慢,卡顿,我们知道游戏存在性能问题.在我们尝试解决问题前,需要先知道引起问题的原因.不同问题需要不同的解决方案.如果我们靠猜测或者其他项目的经验去解决问题,那么我们可能会浪费很多时间,甚至使得问题更严重. 这时我们需要性能分析,性能分析程序测量游戏运行时的各个方面性能.通过性能分析工具,我们能够透过

Docker安全性——官方文档[译]

Docker安全性--官方文档[译] 本文译自Docker官方文档:https://docs.docker.com/articles/security/ 在审查Docker的安全时,需要考虑三个主要方面:?容器内在的安全性,由内核命名空间和cgroup中实现;?docker守护程序本身的攻击面;?加固内核安全特性,以及它们如何与容器中互动. 内核 命名空间 Kernel Namespace Docker容器中非常相似LXC容器,并且它们都具有类似的安全功能.当您以"docker run"

由浅入深Zookeeper详解(参考官方文档)

[老哥我最近接到个任务研究一下Zookeeper,对于我这个Linux运维领域的小菜鸟来说也是刚刚听到这个名字,为了养成良好的文档整理和学习能力,我人生第一次开通了博客并把这次的研究经历记录了下来,以后我会不定期的记录下来我对技术领域的探索,希望热爱Linux运维志同道合的兄弟们多指教,一同进步成长.(ps:我本人平时比较沉默,善于观察思考,对历史人物颇有见解,但是一旦说起话来就会滔滔不绝,谁让我曾经的梦想是当一名教师呢!哈哈!)同时,送给大家一句话,人生是一场马拉松比赛,只有坚持到最后的人,才

【译】StackExchange.Redis官方文档-中文版(序)

StackExchange.Redis官方文档-中文版(序) Intro 最近想深入学习一些 Redis 相关的东西.于是看了看官方的项目,发现里面有一份文档,于是打算翻译成中文,方便大家学习参考,如果有什么翻译不准确的地方,欢迎大家指出. 翻译进度 文档还有一部分还未翻译完,我会争取在三月中旬前将剩下的部分翻译结束. 翻译进度详见:https://github.com/WeihanLi/StackExchange.Redis-docs-cn 文档地址: 原文文档地址: https://gith

HBase 官方文档0.90.4

HBase 官方文档0.90.4 Copyright ? 2010 Apache Software Foundation, 盛大游戏-数据仓库团队-颜开(译) Revision History Revision 0.90.4 配置,数据模型使用入门 Abstract 这是 Apache HBase的官方文档, Hbase是一个分布式,版本化(versioned),构建在 Apache Hadoop和 Apache ZooKeeper上的列数据库. 我(译者)熟悉Hbase的源代码,从事Hbase