
High Availability and PyMongo

PyMongo makes it easy to write highly available applications whether you use a single replica set or a large sharded cluster.

Connecting to a Replica Set

PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.

    Replica sets require MongoDB server version 1.6.0 or later. Connecting to replica sets also requires PyMongo 1.8.0 or later.

See the general MongoDB replica set documentation: http://dochub.mongodb.org/core/rs

Starting a Replica Set

The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup; be sure to check that out. Here, we’ll do just the bare minimum to get a three-node replica set running locally.

    Replica sets should always use multiple nodes in production. Putting all set members on the same physical node is only recommended for testing and development.

We start three mongod processes, each on a different port and with a different dbpath, but all using the same replica set name “foo”. The example uses the hostname “morton.local”; replace it with your own hostname when running the commands:

    $ hostname
    morton.local
    $ mongod --replSet foo/morton.local:27018,morton.local:27019 --rest
    $ mongod --port 27018 --dbpath /data/db1 --replSet foo/morton.local:27017 --rest
    $ mongod --port 27019 --dbpath /data/db2 --replSet foo/morton.local:27017 --rest

Initializing the Set

At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized, no node can become primary, and the set is essentially “offline”.

To initialize the set we need to connect to a single node and run the initiate command. Since we don’t have a primary yet, we’ll need to tell PyMongo that it’s okay to connect to a slave/secondary:

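    >>> from pymongo import MongoClient, ReadPreference
    >>> # SECONDARY lets us connect before any primary exists.
    >>> c = MongoClient("morton.local:27017",
    ...                 read_preference=ReadPreference.SECONDARY)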

    We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.

After connecting, we run the initiate command to get things started. Here we rely on an implicit configuration; for more advanced configuration options, see the replica set documentation:

    >>> c.admin.command("replSetInitiate")
    {u'info': u'Config now saved locally.  Should come online in about a minute.',
     u'info2': u'no configuration explicitly specified -- making one', u'ok': 1.0}

The three mongod servers we started earlier will now coordinate and come online as a replica set.
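
The call above lets the server generate a configuration on its own. For comparison, here is a minimal sketch of building an explicit configuration document. The helper name `build_rs_config` is hypothetical and not part of PyMongo; the document shape (an `_id` naming the set plus a `members` list of `_id`/`host` entries) follows the standard replica set configuration format, and the hosts are the ones used in this walkthrough.

```python
def build_rs_config(set_name, hosts):
    """Return a replica set configuration document for the given hosts.

    build_rs_config is a hypothetical helper, not part of PyMongo.
    """
    return {
        "_id": set_name,
        "members": [
            {"_id": index, "host": host} for index, host in enumerate(hosts)
        ],
    }


config = build_rs_config(
    "foo",
    ["morton.local:27017", "morton.local:27018", "morton.local:27019"],
)
# With a live connection `c`, the document would be passed as the command value:
#     c.admin.command("replSetInitiate", config)
```

An explicit document is mainly useful when members need non-default settings; for a local test set the implicit configuration is usually enough.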

Connecting to a Replica Set

The initial connection made above is a special case for an uninitialized replica set. Normally we’ll want to connect differently. (Translator’s note: application code usually does not need to initialize the replica set itself.) A connection to a replica set can be made using the normal MongoClient() constructor, specifying one or more members of the set. (Translator’s note: these forms probably cannot connect to an uninitialized replica set.) For example, any of the following will create a connection to the set we just created:

    >>> MongoClient("morton.local", replicaset='foo')
    MongoClient([u'morton.local:27019', 'morton.local:27017', u'morton.local:27018'])
    >>> MongoClient("morton.local:27018", replicaset='foo')
    MongoClient([u'morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
    >>> MongoClient("morton.local", 27019, replicaset='foo')
    MongoClient(['morton.local:27019', u'morton.local:27017', u'morton.local:27018'])
    >>> MongoClient(["morton.local:27018", "morton.local:27019"])
    MongoClient(['morton.local:27019', u'morton.local:27017', 'morton.local:27018'])
    >>> MongoClient("mongodb://morton.local:27017,morton.local:27018,morton.local:27019")
    MongoClient(['morton.local:27019', 'morton.local:27017', 'morton.local:27018'])

The nodes passed to MongoClient() are called the seeds. If only one host is specified, the replicaset parameter must be used to indicate that this isn’t a connection to a single node. As long as at least one of the seeds is online, the driver can “discover” all of the nodes in the set and connect to the current primary.
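
Because several seeds are usually kept in configuration, it can help to assemble the connection string in one place. The following is a small sketch under that assumption; `build_seed_uri` is a hypothetical helper, not a PyMongo function, and it only produces the `mongodb://` string that would then be handed to MongoClient().

```python
def build_seed_uri(seeds, replica_set=None):
    """Join host:port seeds into a mongodb:// connection string.

    build_seed_uri is a hypothetical helper, not part of PyMongo.
    """
    if not seeds:
        raise ValueError("at least one seed is required")
    uri = "mongodb://" + ",".join(seeds)
    if replica_set is not None:
        # The replicaSet option names the set the seeds belong to.
        uri += "/?replicaSet=" + replica_set
    return uri


uri = build_seed_uri(["morton.local:27017", "morton.local:27018"], "foo")
# With the servers from this walkthrough running: MongoClient(uri)
```

Listing more than one seed means the client can still discover the set when one of the listed members happens to be down.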

Handling Failover

When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:

    >>> db = MongoClient("morton.local", replicaSet='foo').test
    >>> db.test.save({"x": 1})
    >>> db.test.find_one()
    {u'x': 1, u'_id': ObjectId('...')}

By checking the host and port, we can see that we’re connected to morton.local:27017, which is the current primary:

    >>> db.connection.host
    'morton.local'
    >>> db.connection.port
    27017

Now let’s bring down that node and see what happens when we run our query again:

    >>> db.test.find_one()
    Traceback (most recent call last):
    pymongo.errors.AutoReconnect: ...

We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to reconnect automatically on subsequent operations. When this exception is raised, our application code needs to decide whether to retry the operation or simply continue, accepting that the operation might have failed.

On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will fail over and elect a new primary (this generally takes a couple of seconds). At that point the driver will connect to the new primary and the operation will succeed:

    >>> db.test.find_one()
    {u'x': 1, u'_id': ObjectId('...')}
    >>> db.connection.host
    >>> db.connection.port
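
The retry decision described above can be sketched as a small wrapper. This is one possible approach rather than anything PyMongo provides: `retry_on_autoreconnect` and its parameters are hypothetical, and the flaky function only simulates an election so the example runs without a server. The only PyMongo name used is `pymongo.errors.AutoReconnect`, with a stand-in class when PyMongo is not installed.

```python
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:
    # Stand-in so the sketch runs without PyMongo installed.
    class AutoReconnect(Exception):
        pass


def retry_on_autoreconnect(operation, attempts=5, delay_seconds=1.0):
    """Call operation(), retrying while it raises AutoReconnect.

    Hypothetical helper: re-raises the last AutoReconnect once the
    attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except AutoReconnect:
            if attempt == attempts - 1:
                raise
            # Give the replica set time to elect a new primary.
            time.sleep(delay_seconds)


# Simulated query: fails twice, as it might during an election, then succeeds.
calls = {"count": 0}


def flaky_find_one():
    calls["count"] += 1
    if calls["count"] < 3:
        raise AutoReconnect("primary not available")
    return {"x": 1}


result = retry_on_autoreconnect(flaky_find_one, attempts=5, delay_seconds=0)
```

In real code the wrapped call would be something like `lambda: db.test.find_one()`. Retrying is safest for reads and idempotent writes, since a write that raised AutoReconnect may or may not have reached the server.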


MongoReplicaSetClient

Using a MongoReplicaSetClient instead of a simple MongoClient offers two key features: secondary reads and replica set health monitoring. To connect using MongoReplicaSetClient, just provide a host:port pair and the name of the replica set:

    >>> from pymongo import MongoReplicaSetClient
    >>> MongoReplicaSetClient("morton.local:27017", replicaSet='foo')
    MongoReplicaSetClient([u'morton.local:27019', u'morton.local:27017', u'morton.local:27018'])

Secondary Reads

By default, an instance of MongoReplicaSetClient will only send queries to the primary member of the replica set. To use secondaries for queries, we have to change the ReadPreference:

    >>> db = MongoReplicaSetClient("morton.local:27017", replicaSet='foo').test
    >>> from pymongo.read_preferences import ReadPreference
    >>> db.read_preference = ReadPreference.SECONDARY_PREFERRED

Now all queries will be sent to the secondary members of the set. If there are no secondary members, the primary will be used as a fallback. If you have queries that you would prefer never to send to the primary, you can specify that using the SECONDARY read preference:

    >>> db.read_preference = ReadPreference.SECONDARY

Read preference can be set on a client, database, collection, or on a per-query basis, e.g.:

    >>> db.collection.find_one(read_preference=ReadPreference.PRIMARY)

Reads are configured using three options: read_preference, tag_sets, and secondary_acceptable_latency_ms.

read_preference:

    * PRIMARY:
        Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
    * PRIMARY_PREFERRED:
        Read from the primary if available, or if there is none, read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms.
    * SECONDARY:
        Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms. If no matching secondary is available, raise AutoReconnect.
    * SECONDARY_PREFERRED:
        Read from a secondary matching your choice of tag_sets and secondary_acceptable_latency_ms if available, otherwise from primary (regardless of the primary’s tags and latency).
    * NEAREST:
        Read from any member matching your choice of tag_sets and secondary_acceptable_latency_ms.
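
The differences between the modes can be illustrated with a small, self-contained sketch. It is not PyMongo’s implementation: `candidates_for_mode` and `NoSuitableMember` are hypothetical names, the modes are plain strings, and the secondaries passed in are assumed to have already passed the tag and latency filters. For NEAREST the sketch also assumes the primary passes those filters.

```python
class NoSuitableMember(Exception):
    """Stand-in for the AutoReconnect error PyMongo raises."""


def candidates_for_mode(mode, primary, secondaries):
    """Return the members a read may go to under the given mode.

    primary is a member name or None; secondaries is a list of names
    that already match tag_sets and the latency window.
    """
    if mode == "PRIMARY":
        chosen = [primary] if primary else []
    elif mode == "PRIMARY_PREFERRED":
        chosen = [primary] if primary else list(secondaries)
    elif mode == "SECONDARY":
        chosen = list(secondaries)
    elif mode == "SECONDARY_PREFERRED":
        chosen = list(secondaries) or ([primary] if primary else [])
    elif mode == "NEAREST":
        # Assumes the primary also matches the tag and latency filters.
        chosen = ([primary] if primary else []) + list(secondaries)
    else:
        raise ValueError("unknown mode: %s" % mode)
    if not chosen:
        raise NoSuitableMember(mode)
    return chosen


preferred = candidates_for_mode("SECONDARY_PREFERRED", "a", [])
```

With no secondaries available, SECONDARY_PREFERRED falls back to the primary, while SECONDARY raises instead.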

tag_sets:

Replica-set members can be tagged according to any criteria you choose. By default, MongoReplicaSetClient ignores tags when choosing a member to read from, but it can be configured with the tag_sets parameter. tag_sets must be a list of dictionaries, each dict providing tag values that the replica set member must match. MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member. For example, to prefer reads from the New York data center, but fall back to the San Francisco data center, tag your replica set members according to their location and create a MongoReplicaSetClient like so:

    >>> rsc = MongoReplicaSetClient(
    ...     "morton.local:27017",
    ...     replicaSet='foo',
    ...     read_preference=ReadPreference.SECONDARY,
    ...     tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}]
    ... )

MongoReplicaSetClient tries to find secondaries in New York, then San Francisco, and raises AutoReconnect if none are available. As an additional fallback, specify a final, empty tag set, {}, which means “read from any member that matches the mode, ignoring tags.”
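
The “try each tag set in turn” rule can be sketched in plain Python. This is an illustration of the matching logic as described here, not the driver’s code; `select_by_tag_sets` and the `members` mapping are hypothetical, and the sketch returns an empty list where the driver would raise AutoReconnect.

```python
def select_by_tag_sets(members, tag_sets):
    """Return members matching the first tag set that matches anyone.

    members maps a member name to its tags dict. An empty tag set
    matches every member.
    """
    for tag_set in tag_sets:
        matching = [
            name
            for name, tags in members.items()
            if all(key in tags and tags[key] == value
                   for key, value in tag_set.items())
        ]
        if matching:
            return matching
    return []


# Hypothetical set with no member in New York.
members = {
    "morton.local:27018": {"dc": "sf"},
    "morton.local:27019": {},
}

ny_then_sf = select_by_tag_sets(members, [{"dc": "ny"}, {"dc": "sf"}])
ny_only = select_by_tag_sets(members, [{"dc": "ny"}])
ny_then_any = select_by_tag_sets(members, [{"dc": "ny"}, {}])
```

Here `ny_then_sf` contains only the San Francisco member, `ny_only` is empty, and `ny_then_any` contains both members because the trailing `{}` ignores tags.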

secondary_acceptable_latency_ms:

If multiple members match the mode and tag sets, MongoReplicaSetClient reads from among the nearest members, chosen according to ping time. By default, only members whose ping times are within 15 milliseconds of the nearest are used for queries. You can choose to distribute reads among members with higher latencies by setting secondary_acceptable_latency_ms to a larger number. In that case, MongoReplicaSetClient distributes reads among matching members within secondary_acceptable_latency_ms of the closest member’s ping time.

    secondary_acceptable_latency_ms is ignored when talking to a replica set through a mongos. The equivalent is the localThreshold command line option.
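
The latency window can be pictured with a short sketch. It assumes the comparison is inclusive and that ping times are already averaged; `within_latency_window` and the sample ping values are hypothetical and only illustrate the rule described above.

```python
def within_latency_window(ping_ms, acceptable_latency_ms=15):
    """Return members whose ping is within the window of the fastest one.

    ping_ms maps a member name to its average ping in milliseconds.
    """
    if not ping_ms:
        return []
    fastest = min(ping_ms.values())
    return sorted(
        name
        for name, ping in ping_ms.items()
        if ping - fastest <= acceptable_latency_ms
    )


pings = {"a": 5, "b": 18, "c": 40}
near_default = within_latency_window(pings)   # 18 - 5 = 13, so a and b
near_wide = within_latency_window(pings, 50)  # window wide enough for all
```

Raising the threshold spreads reads over more members at the cost of sending some queries to slower ones.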

Health Monitoring

When MongoReplicaSetClient is initialized, it launches a background task to monitor the replica set for changes in:

    * Health: detect when a member goes down or comes up, or if a different member becomes primary
    * Configuration: detect changes in tags
    * Latency: track a moving average of each member’s ping time

Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.

It is critical to call close() to terminate the monitoring task before your process exits.
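
One way to make sure close() always runs is the standard library’s `contextlib.closing`, which calls close() when the block exits, even if an exception is raised. The sketch below uses a hypothetical `FakeReplicaSetClient` stand-in so it runs without a server; with a real client the same pattern would wrap the MongoReplicaSetClient instance.

```python
from contextlib import closing


class FakeReplicaSetClient:
    """Stand-in exposing the same close() method as a real client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client = FakeReplicaSetClient()
with closing(client) as rsc:
    pass  # application queries would run here

# After the block, close() has been called on the client.
```

A try/finally block that calls close() in the finally clause achieves the same result.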

High Availability and mongos

An instance of MongoClient can be configured to automatically connect to a different mongos if the instance it is currently connected to fails. If a failure occurs, PyMongo will attempt to find the nearest mongos to perform subsequent operations. As with a replica set, this can’t happen completely transparently. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to a sharded cluster, using a seed list, and perform a couple of basic operations:

    >>> db = MongoClient('morton.local:30000,morton.local:30001,morton.local:30002').test
    >>> db.test.save({"x": 1})
    >>> db.test.find_one()
    {u'x': 1, u'_id': ObjectId('...')}

Each member of the seed list passed to MongoClient must be a mongos. By checking the host, port, and is_mongos attributes, we can see that we’re connected to morton.local:30001, a mongos:

    >>> db.connection.host
    'morton.local'
    >>> db.connection.port
    30001
    >>> db.connection.is_mongos
    True

Now let’s shut down that mongos instance and see what happens when we run our query again:

    >>> db.test.find_one()
    Traceback (most recent call last):
    pymongo.errors.AutoReconnect: ...

As in the replica set example earlier in this document, we get an AutoReconnect exception. This means that the driver was not able to connect to the original mongos at port 30001 (which makes sense, since we shut it down), but that it will attempt to connect to a new mongos on subsequent operations. When this exception is raised, our application code needs to decide whether to retry the operation or simply continue, accepting that the operation might have failed.

As long as one of the seed list members is still available the next operation will succeed:

    >>> db.test.find_one()
    {u'x': 1, u'_id': ObjectId('...')}
    >>> db.connection.host
    >>> db.connection.port
    >>> db.connection.is_mongos


Time: 2024-10-18 12:00:41

