MySQL High Performance: Memcached (2)

This article covers the issues to watch for when deploying Memcached, as well as Memcached's distribution algorithms.

Whether your system is newly launched or has been running in production for a long time, configuring Memcached is straightforward. Before configuring it, however, keep the following points in mind:

1. memcached is only a caching mechanism. It shouldn't be used to store information that you cannot otherwise afford to lose and then load from a different location.

2. There is no security built into the memcached protocol. At a minimum, make sure that the servers running memcached are accessible only from inside your network, and that the network ports being used are blocked externally (using a firewall or similar). If the information being stored on the memcached servers is at all sensitive, encrypt it before storing it in memcached.

3. memcached does not provide any sort of failover, because there is no communication between different memcached instances. If an instance fails, your application must be capable of removing it from the list, reloading the data, and then writing the data to another memcached instance.

4. Latency between the clients and the memcached instances can be a problem if you are using different physical machines for these tasks. If you find that latency is a problem, move the memcached instances onto the client machines.

5. Key length is determined by the memcached server. The default maximum key size is 250 bytes.

6. Try to use at least two memcached instances, especially for multiple clients, to avoid having a single point of failure. Ideally, create as many memcached nodes as possible. When adding and removing memcached instances from a pool, the hashing and distribution of key/value pairs may be affected.

7. Use namespaces. The memcached cache is a very simple massive key/value storage system, and as such there is no way of compartmentalizing data automatically into different sections. For example, if you are storing information by the unique ID returned from a MySQL database, then storing the data from two different tables could run into issues because the same ID might be valid in both tables.

Some interfaces provide an automated mechanism for creating namespaces when storing information into the cache. In practice, these namespaces are merely a prefix before a given ID that is applied every time a value is stored or retrieved from the cache.

You can implement the same basic principle by using keys that describe the object and the unique identifier within the key that you supply when the object is stored. For example, when storing user data, prefix the ID of the user with user: or user-.
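
As a minimal Python sketch of this idea (a plain dict stands in for the memcached client, and the helper names make_key, cache_user and cache_order are purely illustrative, not part of any client API):

# A dict stands in for a real memcached client; swap in your client's
# set/get calls in practice.
cache = {}

def make_key(namespace, object_id):
    # e.g. make_key("user", 42) -> "user:42"
    return "%s:%s" % (namespace, object_id)

def cache_user(user_id, user_data):
    cache[make_key("user", user_id)] = user_data

def cache_order(order_id, order_data):
    # The same numeric ID no longer collides with a user ID,
    # because the namespace prefix keeps the keys distinct.
    cache[make_key("order", order_id)] = order_data

cache_user(42, {"name": "alice"})
cache_order(42, {"total": 99.5})
print(sorted(cache))   # ['order:42', 'user:42']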

Memcached distribution algorithms:

The memcached client interface supports a number of different distribution algorithms that are used in multi-server configurations to determine which host should be used when setting or getting data from a given memcached instance. When you get or set a value, a hash is constructed from the supplied key and then used to select a host from the list of configured servers. Because the hashing mechanism uses the supplied key as the basis for the hash, the same server is selected during both set and get operations.

You can think of this process as follows. Given an array of servers (a, b, and c), the client uses a hashing algorithm that returns an integer based on the key being stored or retrieved. The resulting value is then used to select a server from the list of servers configured in the client. Most standard client hashing within memcached clients uses a simple modulus calculation on the value against the number of configured memcached servers. You can summarize the process in pseudocode as:

@memcservers = ['a.memc','b.memc','c.memc'];
$value = hash($key);
$chosen = $value % length(@memcservers);

Replacing the above with values:

@memcservers = ['a.memc','b.memc','c.memc'];
$value = hash('myid');
$chosen = 7009 % 3;

In the above example, the client hashing algorithm chooses the server at index 1 (7009 % 3 = 1), that is, b.memc, and stores or retrieves the key and value with that server.
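
The same selection logic, written as a small runnable Python sketch (the server names are the placeholders from the pseudocode above, and CRC32 is used only as a stand-in hash; real clients use their own hashing functions):

import zlib

memcservers = ["a.memc", "b.memc", "c.memc"]

def choose_server(key, servers):
    # Hash the key to an integer, then take the modulus against the
    # number of configured servers to pick an index.
    value = zlib.crc32(key.encode("utf-8"))
    return servers[value % len(servers)]

print(choose_server("myid", memcservers))
# Always prints the same server name for "myid" as long as the
# server list (and its order) stays the same.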

Using this method provides a number of advantages:

- The hashing and selection of the server to contact is handled entirely within the client. This eliminates the need to perform network communication to determine the right machine to contact.
- Because the determination of the memcached server occurs entirely within the client, the server can be selected automatically regardless of the operation being executed (set, get, increment, etc.).
- Because the determination is handled within the client, the hashing algorithm returns the same value for a given key; values are not affected or reset by differences in the server environment.
- Selection is very fast. The hashing algorithm on the key value is quick and the resulting selection of the server is from a simple array of available machines.
- Using client-side hashing simplifies the distribution of data over each memcached server. Natural distribution of the values returned by the hashing algorithm means that keys are automatically spread over the available servers.

Providing that the list of servers configured within the client remains the same, the same stored key returns the same value, and therefore selects the same server.

However, if different client interfaces do not use the same hashing mechanism, the same data may be recorded on different servers, both wasting space on your memcached servers and leading to potential differences in the information.

The problem with client-side selection of the server is that the list of servers (including their sequential order) must remain consistent on each client using the memcached servers, and the servers must be available. If you try to perform an operation on a key when:

- A new memcached instance has been added to the list of available instances
- A memcached instance has been removed from the list of available instances
- The order of the memcached instances has changed

then the hashing algorithm is applied to the same key but against a different list of servers, and the hash calculation may choose a different server from the list.

If a new memcached instance is added to the list of servers, for example new.memc inserted ahead of b.memc as sketched below, then a GET operation using the same key, myid, can result in a cache miss. The same hash value is computed from the key, and in this case it even selects the same index from the array of servers (7009 % 4 = 1), but index 1 now points to the new server, not the server b.memc where the data was originally stored. This results in a cache miss, even though the key exists within the cache on another memcached instance.
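
A minimal sketch of this scenario, reusing the hash value 7009 from the earlier example rather than computing a real hash:

# The hash value 7009 is the example value from the text above,
# not the output of a real memcached hash function.
hash_value = 7009

old_servers = ["a.memc", "b.memc", "c.memc"]
new_servers = ["a.memc", "new.memc", "b.memc", "c.memc"]  # new.memc inserted

print(old_servers[hash_value % len(old_servers)])  # b.memc   (data stored here)
print(new_servers[hash_value % len(new_servers)])  # new.memc (GET now goes here: cache miss)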

This means that servers b.memc and new.memc may both end up containing the information for key myid, but the information stored against the key in each server may be different in each instance. A more significant problem is a much higher number of cache misses when retrieving data, as the addition of a new server changes the distribution of keys, and this in turn requires rebuilding the cached data on the memcached instances, causing an increase in database reads.

The same effect can occur if you actively manage the list of servers configured in your clients, adding and removing the configured memcached instances as each instance is identified as being available. For example, removing a memcached instance when the client notices that the instance can no longer be contacted can cause the server selection to fail as just described.

To prevent this from causing significant problems and invalidating your cache, you can choose the hashing algorithm used to select the server. There are two common types of hashing algorithm: consistent and modula.

With consistent hashing algorithms, the same key applied to a list of servers always uses the same server to store or retrieve the keys, even if the list of configured servers changes. This means that you can add and remove servers from the configured list and still use the same server for a given key. There are two types of consistent hashing algorithms available, Ketama and Wheel. Both types are supported by libmemcached, and implementations are available for PHP and Java.
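
The following is only a toy Python illustration of the consistent-hashing idea (a hash ring with multiple points per server), not the actual Ketama or Wheel implementation used by libmemcached:

import bisect
import hashlib

class HashRing:
    # Toy consistent-hash ring: each server is hashed onto the ring at
    # several points; a key is served by the first server point at or
    # after the key's own position on the ring.

    def __init__(self, servers, points_per_server=100):
        self.ring = []                      # sorted list of (position, server)
        for server in servers:
            for i in range(points_per_server):
                pos = self._hash("%s-%d" % (server, i))
                self.ring.append((pos, server))
        self.ring.sort()

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def get_server(self, key):
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,))
        if idx == len(self.ring):           # wrap around the ring
            idx = 0
        return self.ring[idx][1]

ring = HashRing(["a.memc", "b.memc", "c.memc"])
print(ring.get_server("myid"))
# Adding a server only remaps the keys that fall between the new
# server's points and their predecessors; most keys keep their server.
ring_with_new = HashRing(["a.memc", "b.memc", "c.memc", "new.memc"])
print(ring_with_new.get_server("myid"))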

Any consistent hashing algorithm has some limitations. When you add servers to an existing list of configured servers, keys are distributed to the new servers as part of the normal distribution. When you remove servers from the list, the keys are re-allocated to another server within the list, meaning that the cache needs to be re-populated with the information. Also, a consistent hashing algorithm does not resolve the issue where you want consistent selection of a server across multiple clients, but where each client contains a different list of servers. The consistency is enforced only within a single client.

With a modula hashing algorithm, the client selects a server by first computing the hash and then choosing a server from the list of configured servers. As the list of servers changes, the server selected by a modula hashing algorithm also changes. The result is the behavior described above: changes to the list of servers mean that different servers are selected when retrieving data, leading to cache misses and an increase in database load as the cache is re-seeded with information.

If you use only a single memcached instance for each client, or your list of memcached servers configured for a client never changes, then the selection of a hashing algorithm is irrelevant, as it has no noticeable effect.

If you change your servers regularly, or you use a common set of servers that are shared among a large number of clients, then using a consistent hashing algorithm should help to ensure that your cache data is not duplicated and the data is evenly distributed.

Memory Allocation within memcached:

When you first start memcached, the memory that you have configured is not automatically allocated. Instead, memcached only starts allocating and reserving physical memory once you start saving information into the cache.

When you start to store data into the cache, memcached does not allocate the memory for the data on an item-by-item basis. Instead, slab allocation is used to optimize memory usage and prevent memory fragmentation when information expires from the cache.

With slab allocation, memory is reserved in blocks of 1MB. Each slab is divided up into a number of blocks of equal size. When you try to store a value into the cache, memcached checks the size of the value that you are adding and determines which slab contains the right size allocation for the item. If a slab with the right item size already exists, the item is written to a block within that slab.

If the new item is bigger than the size of any existing blocks, then a new slab is created, divided up into blocks of a suitable size. If an existing slab with the right block size already exists, but there are no free blocks, a new slab is created. If you update an existing item with data that is larger than the existing block allocation for that key, then the key is re-allocated into a suitable slab.

For example, the default size for the smallest block is 88 bytes (40 bytes of value, and the default 48 bytes for the key and flag data). If the size of the first item you store into the cache is less than 40 bytes, then a slab with a block size of 88 bytes is created and the value stored.

If the size of the data that you intend to store is larger than this value, then the block size is increased by the chunk size factor until a block size large enough to hold the value is determined. The block size is always a function of the scale factor, rounded up to a block size which is exactly divisible into the chunk size.
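
As a rough Python sketch of how chunk sizes might grow under the default 1.25 growth factor (this mirrors the idea, not memcached's exact sizing code; the starting size of 88 bytes is taken from the example above, and the page size and alignment are the common defaults, adjustable via the -I, -f and -n options):

SLAB_PAGE_SIZE = 1024 * 1024   # 1MB slab pages (default)
GROWTH_FACTOR = 1.25           # default -f chunk size growth factor
ALIGNMENT = 8                  # chunk sizes kept 8-byte aligned

def chunk_sizes(smallest=88):
    sizes = []
    size = smallest
    while size <= SLAB_PAGE_SIZE / 2:
        sizes.append(size)
        size = int(size * GROWTH_FACTOR)
        size += (ALIGNMENT - size % ALIGNMENT) % ALIGNMENT  # round up to alignment
    return sizes

print(chunk_sizes()[:6])   # [88, 112, 144, 184, 232, 296]
# An item is stored in the slab class with the smallest chunk size that
# can hold it, so some space in each chunk may go unused.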
