ceph的实验环境在公司内部用了一段时间,主要是利用rbd提供的块设备创建虚拟机、为虚拟机分配块,还是很稳定的。但现在的环境大部分配置还是ceph的默认值,只是将journal分离出来写到了一个单独的分区。后面打算利用ceph tier和ssd做一些优化:
1. 将journal写入一块单独的ssd磁盘。
2. 利用ssd配置一个ssd pool,将这个pool作为其它pool的cache,这就需要ceph tier。
网上搜索了一下,目前还没有这么实践的文章以及这么做后性能到底会提升多少。所以此方案实施后会进行相关测试:
1. 默认安装ceph。
2. 将journal分离到单独的普通硬盘分区。
3. 将journal分离到单独的ssd盘。
4. 加入ssd pool后。
crush的设置可以看这篇文章:http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
I. Use case
Roughly say your infrastructure could be based on several type of servers:
- storage nodes full of SSDs disks
- storage nodes full of SAS disks
- storage nodes full of SATA disks
Such handy mecanism is possible with the help of the CRUSH Map.
II. A bit about CRUSH
CRUSH stands for Controlled Replication Under Scalable Hashing:
- Pseudo-random placement algorithm
- Fast calculation, no lookup Repeatable, deterministic
- Ensures even distribution
- Stable mapping
- Limited data migration
- Rule-based configuration, rule determines data placement
- Infrastructure topology aware, the map knows the structure of your infra (nodes, racks, row, datacenter)
- Allows weighting, every OSD has a weight
For more details check the Ceph Official documentation.
III. Setup
What are we going to do?
- Retrieve the current CRUSH Map
- Decompile the CRUSH Map
- Edit it. We will add 2 buckets and 2 rulesets
- Recompile the new CRUSH Map.
- Re-inject the new CRUSH Map.
III.1. Begin
Grab your current CRUSH map:
$ ceph osd getcrushmap -o ma-crush-map $ crushtool -d ma-crush-map -o ma-crush-map.txt
For the sake of simplicity, let’s assume that you have 4 OSDs:
- 2 of them are SAS disks
- 2 of them are SSD enterprise
And here is the OSD tree:
$ ceph osd tree dumped osdmap tree epoch 621 # id weight type name up/down reweight -1 12 pool default -3 12 rack le-rack -2 3 host ceph-01 0 1 osd.0 up 1 1 1 osd.1 up 1 -4 3 host ceph-02 2 1 osd.2 up 1 3 1 osd.3 up 1
III.2. Default crush map
Edit your CRUSH map:
# begin crush map # devices device 0 osd.0 device 1 osd.1 device 2 osd.2 device 3 osd.3 # types type 0 osd type 1 host type 2 rack type 3 row type 4 room type 5 datacenter type 6 pool # buckets host ceph-01 { id -2 # do not change unnecessarily # weight 3.000 alg straw hash 0 # rjenkins1 item osd.0 weight 1.000 item osd.1 weight 1.000 } host ceph-02 { id -4 # do not change unnecessarily # weight 3.000 alg straw hash 0 # rjenkins1 item osd.2 weight 1.000 item osd.3 weight 1.000 } rack le-rack { id -3 # do not change unnecessarily # weight 12.000 alg straw hash 0 # rjenkins1 item ceph-01 weight 2.000 item ceph-02 weight 2.000 } pool default { id -1 # do not change unnecessarily # weight 12.000 alg straw hash 0 # rjenkins1 item le-rack weight 4.000 } # rules rule data { ruleset 0 type replicated min_size 1 max_size 10 step take default step chooseleaf firstn 0 type host step emit } rule metadata { ruleset 1 type replicated min_size 1 max_size 10 step take default step chooseleaf firstn 0 type host step emit } rule rbd { ruleset 2 type replicated min_size 1 max_size 10 step take default step chooseleaf firstn 0 type host step emit } # end crush map
III.3. Add buckets and rules
Now we have to add 2 new specific rules:
- one for the SSD pool
- one for the SAS pool
III.3.1. SSD Pool
Add a bucket for the pool SSD:
pool ssd { id -5 # do not change unnecessarily alg straw hash 0 # rjenkins1 item osd.0 weight 1.000 item osd.1 weight 1.000 }
Add a rule for the bucket nearly created:
rule ssd { ruleset 3 type replicated min_size 1 max_size 10 step take ssd step choose firstn 0 type host step emit }
III.3.1. SAS Pool
Add a bucket for the pool SAS:
pool sas { id -6 # do not change unnecessarily alg straw hash 0 # rjenkins1 item osd.2 weight 1.000 item osd.3 weight 1.000 }
Add a rule for the bucket nearly created:
rule sas { ruleset 4 type replicated min_size 1 max_size 10 step take sas step choose firstn 0 type host step emit }
Eventually recompile and inject the new CRUSH map:
$ crushtool -c ma-crush-map.txt -o ma-nouvelle-crush-map $ ceph osd setcrushmap -i ma-nouvelle-crush-map
III.3. Create and configure the pools
Create your 2 new pools:
$ rados mkpool ssd successfully created pool ssd $ rados mkpool sas successfully created pool sas
Set the rule set to the pool:
ceph osd pool set ssd crush_ruleset 3 ceph osd pool set sas crush_ruleset 4
Check that the changes have been applied successfully:
$ ceph osd dump | grep -E ‘ssd|sas‘ pool 3 ‘ssd‘ rep size 2 crush_ruleset 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 21 owner 0 pool 4 ‘sas‘ rep size 2 crush_ruleset 4 object_hash rjenkins pg_num 128 pgp_num 128 last_change 23 owner 0
Just create some random files and put them into your object store:
$ dd if=/dev/zero of=ssd.pool bs=1M count=512 conv=fsync $ dd if=/dev/zero of=sas.pool bs=1M count=512 conv=fsync $ rados -p ssd put ssd.pool ssd.pool.object $ rados -p sas put sas.pool sas.pool.object
Where are pg active?
$ ceph osd map ssd ssd.pool.object osdmap e260 pool ‘ssd‘ (3) object ‘ssd.pool.object‘ -> pg 3.c5034eb8 (3.0) -> up [1,0] acting [1,0] $ ceph osd map sas sas.pool.object osdmap e260 pool ‘sas‘ (4) object ‘sas.pool.object‘ -> pg 4.9202e7ee (4.0) -> up [3,2] acting [3,2]
CRUSH Rules! As you can see from this article CRUSH allows you to perform amazing things. The CRUSH Map could be very complex, but it brings a lot of flexibility! Happy CRUSH Mapping ;-)
关于ceph tier的一些想法