Ceph uniquely delivers object, block, and file storage in one unified system.
Ceph can be used in three ways: as a filesystem, as a block device, and as an object store.
Strictly speaking, this post only covers the block device. Since all three are built on top of a working Ceph Storage Cluster, let's first go over some cluster-level commands.
1. Ceph cluster commands
1) Check and monitor the cluster status:
ceph health
ceph status
ceph osd stat
ceph osd dump
ceph osd tree
ceph mon dump
ceph quorum_status
ceph mds stat
ceph mds dump
Try each of these commands and compare their output.
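If you would rather watch the cluster continuously instead of polling it, the small sketch below shows a couple of variants I find handy; it assumes the client.admin keyring is readable on the node where you run them.

# print a status summary, then keep streaming new cluster log entries
ceph -w
# short form of 'ceph status'
ceph -s
# when 'ceph health' reports a warning, ask for per-item details
ceph health detail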
2) Pools, which can roughly be thought of as namespaces
List the existing pools:
[[email protected] ~]# ceph osd lspools
0 data,1 metadata,2 rbd,
Check the pg_num attribute of the data pool:
[[email protected] ~]# ceph osd pool get data pg_num
pg_num: 256
Check the pgp_num attribute of the data pool:
[[email protected] ~]# ceph osd pool get data pgp_num
pgp_num: 256
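For completeness, both attributes can also be changed with ceph osd pool set. A hedged sketch (the value 512 is just an example; on these releases pg_num can only be increased, never decreased):

# raise the placement group count of the 'data' pool
ceph osd pool set data pg_num 512
# keep pgp_num in step with pg_num so data actually rebalances
ceph osd pool set data pgp_num 512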
Create a pool named 'test-pool':
[[email protected] ~]# ceph osd pool create test-pool 256 256
pool 'test-pool' created
[[email protected] ~]# ceph osd lspools
0 data,1 metadata,2 rbd,3 test-pool,
Delete 'test-pool':
[[email protected] ~]# ceph osd pool delete test-pool test-pool --yes-i-really-really-mean-it
pool 'test-pool' deleted
[[email protected] ~]# ceph osd lspools
0 data,1 metadata,2 rbd,
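Another per-pool setting worth knowing is the replica count. A small sketch, assuming a freshly created test-pool:

# number of replicas kept for objects in the pool
ceph osd pool set test-pool size 3
# minimum replicas that must be up for the pool to accept I/O
ceph osd pool set test-pool min_size 2
# verify the settings
ceph osd dump | grep test-pool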
3) The CRUSH map
Get the CRUSH map of the existing cluster:
[[email protected] ~]# ceph osd getcrushmap -o crush.map
got crush map from osdmap epoch 734
Decompile it (with crushtool -d crush.map -o crush.txt) and take a look at the result:
[[email protected] ~]# cat crush.txt
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host test-1 {
        id -2           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
}
host test-2 {
        id -4           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 1.000
}
host test-3 {
        id -5           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 1.000
}
rack unknownrack {
        id -3           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0  # rjenkins1
        item test-1 weight 1.000
        item test-2 weight 1.000
        item test-3 weight 1.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 3.000
        alg straw
        hash 0  # rjenkins1
        item unknownrack weight 3.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
Take a close look at this output; notice anything interesting? See the official documentation on CRUSH for the details.
Once you have made your changes, compile the CRUSH map:
crushtool -c crush.txt -o crush.map
Then set the resulting CRUSH map on the cluster:
ceph osd setcrushmap -i crush.map
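Before injecting an edited map into a live cluster, it is safer to dry-run it with crushtool's test mode. A sketch under the assumption that ruleset 0 and 2 replicas are what you care about (adjust to taste; the exact options vary a little between versions):

# simulate placements for ruleset 0 with 2 replicas and print summary statistics
crushtool -i crush.map --test --rule 0 --num-rep 2 --show-statistics
# same for the rbd ruleset
crushtool -i crush.map --test --rule 2 --num-rep 2 --show-statistics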
2. Ceph block device commands
1) Basic operations
Create a block device image:
[[email protected] ~]# rbd create test-image --size 1024 --pool test-pool
[[email protected] ~]# rbd ls test-pool
test-image
Show the details of this image:
[[email protected] ~]# rbd --image test-image info --pool test-pool
rbd image 'test-image':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1483.6b8b4567
        format: 1
Delete the image:
[[email protected] ~]# rbd rm test-image -p test-pool
Removing image: 100% complete...done.
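Besides create/ls/info/rm, images can also be copied, renamed, and exported/imported. A quick sketch (the target names and the /tmp path are made up for illustration):

# copy an image
rbd cp test-pool/test-image test-pool/test-image-copy
# rename an image within its pool
rbd rename test-pool/test-image-copy test-pool/test-image-bak
# export an image to a local file, and import one back as a new image
rbd export test-pool/test-image /tmp/test-image.img
rbd import /tmp/test-image.img test-pool/test-image-imported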
2)Kernel Modules
Sometimes we need to mount an image locally and modify its contents; this is where the map operation comes in.
First, load the rbd module into the kernel (make sure the rbd-related options were enabled when the kernel was built/upgraded):
modprobe rbd
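To confirm the module actually loaded, something along these lines can be used:

# verify the module is present
lsmod | grep rbd
# show module details (version, dependencies)
modinfo rbd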
Map test-image:
rbd map test-image --pool test-pool --id admin
List the mapped devices:
[[email protected] mycephfs]# rbd showmapped
id pool      image      snap device
1  test-pool test-image -    /dev/rbd1
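If cephx authentication is enabled and your keyring is not in the default location, the map command can point at it explicitly. A sketch; the keyring path shown is the usual default and may differ on your installation:

# map with an explicit user and keyring path
rbd map test-image --pool test-pool --id admin --keyring /etc/ceph/ceph.client.admin.keyring
# confirm the mapping
rbd showmapped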
Let's look at the disk information for /dev/rbd1, run mkfs on it, mount it at /mnt/mycephfs, and then create a file in it containing the string 'hello':
[[email protected] ~]# fdisk -lu /dev/rbd1

Disk /dev/rbd1: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders, total 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes
Disk identifier: 0x00000000

[[email protected] ~]# mkfs.ext4 /dev/rbd1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1024 blocks, Stripe width=1024 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[[email protected] ~]# mount /dev/rbd1 /mnt/mycephfs/
[[email protected] ~]# ll /mnt/mycephfs/
total 16
drwx------ 2 root root 16384 Nov 27 13:40 lost+found
[[email protected] ~]# cd /mnt/mycephfs/
[[email protected] mycephfs]# ls
lost+found
[[email protected] mycephfs]# echo 'hello' > hello.txt
[[email protected] mycephfs]# ls
hello.txt  lost+found
[[email protected] mycephfs]# df -h /mnt/mycephfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1       976M  1.3M  908M   1% /mnt/mycephfs
We can also resize the image:
[[email protected] mycephfs]# rbd resize --size 2048 test-image
rbd: error opening image test-image: (2) No such file or directory
2013-11-27 13:48:24.290564 7fcf3b185760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
[[email protected] mycephfs]# rbd resize --size 2048 test-image --pool test-pool
Resizing image: 100% complete...done.
[[email protected] mycephfs]# df -h /mnt/mycephfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1       976M  1.3M  908M   1% /mnt/mycephfs
[[email protected] mycephfs]# blockdev --getsize64 /dev/rbd1
2147483648
[[email protected] mycephfs]# resize2fs /dev/rbd1
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/rbd1 is mounted on /mnt/mycephfs; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/rbd1 to 524288 (4k) blocks.
The filesystem on /dev/rbd1 is now 524288 blocks long.
[[email protected] mycephfs]# df -h /mnt/mycephfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1       2.0G  1.6M  1.9G   1% /mnt/mycephfs
[[email protected] mycephfs]# ls
hello.txt  lost+found
Once we have finished modifying the image contents, we can unmap it (remember to umount it first). The next time you map it, the hello.txt created earlier will still be there in the mount directory.
[[email protected] mnt]# umount /dev/rbd1
[[email protected] mnt]# rbd unmap /dev/rbd1
3) Snapshots
Sometimes we need to take a snapshot of an image so that we can roll it back to that state at any time in the future.
Let's try to snapshot test-image in test-pool:
[[email protected] mnt]# rbd snap create test-pool/test-image@snapimage
rbd: failed to create snapshot: (22) Invalid argument
2013-11-27 14:56:53.109819 7f5bea81d760 -1 librbd: failed to create snap id: (22) Invalid argument
It reports 'Invalid argument'. After quite a bit of digging, it turned out the problem was the '-' in the names 'test-pool' and 'test-image',
so let's create a new pool called 'mypool' and, under it, an image called 'myimage':
[[email protected] ceph]# ceph osd pool create mypool 256 256
pool 'mypool' created
[[email protected] ceph]# rbd create myimage --size 1024 --pool mypool
[[email protected] ceph]# rbd --pool mypool ls
myimage
Next, create a snapshot named 'snapimage':
[[email protected] ceph]# rbd snap create mypool/myimage@snapimage
List the snapshots of myimage:
[[email protected] ceph]# rbd snap ls mypool/myimage
SNAPID NAME          SIZE
     2 snapimage 1024 MB
Now let's put snapshots to the test:
[[email protected] ceph]# rbd snap create mypool/myimage@snapimage3
[[email protected] ceph]# rbd map mypool/myimage
[[email protected] ceph]# mount /dev/rbd1 /mnt/mycephfs/
[[email protected] ceph]# ls /mnt/mycephfs/
hello.txt  lost+found
[[email protected] ceph]# echo 'welcome to zhengtianbao.com ' > /mnt/mycephfs/info.txt
[[email protected] ceph]# ls /mnt/mycephfs/
hello.txt  info.txt  lost+found
[[email protected] ceph]# umount /dev/rbd1
[[email protected] ceph]# rbd unmap /dev/rbd1
[[email protected] ceph]# rbd snap rollback mypool/myimage@snapimage3
Rolling back to snapshot: 100% complete...done.
[[email protected] ceph]# rbd map mypool/myimage
[[email protected] ceph]# mount /dev/rbd1 /mnt/mycephfs/
[[email protected] ceph]# ls /mnt/mycephfs/
hello.txt  lost+found
As expected, myimage has returned to the state it was in at snapshot snapimage3, and the info.txt created afterwards is gone.
Delete a snapshot:
[[email protected] ceph]# rbd snap ls mypool/myimage
SNAPID NAME           SIZE
     2 snapimage  1024 MB
     3 snapimage2 1024 MB
     4 snapimage3 1024 MB
[[email protected] ceph]# rbd snap rm mypool/myimage@snapimage
[[email protected] ceph]# rbd snap ls mypool/myimage
SNAPID NAME           SIZE
     3 snapimage2 1024 MB
     4 snapimage3 1024 MB
Delete all snapshots of myimage:
[[email protected] ceph]# rbd snap purge mypool/myimage
Removing all snapshots: 100% complete...done.
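Snapshots of format 2 images can additionally be protected and cloned (copy-on-write layering); the format 1 images used above do not support this. A hedged sketch with made-up names (on very old releases the create flag was --format 2 instead of --image-format 2):

# layering requires a format 2 image
rbd create mypool/base-image --size 1024 --image-format 2
rbd snap create mypool/base-image@base-snap
# a snapshot must be protected before it can be cloned
rbd snap protect mypool/base-image@base-snap
rbd clone mypool/base-image@base-snap mypool/cloned-image
# list clones of the snapshot, or detach a clone from its parent
rbd children mypool/base-image@base-snap
rbd flatten mypool/cloned-image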
4)libvirt
Ceph can also be used together with libvirt: a domain defined in libvirt can use a Ceph block device as one of its devices.
Roughly speaking, libvirt is an intermediate layer; its relationship with rbd looks like this:
libvirt --> qemu --> librbd --> librados --> osds
                                        |--> monitors
I will cover libvirt and qemu in more detail when I get the chance.
Also, make sure qemu was configured with rbd support enabled.
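A quick way to check whether your qemu build actually has rbd support is sketched below; the emulator path matches the one used in the domain XML later and may differ on your distribution:

# rbd should appear in qemu-img's 'Supported formats:' line
qemu-img --help | grep rbd
# or check that the emulator binary is linked against librbd
ldd /usr/libexec/qemu-kvm | grep librbd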
First we need a pre-built disk image; here I use a CentOS 6 image:
[[email protected]1 ~]# file centos6
centos6: x86 boot sector; GRand Unified Bootloader, stage1 version 0x3, boot drive 0x80, 1st sector stage2 0x849d4, GRUB version 0.94; partition 1: ID=0x83, active, starthead 32, startsector 2048, 1024000 sectors; partition 2: ID=0x8e, starthead 221, startsector 1026048, 19945472 sectors, code offset 0x48
Use qemu-img convert to import this image into mypool under the name centos:
[[email protected] ceph]# qemu-img convert ~/centos6 rbd:mypool/centos
[[email protected] ceph]# rbd ls --pool mypool
centos
myimage
[[email protected] ceph]# rbd info centos --pool mypool
rbd image 'centos':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.14d4.6b8b4567
        format: 1
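qemu-img can also talk to the pool directly through the rbd: protocol, which is handy for inspecting or creating images without going through the rbd tool. A sketch; the new image name 'scratch' is made up:

# inspect the image we just imported
qemu-img info rbd:mypool/centos
# create a brand-new 10G raw image directly in the pool
qemu-img create -f raw rbd:mypool/scratch 10G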
Then we create the domain XML file that libvirt needs; this is just a simple example.
test.xml
<domain type='kvm'>
  <name>test-ceph</name>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='mypool/centos'>
        <host name='localhost' port='6789'/>
      </source>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='vga' ram='65536' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>
Next, use virsh to define and start the virtual machine, and check its VNC port:
[[email protected] ceph]# virsh define test.xml
[[email protected] ceph]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     test-ceph                      shut off
[[email protected] ceph]# virsh start test-ceph
Domain test-ceph started
[[email protected] ceph]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     test-ceph                      running
[[email protected] ceph]# virsh vncdisplay 1
:0
OK, now we can connect to the VM through a VNC client on port 5900 of the host, and from inside the VM we can also see how Ceph's read/write performance holds up...
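For a rough performance test, something like the sketch below can be used: dd inside the guest against its virtual disk, or rados bench on the host against the pool itself (the file name, sizes, and the 10-second duration are arbitrary):

# inside the guest: sequential write that bypasses the page cache
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct
# on the host: a 10-second write benchmark against the pool
rados bench -p mypool 10 write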
Some links:
[1] IBM developerWorks article on Ceph: http://www.ibm.com/developerworks/cn/linux/l-ceph/
[2] Ceph architecture: http://www.ustack.com/blog/ceph_infra/
[3] Ceph performance testing: http://tech.uc.cn/?p=1223#more-1223