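The main goal of this test is to observe how much physical-resource performance is lost by virtual machines running on KVM. It compares CPU, memory, and I/O performance under KVM virtualization. The method is to run a benchmark on the non-virtualized native system, run the same benchmark in a guest configured similarly to the native system, and then compare the results from the two environments. To keep the comparison accurate, the test environment is kept as consistent with the native environment as possible. In /boot/grub/grub.cfg, appending maxcpus=2 nr_cpus=2 mem=2G to the line that boots the kernel limits the number of CPU cores and the amount of memory the Linux kernel uses.
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set=root 3940bb4d-c220-4cb5-b4f5-6dd11c5ecb44
linux /boot/vmlinuz-3.2.0-83-generic root=UUID=3940bb4d-c220-4cb5-b4f5-6dd11c5ecb44 ro quiet maxcpus=2 nr_cpus=2 mem=2G
initrd /boot/initrd.img-3.2.0-83-generic
The lines above are from /boot/grub/grub.cfg on Ubuntu; Red Hat family systems differ slightly.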
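Both the native system and the guest have 1 CPU with 2 cores and 2 GB of memory. Because the I/O test only uses a 512 MB file, the difference in disk size between the physical machine and the virtual machine has little effect.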
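Before benchmarking, it helps to confirm that the kernel parameters actually took effect. The following is a minimal sketch, not part of the original test procedure: the helper name cmdline_param is hypothetical, and the sample string is copied from the boot entry shown earlier. On a live system the string would come from /proc/cmdline.

```shell
#!/bin/sh
# Print the value of a key=value kernel parameter from a command-line string.
# Usage: cmdline_param "<cmdline>" <key>
# (cmdline_param is a hypothetical helper name used for illustration.)
cmdline_param() {
    for word in $1; do
        case "$word" in
            "$2"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
}

# Sample taken from the boot entry in this article; on a running system
# pass "$(cat /proc/cmdline)" instead.
sample='root=UUID=3940bb4d-c220-4cb5-b4f5-6dd11c5ecb44 ro quiet maxcpus=2 nr_cpus=2 mem=2G'
echo "maxcpus=$(cmdline_param "$sample" maxcpus)"
echo "mem=$(cmdline_param "$sample" mem)"
```

After rebooting, nproc and free -m give a second check that the visible core count and memory size match the guest.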
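CPU performance test
Super PI is used for the CPU test. In this benchmark it computes π to 2^20 and 2^24 digits after the decimal point, and when the calculation finishes it prints the time taken. The commands are: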
~/super_pi# ./super_pi 20
...
~/super_pi# ./super_pi 24
...
On an x86_64 system, running the Super PI executable may fail because the ld-linux.so.2 shared library cannot be found. Super PI is an old 32-bit program; on Ubuntu, installing the libc6-i386 package fixes this.
~/super_pi# apt-cache search libc6-i386
libc6-i386 - Embedded GNU C Library: 32-bit shared libraries for AMD64
~/super_pi# apt-get install libc6-i386
...
When the program finishes it prints Total calculation(I/O) time. The times below are in seconds (smaller is better):
./pi 20 | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
host_ubuntu | 12.037 | 11.785 | 11.744 | 11.911 | 11.852 |
virt_ubuntu | 11.986 | 11.925 | 11.994 | 12.04 | 11.919 |
./pi 24 | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 |
host_ubuntu | 333.967 | 332.558 | 331.512 | 335.048 | 331.745 |
virt_ubuntu | 342.457 | 342.003 | 339.275 | 342.685 | 343.375 |
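Comparing the averages of the five runs, CPU performance under KVM is about 99% of the native system for the 2^20 run and about 97% for the 2^24 run.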
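The percentage can be reproduced from the table with a little arithmetic. This is a small sketch under the assumption that relative performance is defined as mean host time divided by mean guest time; the function names mean and relative_perf are made up for illustration.

```shell
#!/bin/sh
# Average the numbers given as arguments (three decimal places).
mean() {
    printf '%s\n' "$@" | awk '{ s += $1; n++ } END { if (n) printf "%.3f\n", s / n }'
}

# Relative performance in percent for a "smaller is better" metric:
# host time divided by guest time.
relative_perf() {
    awk -v h="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * h / g }'
}

# The five ./pi 24 timings from the table, in seconds.
host_24=$(mean 333.967 332.558 331.512 335.048 331.745)
virt_24=$(mean 342.457 342.003 339.275 342.685 343.375)
echo "2^24 digits: host ${host_24}s, guest ${virt_24}s, relative $(relative_perf "$host_24" "$virt_24")%"
```

Averaging first and dividing afterwards keeps a single noisy run from dominating the ratio.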
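Memory performance test
LMbench is used for the memory test. It contains many simple benchmarks covering file reads and writes, memory operations, pipes, system calls, context switches, process creation and destruction, networking, and more. LMbench can also compare systems of the same class to show their relative strengths and weaknesses, and by selecting different library functions it can compare the performance of those functions.
Download LMbench to get lmbench3.tar.gz, extract it, and run make to build it.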
/home/luyi# mkdir lmbench3
/home/luyi# tar -zx -f lmbench3.tar.gz -C lmbench3
/home/luyi# cd lmbench3/lmbench3/
/home/luyi/lmbench3/lmbench3# make
...
The build may stop with the following error:
make[2]: *** No rule to make target `../SCCS/s.ChangeSet', needed by `bk.ver'. Stop.
make[2]: Leaving directory `/home/luyi/lmbench3/lmbench3/src'
make[1]: *** [lmbench] Error 2
make[1]: Leaving directory `/home/luyi/lmbench3/lmbench3/src'
make: *** [build] Error 2
Creating the missing directory and file works around the error; then run make results to start the tests:
/home/luyi/lmbench3/lmbench3# mkdir SCCS ; touch SCCS/s.ChangeSet
/home/luyi/lmbench3/lmbench3# make
...
/home/luyi/lmbench3/lmbench3# make results
...
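After make results starts, and before the tests actually run, there are some interactive prompts to confirm the configuration. For most of them, pressing Enter to accept the default is enough. In this test, three settings were changed from their defaults: the amount of memory LMbench tests, the processor clock frequency, and whether to mail the results to the LMbench3 project.
The CPU clock frequency can be taken from this output: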
~$ cat /proc/cpuinfo | grep "model name"
model name : Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
model name : Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
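LMbench asks for the clock frequency in MHz, while the model name reports GHz. As an illustration only, the conversion could be scripted as follows; model_name_mhz is a hypothetical helper, and it assumes the model name ends with a token such as 3.00GHz, which is not true for every CPU.

```shell
#!/bin/sh
# Convert the "@ 3.00GHz" suffix of a /proc/cpuinfo model name into MHz.
# (model_name_mhz is a hypothetical helper; it prints nothing when the
# model name carries no GHz token.)
model_name_mhz() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9.]+GHz$/) {
                sub(/GHz$/, "", $i)
                printf "%d\n", $i * 1000 + 0.5   # round to whole MHz
                exit
            }
    }'
}

model_name_mhz 'model name : Pentium(R) Dual-Core CPU E5700 @ 3.00GHz'
```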
The settings that were changed from their defaults:
MB [default 2744] 1024    # the more memory is tested, the longer the run takes
Checking to see if you have 1024 MB; please wait for a moment...
...
Processor mhz [default 2997 MHz, 0.3337 nanosec clock] 3000
...
Mail results [default yes] no
OK, no results mailed.
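After LMbench has run the tests required by the configuration file, it creates a subdirectory under results named after the system type, system name, and operating system type. Result files are stored there, named "hostname + sequence number". Running make see shows the result report and its explanation.
/home/luyi/lmbench3/lmbench3/results# make see
Put all the result files under lmbench3/lmbench3/results/x86_64-linux-gnu and run make see to get a side-by-side comparison report. Below are the two groups of results, with three runs each on the native system and in the virtualized environment: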
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
baby-ubun Linux 3.16.0- x86_64-linux-gnu 2300 32 128 3.6500 1
baby-ubun Linux 3.16.0- x86_64-linux-gnu 2300 32 128 3.6000 1
baby-ubun Linux 3.16.0- x86_64-linux-gnu 2300 32 128 3.7100 1
virt-ubun Linux 3.16.0- x86_64-linux-gnu 2300 32 128 3.7800 1
virt-ubun Linux 3.16.0- x86_64-linux-gnu 2300 32 128 3.7800 1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
baby-ubun Linux 3.16.0- 2300 0.05 0.12 0.34 1.12 3.04 0.12 0.78 91.6 261. 607.
baby-ubun Linux 3.16.0- 2300 0.05 0.12 0.35 1.11 2.93 0.12 0.78 97.8 266. 610.
baby-ubun Linux 3.16.0- 2300 0.05 0.12 0.34 1.11 2.91 0.12 0.77 95.8 261. 605.
virt-ubun Linux 3.16.0- 2300 0.05 0.12 0.35 1.20 2.97 0.12 0.88 99.0 289. 653.
virt-ubun Linux 3.16.0- 2300 0.05 0.12 0.35 1.11 2.93 0.12 0.86 98.4 278. 641.
virt-ubun Linux 3.16.0- 2300 0.05 0.12 0.35 1.14 2.92 0.12 0.90 105. 290. 660.

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
baby-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8100 8.7300
baby-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8300 8.7600
baby-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8100 8.7400
virt-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8200 8.7600
virt-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8300 8.7600
virt-ubun Linux 3.16.0- 0.3500 0.0500 1.0700 7.8500 8.7500

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
baby-ubun Linux 3.16.0- 1.0400 1.7300 4.9600 5.0000
baby-ubun Linux 3.16.0- 1.0400 1.7300 4.9600 5.0000
baby-ubun Linux 3.16.0- 1.0400 1.7300 4.9500 5.0000
virt-ubun Linux 3.16.0- 1.0400 1.7300 4.9600 4.9700
virt-ubun Linux 3.16.0- 1.0400 1.7400 4.9700 4.9800
virt-ubun Linux 3.16.0- 1.0400 1.7300 4.9600 4.9900

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
baby-ubun Linux 3.16.0- 1.0400 1.7300 7.7200 7.6100
baby-ubun Linux 3.16.0- 1.0400 1.7300 7.7200 7.6100
baby-ubun Linux 3.16.0- 1.0400 1.7300 7.7400 7.6100
virt-ubun Linux 3.16.0- 1.0400 1.7300 7.7400 7.6300
virt-ubun Linux 3.16.0- 1.0400 1.7300 7.7400 7.6300
virt-ubun Linux 3.16.0- 1.0400 1.7300 7.7300 7.6200

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
baby-ubun Linux 3.16.0- 1.2700 1.1700 1.2900 1.6000 1.9200 1.75000 2.06000
baby-ubun Linux 3.16.0- 1.2500 1.2100 1.2700 1.5600 1.9300 1.73000 2.16000
baby-ubun Linux 3.16.0- 1.2800 1.2400 1.2400 1.5800 1.9800 1.72000 2.04000
virt-ubun Linux 3.16.0- 1.2600 1.2000 1.4500 1.6200 2.2200 1.83000 2.42000
virt-ubun Linux 3.16.0- 1.2000 1.2300 1.4800 1.7000 2.1900 1.81000 2.32000
virt-ubun Linux 3.16.0- 1.2900 1.2800 1.7200 1.7900 2.5500 2.06000 2.72000

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
baby-ubun Linux 3.16.0- 1.270 3.210 4.51 13.2 15.6 28.
baby-ubun Linux 3.16.0- 1.250 3.211 4.36 13.0 15.3 27.
baby-ubun Linux 3.16.0- 1.280 3.266 4.38 13.2 15.6 27.
virt-ubun Linux 3.16.0- 1.260 3.230 4.59 7.849 10.9 18.
virt-ubun Linux 3.16.0- 1.200 3.095 4.31 7.716 31.5 33.
virt-ubun Linux 3.16.0- 1.290 3.373 4.61 7.964 11.2 32.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
baby-ubun Linux 3.16.0- 7.3678 5.6366 16.6 9.1331 6604.0 0.234 0.21320 1.222
baby-ubun Linux 3.16.0- 7.0883 5.6481 16.8 9.1536 6637.0 0.239 0.21540 1.221
baby-ubun Linux 3.16.0- 7.0470 5.7672 16.2 8.9707 6625.0 0.248 0.21420 1.223
virt-ubun Linux 3.16.0- 7.1741 5.8160 16.4 9.0638 7085.0 0.297 0.23500 1.224
virt-ubun Linux 3.16.0- 7.0921 5.7670 16.4 9.0966 7162.0 0.296 0.23360 1.225
virt-ubun Linux 3.16.0- 7.1873 5.8700 16.9 9.2117 7485.0 0.258 0.28000 1.227

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
baby-ubun Linux 3.16.0- 5196 5689 4410 5486.4 8603.2 4637.3 3200.9 7920 4828.
baby-ubun Linux 3.16.0- 5205 5639 4350 5463.1 8593.6 4634.6 3201.2 7917 4827.
baby-ubun Linux 3.16.0- 5199 5648 4473 5472.4 8599.0 4636.9 3201.2 7920 4828.
virt-ubun Linux 3.16.0- 4810 5518 3499 6370.3 11.0K 3176.1 3144.7 7828 4745.
virt-ubun Linux 3.16.0- 4909 3711 6013.2 10.2K 3179.5 3145.5 7795 4739.
virt-ubun Linux 3.16.0- 4587 5221 3606 5524.6 8882.7 3153.5 3116.3 7750 4688.

Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
baby-ubun Linux 3.16.0- 2300 1.3830 4.1490 20.9 76.8
baby-ubun Linux 3.16.0- 2300 1.3830 4.1500 21.4 78.7
baby-ubun Linux 3.16.0- 2300 1.3830 4.1490 21.4 77.5
virt-ubun Linux 3.16.0- 2300 1.3850 5.4010 21.4 122.5
virt-ubun Linux 3.16.0- 2300 1.3860 4.3540 22.0 125.5
virt-ubun Linux 3.16.0- 2300 1.3850 4.1840 22.4 125.8
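These results show that memory bandwidth and latency under KVM are, for most items, close to the native system: memory read and write bandwidth and L1 and main-memory latency differ by only a few percent. The notable exceptions in this data are random memory latency (roughly 78 ns on the host versus roughly 125 ns in the guest) and libc bcopy bandwidth (roughly 4636 MB/s versus roughly 3170 MB/s). As a rough conclusion, with hardware memory virtualization support (such as Intel EPT), QEMU/KVM memory virtualization performs well and reaches more than 95% of native performance on most of these measurements.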
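Disk I/O performance test
IOzone is used for this test. IOzone measures file system performance through a variety of file operations such as write, rewrite, read, reread, and random read and write.
Download the IOzone source, extract it, and run make linux-AMD64 in the iozone3_414/src/current directory to build it. When the build finishes, the iozone executable is in the current directory.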
/home/luyi/iozone3_414/iozone3_414/src/current# ./iozone -s 512m -r 8k -S 2048 -L 64 -I -i 0 -i 1 -i 2 -Rab iozone.xls
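In the command above, -s 512m sets the test file size to 512 MB; -r 8k sets the record size (the size of one read or write operation) to 8 KB; -S 2048 tells IOzone the processor cache size is 2048 KB; -L 64 sets the cache line size to 64 bytes; -I uses direct I/O, bypassing the page cache; -i 0 -i 1 -i 2 selects the three tests "0=write/rewrite, 1=read/re-read, 2=random-read/write"; and -Rab iozone.xls runs in full automatic mode and writes an Excel-format report to iozone.xls. The values for -S and -L come from the command below; both can also be left for IOzone to determine itself: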
/home/luyi/iozone3_414/iozone3_414/src/current# cat /proc/cpuinfo | grep cache
cache size : 2048 KB
cache_alignment : 64
cache size : 2048 KB
cache_alignment : 64
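To avoid copying these two numbers by hand, the lookup can be turned into the matching IOzone options. The sketch below is illustrative: iozone_cache_flags is a hypothetical helper, and it assumes /proc/cpuinfo exposes the cache size and cache_alignment fields in the form shown above, which holds on x86 but not on every architecture.

```shell
#!/bin/sh
# Read /proc/cpuinfo-style text on stdin and print the matching IOzone
# options (-S cache size in KB, -L cache line size in bytes).
# (iozone_cache_flags is a hypothetical helper; only the first CPU's
# values are used.)
iozone_cache_flags() {
    awk -F: '
        /^cache size/      && !size { split($2, a, " "); size = a[1] }
        /^cache_alignment/ && !line { split($2, a, " "); line = a[1] }
        END { if (size && line) printf "-S %s -L %s\n", size, line }
    '
}

# Sample input matching the output shown in this article; on a live
# system use: iozone_cache_flags < /proc/cpuinfo
printf 'cache size\t: 2048 KB\ncache_alignment\t: 64\n' | iozone_cache_flags
```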
Record size 1 KB (throughput in MB/s) | Writer Report | Re-writer Report | Reader Report | Re-reader Report | Random Read Report | Random Write Report |
host (physical machine) | 1.67 | 9.31 | 14.02 | 13.89 | 0.17 | 0.26 |
virt-none (guest, cache=none) | 1.47 | 6.71 | 7.37 | 7.65 | 0.17 | 0.25 |
virt-default (guest, default cache mode, writeback) | 17.56 | 16.15 | 17.87 | 18.62 | 21.17 | 2.52 |
virt-writethrough (guest, cache=writethrough) | 0.11 | 0.11 | 21.84 | 21.83 | 21.65 | 0.12 |
Record size 8 KB (throughput in MB/s) | Writer Report | Re-writer Report | Reader Report | Re-reader Report | Random Read Report | Random Write Report |
host | 63.15 | 67.02 | 71.75 | 70.45 | 1.10 | 1.83 |
virt-none | 41.02 | 40.75 | 43.78 | 45.30 | 1.01 | 1.82 |
virt-default | 125.04 | 146.71 | 161.26 | 161.11 | 160.65 | 16.01 |
virt-writethrough | 0.91 | 0.91 | 164.16 | 164.19 | 163.01 | 0.85 |
Record size 1 MB (throughput in MB/s) | Writer Report | Re-writer Report | Reader Report | Re-reader Report | Random Read Report | Random Write Report |
host | 98.03 | 98.58 | 101.67 | 101.67 | 57.26 | 61.18 |
virt-none | 95.73 | 98.34 | 100.54 | 100.22 | 56.71 | 65.30 |
virt-default | 168.07 | 173.64 | 2609.87 | 2676.79 | 2777.94 | 141.64 |
virt-writethrough | 52.83 | 52.52 | 3283.17 | 3317.91 | 3221.16 | 40.71 |
Record size 8 MB (throughput in MB/s) | Writer Report | Re-writer Report | Reader Report | Re-reader Report | Random Read Report | Random Write Report |
host | 100.30 | 100.80 | 102.05 | 101.88 | 92.71 | 82.89 |
virt-none | 81.90 | 86.50 | 97.41 | 98.81 | 90.49 | 80.74 |
virt-default | 210.12 | 171.43 | 2691.20 | 2700.01 | 2682.85 | 185.32 |
virt-writethrough | 62.02 | 64.35 | 2546.63 | 2624.13 | 2663.87 | 59.44 |
The data above were obtained by varying the virtual disk's cache mode and the record size used for each read or write operation.
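The article does not show how the cache mode was set. One common way is the cache= suboption of QEMU's -drive option, sketched below under that assumption. The helper drive_arg and the image path guest.img are hypothetical, and only the three modes used in this test are accepted.

```shell
#!/bin/sh
# Build the value for QEMU's -drive option with an explicit cache mode.
# Usage: drive_arg <image-path> <none|writeback|writethrough>
# (drive_arg is a hypothetical helper; QEMU supports further cache modes
# that were not part of this test.)
drive_arg() {
    case "$2" in
        none|writeback|writethrough)
            printf 'file=%s,cache=%s\n' "$1" "$2" ;;
        *)
            echo "unsupported cache mode: $2" >&2
            return 1 ;;
    esac
}

# guest.img is a hypothetical image path; the remaining QEMU options are
# elided because the article does not list them.
for mode in none writeback writethrough; do
    echo "qemu-system-x86_64 ... -drive $(drive_arg guest.img "$mode")"
done
```

With libvirt, the equivalent setting is the cache attribute on the disk's driver element in the domain XML.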
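Setting the virtual disk's cache mode to none bypasses the host page cache. The page cache greatly speeds up access to the virtual disk, which is why performance with cache=writeback is so high, but an unexpected power loss can then cause data loss. To observe the performance overhead of the virtual disk itself, compare the host and virt-none rows. I/O performance depends on the record size. With large records (1 MB or 8 MB), virtual disk performance is close to that of the physical disk. With small records (1 KB or 8 KB), sequential throughput on the virtual disk drops to roughly 60% to 70% of the physical disk (about 61% to 65% at 8 KB), while random read and write throughput stays nearly the same.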
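The host versus virt-none comparison can be made concrete by expressing guest throughput as a percentage of host throughput. This is a minimal sketch using the Writer Report column from the tables above; throughput_ratio is a hypothetical helper name.

```shell
#!/bin/sh
# Guest throughput as a percentage of host throughput ("bigger is better").
# Usage: throughput_ratio <host-MB/s> <guest-MB/s>
throughput_ratio() {
    awk -v host="$1" -v guest="$2" 'BEGIN { printf "%.0f\n", 100 * guest / host }'
}

# Writer Report values for host and virt-none, copied from the tables:
# record size, host MB/s, guest MB/s.
for row in "1k 1.67 1.47" "8k 63.15 41.02" "1m 98.03 95.73" "8m 100.30 81.90"; do
    set -- $row
    echo "$1 write: $(throughput_ratio "$2" "$3")%"
done
```

The same function applies to the other columns; only the numbers passed in change.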