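In KVM, the devices available to a guest fall roughly into three categories:
(1) Emulated device: a device emulated purely in software
(2) virtio device: a paravirtualized device whose driver implements the virtio API
(3) PCI device assignment: a physical PCI device assigned directly to the guest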
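The first two types have been covered before. Pure software emulation depends little on the hardware platform, can emulate commonly used devices, and needs no extra support in the guest. virtio improves I/O efficiency, but its drawback is that the guest must support it.
Put simply, pure software emulation is like a patient (the guest) who has broken a leg but does not know he cannot walk, so he keeps trying to walk as before; the family (the host) has to watch for every attempt and fit a prosthetic leg each time. With virtio, the patient knows he can no longer walk the old way, so each time he wants to walk he follows the procedure the family gave him (roughly, "put your foot there and I will know"), and the family fits the prosthesis once notified. Turning passive monitoring into active cooperation greatly improves efficiency. The third approach goes one better: the patient simply gets a real leg.
The third approach is usually called PCI device assignment (Device Assignment, or PCI pass-through). It hands a physical PCI device on the host to a guest for its exclusive use. It needs hardware support: Intel's specification is Intel Virtualization Technology for Directed I/O (VT-d), and AMD's is AMD-Vi. The advantage is obvious: I/O performance is almost the same as on a native system. The drawback is the added hardware cost. Enabling this feature also requires interrupt remapping support.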
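VT-d environment setup:
(1) Enable VT-d in the BIOS. This assumes the hardware supports it; the steps are omitted here.
(2) Host kernel support. The following options are sufficient (my kernel is 2.6.32-431.el6.x86_64; option names differ between kernel versions):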
CONFIG_DMAR=y # DMA addresses remapping engine
# CONFIG_DMAR_DEFAULT_ON is not set
CONFIG_DMAR_FLOPPY_WA=y
CONFIG_INTR_REMAP=y
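CONFIG_PCI_STUB=y # support for hiding devices (pci-stub)
Run dmesg | grep -iE "DMAR|IOMMU" to check whether VT-d is enabled. If the kernel does not turn the IOMMU on by default, add the boot parameter intel_iommu=on to the kernel line in grub.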
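The option check in step (2) can be scripted. The following is a minimal sketch, assuming the 2.6.32-era option names listed above (later kernels use different names, for example CONFIG_INTEL_IOMMU); the function name check_vtd_config is made up for illustration.

```shell
#!/bin/bash
# check_vtd_config FILE
# Report, for each VT-d related option named in this article, whether FILE
# (a kernel config file) has it built in (=y) or as a module (=m).
# Returns non-zero if any option is missing.
check_vtd_config() {
    local config_file=$1 opt status=0
    for opt in CONFIG_DMAR CONFIG_INTR_REMAP CONFIG_PCI_STUB; do
        # Anchor both ends so CONFIG_DMAR does not match CONFIG_DMAR_DEFAULT_ON.
        if grep -Eq "^${opt}=(y|m)\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            status=1
        fi
    done
    return $status
}
```

On a RHEL/CentOS style host the config file usually sits under /boot (an assumption about the distribution), so a typical call would be check_vtd_config "/boot/config-$(uname -r)".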
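(3) Hide the device on the host with the pci-stub driver
modprobe pci_stub # load the pci-stub module; skip this step if it is built into the kernel
ls /sys/bus/pci/drivers/pci-stub/ # check for pci-stub support; the directory exists if it is supported
lspci -Dnn # list the PCI devices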
0000:02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet [14e4:1639] (rev 20)
# Only one record is shown here. 0000:02:00.0 is the device's location on the PCI/PCI-e bus: 0000 is the domain, 02:00.0 is the BDF (bus:device:function), and 14e4:1639 is the vendor ID and device ID
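To make the field layout concrete, here is a small sketch that splits one line of lspci -Dnn output into domain, BDF, vendor ID and device ID. The function name parse_lspci_line is hypothetical, and it assumes the vendor:device pair is the last bracketed xxxx:xxxx group on the line, as in the record above.

```shell
#!/bin/bash
# parse_lspci_line LINE
# Print "domain bdf vendor device" for one line of `lspci -Dnn` output.
parse_lspci_line() {
    local line=$1 addr ids
    addr=${line%% *}                      # first word, e.g. 0000:02:00.0
    # The class code such as [0200] has no colon, so only [vendor:device] matches.
    ids=$(grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' <<< "$line" | tail -n 1)
    ids=${ids:1:9}                        # drop the surrounding brackets
    echo "${addr%%:*} ${addr#*:} ${ids%%:*} ${ids#*:}"
}

# Example with the record shown above:
# parse_lspci_line "0000:02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet [14e4:1639] (rev 20)"
# prints: 0000 02:00.0 14e4 1639
```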
lspci -k -s 02:00.0 # show the kernel driver used by the device
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
Kernel driver in use: bnx2
Kernel modules: bnx2
echo -n "14e4 1639" > /sys/bus/pci/drivers/pci-stub/new_id # note the space in "14e4 1639"; this tells the kernel which device ID the pci-stub driver should handle
echo "0000:02:00.0" > /sys/bus/pci/devices/0000\:02\:00.0/driver/unbind # unbind NIC 02:00.0 from its original driver
echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind # bind the NIC to the pci-stub driver
lspci -k -s 02:00.0
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
Kernel driver in use: pci-stub
Kernel modules: bnx2
# the kernel driver in use is now pci-stub
Likewise, to bind the device back to its original driver:
echo -n "14e4 1639">/sys/bus/pci/drivers/bnx2/new_id
echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:02:00.0 > /sys/bus/pci/drivers/bnx2/bind
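The three sysfs writes used for hiding can be wrapped in one guarded function. This is only a sketch: the name hide_device and the optional fourth argument (an alternative sysfs root, which lets you rehearse on a scratch directory instead of the real /sys) are assumptions made for illustration, not part of any tool.

```shell
#!/bin/bash
# hide_device BDF VENDOR DEVICE [SYSFS_ROOT]
# Hand a PCI device over to pci-stub. BDF must include the domain,
# e.g. 0000:02:00.0. SYSFS_ROOT defaults to /sys.
hide_device() {
    local bdf=$1 vendor=$2 device=$3 sysfs=${4:-/sys}
    local stub=$sysfs/bus/pci/drivers/pci-stub
    if [ ! -d "$stub" ]; then
        echo "pci-stub is not available under $sysfs" >&2
        return 1
    fi
    # Tell pci-stub which vendor/device ID it should accept.
    echo -n "$vendor $device" > "$stub/new_id"
    # Unbind from the current driver only if the device has one.
    if [ -e "$sysfs/bus/pci/devices/$bdf/driver" ]; then
        echo "$bdf" > "$sysfs/bus/pci/devices/$bdf/driver/unbind"
    fi
    echo "$bdf" > "$stub/bind"
}
```

On the real host this would be run as root, for example hide_device 0000:02:00.0 14e4 1639.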
Typing these commands by hand is tedious and error-prone, so it is worth scripting them.
My own script (it runs, but some bugs are still unhandled; for reference only):
cat pcistub.sh
#!/bin/bash
# Hide a PCI device behind pci-stub, or hand it back to its original driver.
# Run as root. Original drivers are remembered in /tmp/pci_stub.txt.

# Ask for a keyword, list the matching devices and save the chosen lspci line.
function get_info() {
    read -p "please input your device type: " TYPE
    while [ "$(lspci -Dnn | grep -ci -e "$TYPE")" -eq 0 ]; do
        read -p "not found, please retype: " TYPE
    done
    read -p "you can choose:
$(lspci -Dnn | grep -i -e "$TYPE" | cat -n)
your choice: " num
    until grep -Eq '^[0-9]+$' <<< "$num"; do
        read -p "choice must be a number, please retype: " num
    done
    echo "$num"
    lspci -Dnn | grep -i -e "$TYPE" | head -n "$num" | tail -n 1 > /tmp/device.txt
}

# List the devices currently bound to pci-stub and ask which one to release.
function choose_unhide_pci() {
    local bound
    bound=$(ls /sys/bus/pci/drivers/pci-stub/ | grep :)
    [ -z "$bound" ] && exit 0
    read -p "you can choose:
$(cat -n <<< "$bound")
your choice: " num
    until grep -Eq '^[0-9]+$' <<< "$num"; do
        read -p "choice must be a number, please retype: " num
    done
    echo "$num"
}

# Bind the device saved by get_info to pci-stub.
function hide_pci() {
    local BDF ids driver
    BDF=$(awk '{print $1}' /tmp/device.txt)
    # lspci -n prints "BDF class: vendor:device", so field 3 is the ID pair.
    ids=$(lspci -Dn -s "$BDF" | awk '{print $3}')
    driver=$(lspci -k -s "$BDF" | awk -F': ' '/driver in use/ {print $2}')
    # Remember the original driver so that unhide_pci can restore it.
    grep -qs "$BDF" /tmp/pci_stub.txt || echo "$BDF $driver" >> /tmp/pci_stub.txt
    echo -n "${ids%%:*} ${ids#*:}" > /sys/bus/pci/drivers/pci-stub/new_id
    echo "$BDF" > "/sys/bus/pci/devices/$BDF/driver/unbind"
    echo "$BDF" > /sys/bus/pci/drivers/pci-stub/bind
}

# Give the $1-th device bound to pci-stub back to its original driver.
function unhide_pci() {
    local BDF ids driver
    BDF=$(ls /sys/bus/pci/drivers/pci-stub/ | grep : | head -n "$1" | tail -n 1)
    ids=$(lspci -Dn -s "$BDF" | awk '{print $3}')
    driver=$(awk -v bdf="$BDF" '$1 == bdf {print $2}' /tmp/pci_stub.txt 2>/dev/null)
    # No record found: fall back to bnx2, the NIC driver used in this article.
    [ -z "$driver" ] && driver="bnx2"
    echo -n "${ids%%:*} ${ids#*:}" > "/sys/bus/pci/drivers/$driver/new_id"
    echo "$BDF" > /sys/bus/pci/drivers/pci-stub/unbind
    echo "$BDF" > "/sys/bus/pci/drivers/$driver/bind"
}

echo -e "you can do:\n 1. hide device\n 2. unhide device"
read -p "your choice: " num
case $num in
    1)
        n=$(get_info)
        [ "$n" -gt 0 ] 2>/dev/null && hide_pci "$n"
        ;;
    2)
        n=$(choose_unhide_pci)
        [ "$n" -gt 0 ] 2>/dev/null && unhide_pci "$n"
        ;;
    *)
        echo "your input error"
        ;;
esac
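(4) Assign the device to the guest with the qemu command
The qemu-kvm command assigns a device to the guest with the -device driver[,prop[=value][,...]] option; use ? to see the corresponding help.
qemu-system-x86_64 -device ? &> 1.txt # piping qemu's output straight into grep does not filter it (presumably because the help text goes to stderr), so I first save all output to a file
grep pci- 1.txt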
name "pci-ohci", bus PCI, desc "Apple USB Controller"
name "kvm-pci-assign", bus PCI, alias "pci-assign", desc "KVM-based PCI passthrough"
name "pci-bridge", bus PCI, desc "Standard PCI Bridge"
# kvm-pci-assign (alias pci-assign) is supported
qemu-system-x86_64 -device pci-assign,?
kvm-pci-assign.host=pci-host-devaddr
kvm-pci-assign.prefer_msi=on/off
kvm-pci-assign.share_intx=on/off
kvm-pci-assign.bootindex=int32
kvm-pci-assign.configfd=string
kvm-pci-assign.addr=pci-devfn
kvm-pci-assign.romfile=string
kvm-pci-assign.rombar=uint32
kvm-pci-assign.multifunction=on/off
kvm-pci-assign.command_serr_enable=on/off
# This command lists the properties that pci-assign accepts. Note that the ? must follow the comma. Writing -device pci-assign ? fails with: qemu-system-x86_64: -device pci-assign: could not open disk image ?: No such file or directory. Here we use the host property to name the BDF of the device to assign to the guest:
qemu-system-x86_64 -m 2048 -smp 3 xp.qcow2 -device pci-assign,host=02:00.0,id=mydev0,addr=0x6 -usb -usbdevice tablet
If you get the error qemu-system-x86_64: -device pci-assign,host=02:00.0,id=mydev0,addr=0x6: No IOMMU found. Unable to assign device "mydev0", check in order: whether the hardware supports VT-d, whether the kernel was built with the options listed above, and whether the IOMMU is actually turned on. On Intel, turn it on by adding intel_iommu=on to the kernel line in grub.conf; on AMD, add "iommu=pt iommu=1" (see http://blog.csdn.net/cybertan/article/details/6596556).
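The "is it turned on" part of that checklist can be tested against the running kernel's command line. A minimal sketch follows; the function name iommu_requested is made up, and it only looks for the parameters mentioned in this article (the AMD ones are taken from the referenced post, not verified here).

```shell
#!/bin/bash
# iommu_requested CMDLINE
# Report which IOMMU boot parameter, if any, appears in a kernel command line.
# Returns non-zero when none of the parameters from this article is present.
iommu_requested() {
    local cmdline=" $1 "    # pad with spaces so whole words can be matched
    case $cmdline in
        *" intel_iommu=on "*)
            echo "intel_iommu=on present" ;;
        *" iommu=pt "*|*" iommu=1 "*)
            echo "iommu= option present" ;;
        *)
            echo "no IOMMU option"
            return 1 ;;
    esac
}
```

To check the booted kernel, pass it the contents of /proc/cmdline: iommu_requested "$(cat /proc/cmdline)".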
When the IOMMU is enabled, the log output looks like this:
dmesg | grep -e DMAR -e IOMMU
ACPI: DMAR 00000000cf3b3668 001A8 (v01 DELL PE_SC3 00000001 DELL 00000001)
Intel-IOMMU: enabled
dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020fe
IOMMU 0xfed90000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.7 [0xcf4c2000 - 0xcf4c3000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xcf4c0000 - 0xcf4c1000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xcf4a7000 - 0xcf4a8000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xcf4a5000 - 0xcf4a6000]
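A log like the one above can also be classified automatically. The sketch below reads kernel log text on standard input and distinguishes three cases; the function name iommu_log_state is hypothetical, and the matched strings are the ones printed by the 2.6.32 kernel in this article (other kernel versions word these messages differently).

```shell
#!/bin/bash
# iommu_log_state
# Read kernel log text on stdin and print one of:
#   enabled        the kernel reports the Intel IOMMU as enabled
#   firmware-only  an ACPI DMAR table exists but the IOMMU is not reported enabled
#                  (VT-d is on in the BIOS; intel_iommu=on is probably missing)
#   absent         no DMAR table is mentioned at all
iommu_log_state() {
    local log
    log=$(cat)
    if grep -q 'Intel-IOMMU: enabled' <<< "$log"; then
        echo enabled
    elif grep -q 'ACPI: DMAR' <<< "$log"; then
        echo firmware-only
    else
        echo absent
    fi
}
```

Typical use is dmesg | iommu_log_state.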