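I recently needed to measure the bandwidth and latency of an Intel QLogic QLE7340 40Gb adapter and a Mellanox ConnectX VPI MT26428 40Gb adapter, mainly the TCP, UDP, and SDP bandwidth and latency at different message sizes. This article describes how to install the driver and run some basic tests with the qperf tool.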
I. Install the dependency packages
These packages are on the second disc of SUSE 11 SP2. The dependencies are:
    # zypper install -y libstdc++46-devel
    # zypper install -y gcc43-fortran
    # zypper install -y libgfortran46
    # zypper install -y binutils-devel
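II. Download and install the Intel QLogic InfiniBand driver
First download the driver that matches your operating system from the vendor's website, then unpack it and run the installer.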
    # unzip QLogicIB-Basic.SLES11-x86_64.7.1.1.0.25.zip
    # cd QLogicIB-Basic.SLES11-x86_64.7.1.1.0.25
    # perl INSTALL

    QLogic Inc. InfiniBand 7.1.1.0.25 Software

       1) Install/Uninstall Software
       2) Reconfigure OFED IP over IB
       3) Reconfigure Driver Autostart
       4) Update HCA Firmware
       5) Generate Supporting Information for Problem Report
       6) Fast Fabric (Host/Chassis/Switch Setup/Admin)

       X) Exit

Follow the prompts to install. I installed every component that could be installed, to save trouble later. The installation output is omitted here.
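III. Start the openibd and opensmd services
Once the driver is installed, the openibd service must be started. The opensmd service is needed as well. Open SM is an InfiniBand subnet manager (SM stands for Subnet Manager); every subnet needs one, so opensmd must be started on the management node of the subnet. This test uses two servers whose InfiniBand adapters are connected directly to each other, so opensmd is started on one of them, which then acts as the management node for both machines. The driver installer had already installed the Open SM package and enabled opensmd at boot, as this excerpt of the installer output shows: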
    Installing OFED Open SM 1.5.4.1.44 release...
    installing opensm-3.3.13-1.x86_64...
    opensmd    0:off  1:off  2:on  3:on  4:off  5:on  6:off
The openibd service must be started on both nodes, whereas opensmd only needs to run on one of them.
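The start-up rule above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the names `run`, `start_ib_services`, the `DRY_RUN` variable, and the `sm` role label are hypothetical, and the `service ... start` invocations assume the init scripts installed by the driver package. By default the helper only prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: print (or, with DRY_RUN=0, execute) the service
# commands needed on one node.

run() {
    # Print the command by default; execute it only when DRY_RUN=0.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_ib_services() {
    role=$1
    # openibd is needed on every node.
    run service openibd start
    # opensmd (the subnet manager) runs on one node only.
    if [ "$role" = "sm" ]; then
        run service opensmd start
    fi
}

# Example: the node that hosts the subnet manager.
start_ib_services sm
```

On the other node, the same function would be called with any other role label, so that only openibd is started there.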
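IV. Check the device status
With the driver installed, the next step is to check the state of the devices. The driver package installs many command-line tools for inspecting and configuring IB devices; most of their names start with ib.
- Check the IB device status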
    # ibstat
    CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.8.0
        Hardware version: b0
        Node GUID: 0x0002c9030009b89a
        System image GUID: 0x0002c9030009b89d
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 2
            LMC: 0
            SM lid: 1
            Capability mask: 0x02510868
            Port GUID: 0x0002c9030009b89b
            Link layer: InfiniBand
        Port 2:
            State: Down
            Physical state: PortConfigurationTraining
            Rate: 10
            Base lid: 0
            LMC: 0
            SM lid: 0
            Capability mask: 0x02510868
            Port GUID: 0x0002c9030009b89c
            Link layer: InfiniBand
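Before running any benchmark it helps to confirm that the port in use is Active. The following is a minimal sketch, assuming the ibstat layout shown above, that reduces the report to one line per port. The function name `port_states` is hypothetical, and the sample text is an abridged copy of the output above.

```shell
#!/bin/sh
# Print "<port> <state>" for every port found in ibstat output on stdin.
port_states() {
    awk '
        # Remember the number of the port section we are in.
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        # "State:" lines only; "Physical state:" does not match.
        /^[ \t]*State:/       { print port, $2 }
    '
}

# Abridged sample taken from the ibstat output above.
sample="CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
    Port 2:
        State: Down
        Physical state: PortConfigurationTraining"

printf '%s\n' "$sample" | port_states
```

On a live system the same filter would be fed directly, as in `ibstat | port_states`.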
- Check the IB port status
    # iba_showports
    mlx4_0/Port1
        PortState: Active  PhysState: LinkUp  DownDefault: Polling
        LID: 0x0002  LMC: 0
        Subnet: 0xfe80000000000000  GUID: 0x0002c9030009b89b  GUID Cap: 128
        SMLID: 0x0001  SMSL: 0  RespTimeout: 268 ms  SubnetTimeout: 1 s
        LinkWidth: Active: 4x  Supported: 1-4x  Enabled: 1-4x
        LinkSpeed: Active: 10.0Gb  Supported: 2.5-10Gb  Enabled: 2.5-10Gb
        Symbol Errors 0
    mlx4_0/Port2
        PortState: Down  PhysState: Training  DownDefault: Polling
        LID: 0x0000  LMC: 0
        Subnet: 0xfe80000000000000  GUID: 0x0002c9030009b89c  GUID Cap: 128
        SMLID: 0x0000  SMSL: 0  RespTimeout: 268 ms  SubnetTimeout: 4 us
        LinkWidth: Active: 4x  Supported: 1-4x  Enabled: 4x
        LinkSpeed: Active: 2.5-5Gb  Supported: 2.5-10Gb  Enabled: 2.5-10Gb
        Symbol Errors 65535
- List the IB nodes (hosts)
    # ibhosts
    Ca : 0x0002c90300077146 ports 2 "ssd2 HCA-1"    <- GUID of ssd2
    Ca : 0x0002c9030009b89a ports 2 "ssd1 HCA-1"    <- GUID of ssd1
- Show the GUID of a device
On ssd1:

    # ibstat -p
    0x0002c9030009b89a
    # ibv_devices
        device          node GUID
        ------          ----------------
        mlx4_0          0002c9030009b89a

On ssd2:

    # ibstat -p
    0x0002c90300077146
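- Test connectivity by GUID
The ibping command can be used to test connectivity between the two nodes. ibping is a client/server program: the server side must be started with the -S option.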
On the server:

    # ibping -S &
    [1] 14960

On the client:

    # ibping -G 0x0002c90300077146
    Pong from ssd1 (Lid 2): time 0.184 ms
    Pong from ssd1 (Lid 2): time 0.252 ms
    Pong from ssd1 (Lid 2): time 0.235 ms
    Pong from ssd1 (Lid 2): time 0.255 ms
    Pong from ssd1 (Lid 2): time 0.303 ms
    Pong from ssd1 (Lid 2): time 0.248 ms
    Pong from ssd1 (Lid 2): time 0.244 ms
    Pong from ssd1 (Lid 2): time 0.209 ms
    Pong from ssd1 (Lid 2): time 0.251 ms
    Pong from ssd1 (Lid 2): time 0.244 ms
    Pong from ssd1 (Lid 2): time 0.265 ms
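To turn a run of Pong lines into a single figure, the round-trip times can be averaged. This is a minimal sketch that assumes the "Pong from ... time N ms" line format shown above; the function name `ibping_avg_ms` is hypothetical and the sample holds only the first three replies.

```shell
#!/bin/sh
# Average the "time N ms" values of ibping "Pong" lines read from stdin.
ibping_avg_ms() {
    awk '
        # The time value is the second-to-last field of each Pong line.
        /^Pong from/ { sum += $(NF - 1); n++ }
        END { if (n > 0) printf "%.3f\n", sum / n }
    '
}

# First three replies from the ibping output above.
sample="Pong from ssd1 (Lid 2): time 0.184 ms
Pong from ssd1 (Lid 2): time 0.252 ms
Pong from ssd1 (Lid 2): time 0.235 ms"

printf '%s\n' "$sample" | ibping_avg_ms
```

For the three sample lines this prints 0.224, the mean in milliseconds.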
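- Test connectivity by IP address
First configure an IP address on the port. The iba_config tool can be used for this; here I simply show the resulting ifcfg-ib0 files for reference.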
On ssd1:

    # cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE='ib0'
    BOOTPROTO='static'
    IPADDR='11.11.11.39'
    NETMASK='255.255.255.0'
    NETWORK='11.11.11.0'
    BROADCAST='11.11.11.255'
    STARTMODE='onboot'

On ssd2:

    # cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE='ib0'
    BOOTPROTO='static'
    IPADDR='11.11.11.40'
    NETMASK='255.255.255.0'
    NETWORK='11.11.11.0'
    BROADCAST='11.11.11.255'
    STARTMODE='onboot'
After the configuration is in place, restart the openibd service on both nodes.
On each node:

    # service openibd restart

If you are told that the opensmd service must be stopped first, stop opensmd as instructed, restart openibd, and finally start opensmd again. Afterwards, the configured IP addresses can be checked with the ifconfig command.

On ssd1:

    # ifconfig ib0
    ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:11.11.11.39  Bcast:11.11.11.255  Mask:255.255.255.0
              inet6 addr: fe80::202:c903:9:b89b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
              RX packets:112 errors:0 dropped:0 overruns:0 frame:0
              TX packets:156 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:9497 (9.2 Kb)  TX bytes:11333 (11.0 Kb)

On ssd2:

    # ifconfig ib0
    ib0       Link encap:InfiniBand  HWaddr 80:00:00:49:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
              inet addr:11.11.11.40  Bcast:11.11.11.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
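V. Bandwidth and latency tests
All bandwidth and latency tests are run with the qperf tool. During the tests one machine acts as the qperf server and the other as the qperf client. The following commands are run on the qperf server: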
    # hostname -i
    172.16.25.39        <- local Ethernet IP address
    # qperf --listen_port 9306 &
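1. TCP bandwidth and latency over the local Ethernet
In this test, TCP bandwidth and latency were measured with five different message sizes. The commands are run on the qperf client: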
    # hostname -i
    172.16.25.40
    # qperf --listen_port 9306 -H 172.16.25.39 --time 300 \
        --use_bits_per_sec --precision 2 --verbose_more \
        --msg_size 512k tcp_bw tcp_lat conf
    tcp_bw:
        bw                = 947 Mb/sec
        msg_rate          = 231 /sec
        msg_size          = 500 KiB (512,000)
        time              = 300 sec
        timeout           = 5 sec
        send_cost         = 1.1 sec/GB
        recv_cost         = 1.1 sec/GB
        send_cpus_used    = 13 % cpus
        send_cpus_user    = 0.5 % cpus
        send_cpus_intr    = 2.6 % cpus
        send_cpus_kernel  = 9.5 % cpus
        send_cpus_iowait  = 0.8 % cpus
        send_real_time    = 300 sec
        send_cpu_time     = 40 sec
        send_bytes        = 35 GB
        send_msgs         = 69,331
        recv_cpus_used    = 13 % cpus
        recv_cpus_user    = 0.2 % cpus
        recv_cpus_intr    = 1.1 % cpus
        recv_cpus_kernel  = 11 % cpus
        recv_cpus_iowait  = 0.3 % cpus
        recv_real_time    = 300 sec
        recv_cpu_time     = 39 sec
        recv_bytes        = 35 GB
        recv_msgs         = 69,327
    tcp_lat:
        latency           = 4.5 ms
        msg_rate          = 223 /sec
        msg_size          = 500 KiB (512,000)
        time              = 300 sec
        timeout           = 5 sec
        loc_cpus_used     = 19 % cpus
        loc_cpus_user     = 0.7 % cpus
        loc_cpus_intr     = 1.5 % cpus
        loc_cpus_kernel   = 16 % cpus
        loc_cpus_iowait   = 0.7 % cpus
        loc_real_time     = 300 sec
        loc_cpu_time      = 58 sec
        loc_send_bytes    = 17 GB
        loc_recv_bytes    = 17 GB
        loc_send_msgs     = 33,433
        loc_recv_msgs     = 33,432
        rem_cpus_used     = 17 % cpus
        rem_cpus_user     = 0.1 % cpus
        rem_cpus_intr     = 1 % cpus
        rem_cpus_kernel   = 16 % cpus
        rem_cpus_iowait   = 0.3 % cpus
        rem_real_time     = 300 sec
        rem_cpu_time      = 52 sec
        rem_send_bytes    = 17 GB
        rem_recv_bytes    = 17 GB
        rem_send_msgs     = 33,432
        rem_recv_msgs     = 33,432
    conf:
        loc_node   = ssd2
        loc_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        loc_os     = Linux 3.0.13-0.27-default
        loc_qperf  = 0.4.6
        rem_node   = ssd1
        rem_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        rem_os     = Linux 3.0.13-0.27-default
        rem_qperf  = 0.4.6
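Here 512k is the message size we specified. To test other message sizes, change this value; the 300 s test duration can be changed as well. A whole range of message sizes can also be tested in one batch, as follows: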
    # qperf --listen_port 9306 -H 172.16.25.39 --time 300 \
        --use_bits_per_sec --precision 2 --verbose_more --loop \
        msg_size:64k:64m:*2 tcp_bw tcp_lat conf
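Here 64k:64m:*2 means that testing starts with 64k messages and the message size doubles on each run, with 64m as the upper limit. The last size actually tested is about 32m, since the next doubling would exceed 64m.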
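The sequence of sizes produced by such a loop specification can be worked out with a short shell sketch. It rests on two assumptions that are not stated in this article: that the suffixes k and m mean 1,000 and 1,000,000 bytes (the report line "msg_size = 500 KiB (512,000)" for 512k suggests this for k), and that the second value is an upper bound the size must not exceed. The function names `to_bytes` and `loop_sizes` are hypothetical.

```shell
#!/bin/sh
# List the sizes visited by a loop of the form init:last:*factor.
# Assumptions: k = 1,000 bytes, m = 1,000,000 bytes, and "last" is an
# upper bound that the size must not exceed.

to_bytes() {
    case $1 in
        *k) echo $(( ${1%k} * 1000 )) ;;
        *m) echo $(( ${1%m} * 1000000 )) ;;
        *)  echo "$1" ;;
    esac
}

loop_sizes() {
    size=$(to_bytes "$1")
    last=$(to_bytes "$2")
    factor=$3
    # Emit each size until the next one would pass the upper bound.
    while [ "$size" -le "$last" ]; do
        echo "$size"
        size=$(( size * factor ))
    done
}

# Sizes for the TCP loop above: 64k up to the 64m limit, doubling.
loop_sizes 64k 64m 2
```

Under these assumptions the list has ten entries and ends at 32,768,000 bytes, which matches the "about 32m" end point.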
2. UDP bandwidth and latency
    # qperf --listen_port 9306 -H 172.16.25.39 --time 300 \
        --use_bits_per_sec --precision 2 --verbose_more \
        --msg_size 4k udp_bw udp_lat conf
    udp_bw:
        send_bw           = 11 Gb/sec
        recv_bw           = 901 Mb/sec
        msg_rate          = 28 K/sec
        msg_size          = 4 KB
        time              = 300 sec
        timeout           = 10 sec
        send_cost         = 746 ms/GB
        recv_cost         = 833 ms/GB
        send_cpus_used    = 102 % cpus
        send_cpus_user    = 2.1 % cpus
        send_cpus_intr    = 0.6 % cpus
        send_cpus_kernel  = 99 % cpus
        send_cpus_iowait  = 0.7 % cpus
        send_real_time    = 300 sec
        send_cpu_time     = 306 sec
        send_bytes        = 410 GB
        send_msgs         = 103 million
        recv_cpus_used    = 9.4 % cpus
        recv_cpus_user    = 0.3 % cpus
        recv_cpus_intr    = 0.3 % cpus
        recv_cpus_kernel  = 8.6 % cpus
        recv_cpus_iowait  = 0.2 % cpus
        recv_real_time    = 300 sec
        recv_cpu_time     = 28 sec
        recv_bytes        = 34 GB
        recv_msgs         = 8.5 million
    udp_lat:
        latency           = 310 us
        msg_rate          = 3.2 K/sec
        msg_size          = 4 KB
        time              = 300 sec
        timeout           = 10 sec
        loc_cpus_used     = 4.1 % cpus
        loc_cpus_user     = 0.6 % cpus
        loc_cpus_intr     = 0.3 % cpus
        loc_cpus_kernel   = 2.8 % cpus
        loc_cpus_iowait   = 0.5 % cpus
        loc_real_time     = 300 sec
        loc_cpu_time      = 12 sec
        loc_send_bytes    = 1.9 GB
        loc_recv_bytes    = 1.9 GB
        loc_send_msgs     = 484,381
        loc_recv_msgs     = 484,380
        rem_cpus_used     = 3.3 % cpus
        rem_cpus_user     = 0.1 % cpus
        rem_cpus_intr     = 0.1 % cpus
        rem_cpus_kernel   = 2.8 % cpus
        rem_cpus_iowait   = 0.3 % cpus
        rem_real_time     = 300 sec
        rem_cpu_time      = 10 sec
        rem_send_bytes    = 1.9 GB
        rem_recv_bytes    = 1.9 GB
        rem_send_msgs     = 484,380
        rem_recv_msgs     = 484,380
    conf:
        loc_node   = ssd2
        loc_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        loc_os     = Linux 3.0.13-0.27-default
        loc_qperf  = 0.4.6
        rem_node   = ssd1
        rem_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        rem_os     = Linux 3.0.13-0.27-default
        rem_qperf  = 0.4.6
3. SDP bandwidth and latency
    # qperf --listen_port 9306 -H 11.11.11.39 --time 300 \
        --use_bits_per_sec --precision 2 --verbose_more --msg_size 512k \
        sdp_bw sdp_lat conf
    sdp_bw:
        bw                = 15 Gb/sec
        msg_rate          = 3.7 K/sec
        msg_size          = 500 KiB (512,000)
        time              = 300 sec
        timeout           = 10 sec
        send_cost         = 75 ms/GB
        recv_cost         = 228 ms/GB
        send_cpus_used    = 14 % cpus
        send_cpus_user    = 0.6 % cpus
        send_cpus_intr    = 0 % cpus
        send_cpus_kernel  = 14 % cpus
        send_cpus_iowait  = 0.1 % cpus
        send_real_time    = 300 sec
        send_cpu_time     = 43 sec
        send_bytes        = 572 GB
        send_msgs         = 1.1 million
        recv_cpus_used    = 44 % cpus
        recv_cpus_user    = 0.2 % cpus
        recv_cpus_intr    = 0.1 % cpus
        recv_cpus_kernel  = 43 % cpus
        recv_cpus_iowait  = 0.3 % cpus
        recv_real_time    = 300 sec
        recv_cpu_time     = 131 sec
        recv_bytes        = 572 GB
        recv_msgs         = 1.1 million
    sdp_lat:
        latency           = 248 us
        msg_rate          = 4 K/sec
        msg_size          = 500 KiB (512,000)
        time              = 300 sec
        timeout           = 10 sec
        loc_cpus_used     = 29 % cpus
        loc_cpus_user     = 0.8 % cpus
        loc_cpus_intr     = 0.1 % cpus
        loc_cpus_kernel   = 28 % cpus
        loc_cpus_iowait   = 0 % cpus
        loc_real_time     = 300 sec
        loc_cpu_time      = 88 sec
        loc_send_bytes    = 310 GB
        loc_recv_bytes    = 310 GB
        loc_send_msgs     = 604,576
        loc_recv_msgs     = 604,575
        rem_cpus_used     = 29 % cpus
        rem_cpus_user     = 0.2 % cpus
        rem_cpus_intr     = 0.1 % cpus
        rem_cpus_kernel   = 29 % cpus
        rem_cpus_iowait   = 0.4 % cpus
        rem_real_time     = 300 sec
        rem_cpu_time      = 88 sec
        rem_send_bytes    = 310 GB
        rem_recv_bytes    = 310 GB
        rem_send_msgs     = 604,575
        rem_recv_msgs     = 604,575
    conf:
        loc_node   = ssd2
        loc_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        loc_os     = Linux 3.0.13-0.27-default
        loc_qperf  = 0.4.6
        rem_node   = ssd1
        rem_cpu    = 24 Cores: Intel Xeon E5-2620 0 @ 2.00GHz
        rem_os     = Linux 3.0.13-0.27-default
        rem_qperf  = 0.4.6

    # qperf --listen_port 9306 -H 11.11.11.39 --time 300 \
        --use_bits_per_sec --precision 2 --verbose_more --loop \
        msg_size:4k:64k:*2 sdp_bw sdp_lat conf
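The verbose reports above are long, and usually only the bandwidth and latency lines are of interest. The following is a minimal sketch of a filter that keeps just those lines, assuming the "name = value unit" layout shown in the reports. The function name `qperf_summary` is hypothetical, and the sample is an abridged copy of the SDP report.

```shell
#!/bin/sh
# Print only the bandwidth and latency lines of a qperf report read from
# stdin, each prefixed with the name of the test it belongs to.
qperf_summary() {
    awk '
        # A line such as "sdp_bw:" starts the section of a new test.
        /^[a-z_]+:$/ { name = $1; sub(/:$/, "", name) }
        $1 == "bw" || $1 == "send_bw" || $1 == "recv_bw" || $1 == "latency" {
            print name, $1, $3, $4
        }
    '
}

# Abridged sample taken from the SDP report above.
sample="sdp_bw:
    bw = 15 Gb/sec
    msg_rate = 3.7 K/sec
sdp_lat:
    latency = 248 us
    msg_rate = 4 K/sec"

printf '%s\n' "$sample" | qperf_summary
```

In practice the output of a qperf client run would be piped into `qperf_summary` instead of the sample text.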