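Environment:
A state-owned enterprise runs a call center system with a one-center, three-platform architecture. Check Point firewalls and other security devices needed to be added in front of the central server cluster.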
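Problem description:
After the Check Point firewall cutover went live, a random sample of agent seats was tested. Signaling, three-way calling, and call transfer all worked normally. Once most agents had come on shift, however, various call faults began to appear at random, while signaling remained normal.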
Troubleshooting process:
Troubleshooting began as soon as the problem appeared.
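1. First, log in to SmartDashboard and review the security policy.
The policy status showed that nobody had changed the relevant policies between the cutover test and the onset of the faults, and the policy itself was in order.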
2. Next, check SmartLog, searching for the IP addresses of randomly chosen faulty phones.
No drop records were found there either.
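3. Then use SmartView Tracker to examine the packets further, adding the IP addresses of the faulty phones to the filter:
Review the filter results.
No anomalies were found.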
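4. Use the command line on the device to capture and analyze packets at its interfaces. The output below has the format of the kernel drop debug (as produced by `fw ctl zdebug drop`):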
;[cpu_4];[fw4_1];fw_log_drop_ex: Packet proto=1 219.141.216.254:0 -> 219.141.216.12:0 dropped by fwha_select_ip_packet Reason: icmp probe reply to our request;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 219.141.216.12:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_5];[fw4_0];fw_log_drop_ex: Packet proto=17 192.168.254.4:137 -> 10.96.21.136:137 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 219.141.216.12:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_4];[fw4_1];fw_log_drop_ex: Packet proto=17 172.23.140.36:51221 -> 10.96.4.249:2055 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_4];[fw4_1];fw_log_drop_ex: Packet proto=1 10.96.165.20:0 -> 10.96.165.28:0 dropped by fwha_select_ip_packet Reason: icmp probe reply to our request;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 219.141.216.12:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 219.141.216.12:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
Here too, none of the drop records involved any call center IP address.
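5. The checks above tentatively ruled out the Check Point firewall. Working with the network team, we inspected the other devices deployed in this cutover one by one and finally found an abnormal state on an IPS device sitting between the core switches. Its log showed that, during the period when call center calls were failing, the device had registered a DDoS attack and dropped a large number of UDP packets it suspected of being attack traffic.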
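6. The voice team confirmed that the call center does carry voice over random UDP ports during a call. After the anti-DDoS function on the IPS was disabled, the agent seat faults disappeared and calls returned to normal.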
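The check made in step 4 (scanning the drop output for any record that involves a call center address) can be sketched in Python as below. This is only an illustration, not the procedure actually used on site: the regular expression is fitted to the record format shown in step 4 and tolerates line wraps and irregular spacing, the function names are made up for this example, and the subnet `10.96.50.0/24` is a hypothetical stand-in because the write-up does not give the real call center segment.

```python
import ipaddress
import re
from collections import Counter

# One drop record in the format shown in step 4. \s also matches newlines,
# so a record that wraps across several lines is still matched.
DROP_RE = re.compile(
    r"Packet\s*proto=(?P<proto>\d+)\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+):(?P<sport>\d+)\s*->\s*"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+):(?P<dport>\d+)\s+"
    r"dropped\s+by\s*(?P<func>\w+?)\s*Reason:\s*(?P<reason>[^;]+);"
)


def parse_drops(text):
    """Return one dict per drop record found in the debug output."""
    records = []
    for m in DROP_RE.finditer(text):
        records.append({
            "proto": int(m["proto"]),
            "src": ipaddress.ip_address(m["src"]),
            "dst": ipaddress.ip_address(m["dst"]),
            "dport": int(m["dport"]),
            "func": m["func"],
            # Collapse any wrapped whitespace inside the reason text.
            "reason": " ".join(m["reason"].split()),
        })
    return records


def drops_touching(records, network):
    """Keep only records whose source or destination lies in `network`."""
    net = ipaddress.ip_network(network)
    return [r for r in records if r["src"] in net or r["dst"] in net]


# A few records copied from the output in step 4.
SAMPLE = """\
;[cpu_4];[fw4_1];fw_log_drop_ex: Packet proto=1 219.141.216.254:0 -> 219.141.216.12:0 dropped by fwha_select_ip_packet Reason: icmp probe reply to our request;
;[cpu_2];[fw4_3];fw_log_drop_ex: Packet proto=17 0.0.0.0:8116 -> 10.96.165.28:8116 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_5];[fw4_0];fw_log_drop_ex: Packet proto=17 192.168.254.4:137 -> 10.96.21.136:137 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
;[cpu_4];[fw4_1];fw_log_drop_ex: Packet proto=17 172.23.140.36:51221 -> 10.96.4.249:2055 dropped by fw_handle_first_packet Reason: Rulebase drop - rule 629;
"""

# Hypothetical call center subnet; replace with the real segment.
CALL_CENTER_NET = "10.96.50.0/24"

if __name__ == "__main__":
    records = parse_drops(SAMPLE)
    print(Counter(r["reason"] for r in records))
    print("call center drops:", drops_touching(records, CALL_CENTER_NET))
```

An empty result for the call center subnet corresponds to the finding in step 4: the firewall was dropping cluster and housekeeping traffic under its rulebase, but nothing belonging to the phones.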
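Root cause analysis:
It turned out that the call center system first exchanges voice signaling between the phones and the server, after which the agent seats communicate with each other directly over random UDP ports in the range 1023-65535. During the cutover test only a few phones were active at the same time, so traffic passed normally. Once the agents at all three sites were on shift, and because all seats sit in the same network segment, the number of UDP packets per unit time exceeded the DDoS threshold preset on the IPS. The IPS therefore classified UDP packets from the call center segment as a DDoS attack and dropped them, which left the voice streams incomplete and produced the random voice faults.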
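The threshold effect described above can be illustrated with a small Python model. Every number in it is an assumption chosen for illustration: the write-up gives neither the real IPS threshold nor the per-call packet rate, and the real IPS detection logic is certainly more involved than a single per-segment packet counter. The figure of 50 packets per second per stream corresponds to a 20 ms packetization interval, which is common for voice but not confirmed for this system.

```python
def aggregate_udp_pps(active_calls, pps_per_stream=50, streams_per_call=2):
    """Total UDP packets per second leaving one segment.

    Assumed values: 50 packets/s per voice stream and two streams
    (one in each direction) per call.
    """
    return active_calls * pps_per_stream * streams_per_call


def is_flagged(active_calls, threshold_pps, **kwargs):
    """True when the segment's aggregate rate exceeds the threshold."""
    return aggregate_udp_pps(active_calls, **kwargs) > threshold_pps


# Hypothetical per-segment flood threshold on the IPS.
THRESHOLD_PPS = 10_000

if __name__ == "__main__":
    # A handful of test calls versus a full shift (both counts are made up).
    for calls in (10, 300):
        rate = aggregate_udp_pps(calls)
        state = "flagged" if is_flagged(calls, THRESHOLD_PPS) else "passes"
        print(f"{calls} calls: {rate} pps, {state}")
```

Under these assumed values, a small test group stays well under the limit, while a full shift on one segment crosses it even though each individual call is unchanged. That is consistent with the cutover test passing and the faults appearing only at full load.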