2017-4-21 Post-processing a packet-capture text export with Shell + Python

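For my graduation project I have spent the past few days turning Modbus packets into hexadecimal form, but Wireshark was not much help here, or perhaps I just haven't found the trick yet. The text processing wore me out: I ran into problems I had never thought about before, and it really is true that you only notice how little you know once you need it. These are my notes for future reference.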

I. Directory layout

# ls /tmp

1.txt   10_BCD.sh   7.sh    get_final.py    README

(1) # cat 1.txt  ## 1.txt is the raw capture export

No.     Time           Source                Destination           Protocol Length Info
    246 166.994531     192.168.1.100         192.168.1.101         Modbus/TCP 66        Query: Trans:     0; Unit:   1, Func:   3: Read Holding Registers

Frame 246: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0
Ethernet II, Src: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39), Dst: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
    Destination: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        Address: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        Address: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.1.100, Dst: 192.168.1.101
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 52
    Identification: 0x6971 (26993)
    Flags: 0x02 (Don't Fragment)
    Fragment offset: 0
    Time to live: 128
    Protocol: TCP (6)
    Header checksum: 0x0d39 [validation disabled]
    [Header checksum status: Unverified]
    Source: 192.168.1.100
    Destination: 192.168.1.101
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 58708, Dst Port: 502, Seq: 1, Ack: 1, Len: 12
    Source Port: 58708
    Destination Port: 502
    [Stream index: 1]
    [TCP Segment Len: 12]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 13    (relative sequence number)]
    Acknowledgment number: 1    (relative ack number)
    Header Length: 20 bytes
    Flags: 0x018 (PSH, ACK)
    Window size value: 16425
    [Calculated window size: 65700]
    [Window size scaling factor: 4]
    Checksum: 0xb0f0 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    [SEQ/ACK analysis]
    [PDU Size: 12]
Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 6
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    Reference Number: 0
    Word Count: 10

No.     Time           Source                Destination           Protocol Length Info
    247 167.015547     192.168.1.101         192.168.1.100         Modbus/TCP 83     Response: Trans:     0; Unit:   1, Func:   3: Read Holding Registers

Frame 247: 83 bytes on wire (664 bits), 83 bytes captured (664 bits) on interface 0
Ethernet II, Src: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e), Dst: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
    Destination: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        Address: HonHaiPr_65:5d:39 (1c:3e:84:65:5d:39)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        Address: AskeyCom_1c:52:1e (e0:ca:94:1c:52:1e)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.1.101, Dst: 192.168.1.100
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
    Total Length: 69
    Identification: 0x1d8e (7566)
    Flags: 0x02 (Don't Fragment)
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x990b [validation disabled]
    [Header checksum status: Unverified]
    Source: 192.168.1.101
    Destination: 192.168.1.100
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 502, Dst Port: 58708, Seq: 1, Ack: 13, Len: 29
    Source Port: 502
    Destination Port: 58708
    [Stream index: 1]
    [TCP Segment Len: 29]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 30    (relative sequence number)]
    Acknowledgment number: 13    (relative ack number)
    Header Length: 20 bytes
    Flags: 0x018 (PSH, ACK)
    Window size value: 256
    [Calculated window size: 65536]
    [Window size scaling factor: 256]
    Checksum: 0xdaf5 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    [SEQ/ACK analysis]
    [PDU Size: 29]
Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 23
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    [Request Frame: 246]
    Byte Count: 20
    Register 0 (UINT16): 0
    Register 1 (UINT16): 0
    Register 2 (UINT16): 0
    Register 3 (UINT16): 1
    Register 4 (UINT16): 0
    Register 5 (UINT16): 0
    Register 6 (UINT16): 0
    Register 7 (UINT16): 0
    Register 8 (UINT16): 0
    Register 9 (UINT16): 0
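
The export above is a sequence of frames, each introduced by a `Frame <n>:` line. As a minimal sketch (not part of the original scripts; the function name `split_frames` and the shortened sample text are assumptions for illustration), the export can be cut into one string per frame before any field is extracted:

```python
import re

def split_frames(export_text):
    """Split a Wireshark plain-text export into one string per frame.

    A frame is assumed to start at a line of the form 'Frame <n>: ...'.
    The 'No. Time Source ...' summary lines of the next frame end up at the
    tail of the previous one, which is harmless for field extraction.
    """
    frames = []
    # splitlines() also drops the trailing CR of files saved on Windows.
    for line in export_text.splitlines():
        if re.match(r"Frame \d+:", line):
            frames.append([line])        # a new frame starts here
        elif frames:
            frames[-1].append(line)      # continuation of the current frame
    return ["\n".join(lines) for lines in frames]

# Shortened, hypothetical sample with Windows line endings.
sample = (
    "No.     Time           Source\r\n"
    "    246 166.994531     192.168.1.100\r\n"
    "Frame 246: 66 bytes on wire (528 bits)\r\n"
    "Modbus/TCP\r\n"
    "    Transaction Identifier: 0\r\n"
    "No.     Time           Source\r\n"
    "Frame 247: 83 bytes on wire (664 bits)\r\n"
    "Modbus/TCP\r\n"
    "    Transaction Identifier: 0\r\n"
)

frames = split_frames(sample)
print(len(frames))
print(frames[1].splitlines()[0])
```

Working frame by frame keeps every extracted value tied to the packet it came from.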

 

(2) # cat 10_BCD.sh

#!/bin/bash

if [ ! -d test ];then
        mkdir test
fi

## -h stops grep from prefixing file names when more than one .txt file matches
grep -hiA57 "Modbus/TCP 66 " *.txt |grep -iA8 "^Modbus/TCP" >test/b.txt
cd test
yum install dos2unix -y --quiet   ## files saved on Windows carry a ^M (CR) at the end of each line on Linux; dos2unix strips it
dos2unix b.txt

## one small file per field, holding the decimal value after the colon
grep "Transaction" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 111
grep "Prot" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 222
grep "Leng" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 333
grep "Unit Identifier" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 444
grep "Function" b.txt |grep "Register" |awk -F ":" '{print $2}'|awk -F "(" '{print $2}'|awk -F ")" '{print $1}'> 555
grep "Refe" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 666
grep "Word" b.txt |awk -F ":" '{print $2}'|sed 's/^[ \t]*//g'> 777

if [ $? -eq 0 ];then
    paste -d "," 111 222 333 444 555 666 777 > c.txt
    sed -i '/,,/d' c.txt
    line_number=`awk -F "," '$NF=="" {print NR}' c.txt`  ## line numbers of the rows whose last field is empty
    arr=($line_number)   ## turn the string into an array; a bare $arr means arr[0], the first element
    if [ -n "$arr" ];then   ## without this guard an empty $arr makes sed fail
        sed -i "${arr},\$d" c.txt  ## delete from that row to the end of the file; sed is awkward inside a shell script, and this command gave me a lot of trouble
    fi
    cd ..
    echo "==== The decimal results are in c.txt under the test directory ====!"
fi
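
`paste` joins the seven small files purely by line number, so a frame that lacks one field (a response has no Reference Number or Word Count) can shift every later row. As a hedged alternative sketch, not the script used here, the seven values can be pulled from one frame at a time; the name `frame_to_row` and the pattern list are assumptions for illustration, with the labels copied from the capture text:

```python
import re

# Patterns in the column order of c.txt; each captures one decimal value.
PATTERNS = [
    r"Transaction Identifier:\s*(\d+)",
    r"Protocol Identifier:\s*(\d+)",
    r"^\s*Length:\s*(\d+)",              # anchored so 'Total Length' and 'Header Length' do not match
    r"Unit Identifier:\s*(\d+)",
    r"Function Code:.*\((\d+)\)",
    r"Reference Number:\s*(\d+)",
    r"Word Count:\s*(\d+)",
]

def frame_to_row(frame_text):
    """Return 'tid,pid,len,unit,func,ref,count' for one frame, or None if a field is missing."""
    values = []
    for pattern in PATTERNS:
        match = re.search(pattern, frame_text, re.MULTILINE)
        if match is None:
            return None                  # incomplete frame: skip it rather than emit an empty column
        values.append(match.group(1))
    return ",".join(values)

query = """Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 6
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    Reference Number: 0
    Word Count: 10
"""
response = """    Total Length: 69
Modbus/TCP
    Transaction Identifier: 0
    Protocol Identifier: 0
    Length: 23
    Unit Identifier: 1
Modbus
    .000 0011 = Function Code: Read Holding Registers (3)
    Byte Count: 20
"""

print(frame_to_row(query))
print(frame_to_row(response))
```

Because a row is only emitted when all seven values are present, no later clean-up of half-empty rows is needed with this approach.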

(3) # cat get_final.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import subprocess

def run_quiet(script):
    """Run a helper shell script with bash and discard its output."""
    with open(os.devnull, 'w') as devnull:
        subprocess.call(['/bin/bash', script], stdout=devnull, stderr=devnull)

run_quiet('10_BCD.sh')  ## step 1: build test/c.txt with the decimal fields

def num_bcd(num):    ## decimal to hex, always four digits: 25 -> 00,19, and 10 -> 00,0a,
    return '%02x,%02x,' % ((num >> 8) & 0xff, num & 0xff)

def fun2(num):  ## only two hex digits: 10 -> 0a, rather than 00,0a,
    return '%02x,' % (num & 0xff)

contents = []
with open('test/c.txt') as f:
    for line in f:
        b = line.strip().split(",")  ## the line (a string) becomes a list
        for i in range(len(b)):
            if b[i].strip() == "":  ## an empty field means the data frame is incomplete
                break
            value = int(b[i])
            if i == 3 or i == 4:  ## the 4th and 5th numbers of a frame keep only two digits
                contents.append(fun2(value))
            else:
                contents.append(num_bcd(value))

## the new content is kept in a list and written out one field per line
filename = 'new.ini'
with open(filename, 'w') as fobj:
    fobj.writelines(['%s\n' % eachline for eachline in contents])

run_quiet('7.sh')  ## step 3: join every seven fields into one row
print("The result is in the final_Result file!")

(4) # cat 7.sh

#!/bin/bash

awk -F "," '{if (NR%7!=0)ORS=" ";else ORS="\n";print}' new.ini >final_Result
if [ -f new.ini ];then
    rm -f new.ini
fi
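
The awk one-liner joins every seven input lines into one output row by switching the output record separator. The same regrouping could be sketched in Python as follows; this is only an illustration, and the function name `group_fields` and the sample list are assumptions:

```python
def group_fields(items, per_row=7):
    """Join consecutive items into rows of `per_row` items separated by spaces."""
    rows = []
    for start in range(0, len(items), per_row):
        rows.append(" ".join(items[start:start + per_row]))
    return rows

# Two frames' worth of formatted fields, as they would appear line by line in new.ini.
fields = ["00,20,", "00,00,", "00,06,", "01,", "03,", "00,00,", "00,0a,"] * 2

for row in group_fields(fields):
    print(row)
```

If the number of fields is not a multiple of seven, this version simply yields a shorter last row, which makes an incomplete frame easy to spot.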

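(5) # cat README

=================== Usage guide ============================
All .txt files are raw capture exports!

Note: just run python get_final.py; the resulting data frames are saved in the final_Result file

How it works:
1. Running python get_final.py first calls 10_BCD.sh, which turns the raw capture export into decimal values: the Modbus sections are extracted into test/b.txt, split into 7 small files in the test directory, and finally merged into c.txt
2. The Python body converts decimal to hexadecimal, but the 7 hexadecimal columns of each frame end up on separate lines
3. Finally 7.sh is called to put the hexadecimal values of each frame on one line, which gives the final result, final_Result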

II. Results

# cat test/c.txt ## this is the format at the start
32,0,6,1,3,0,10
32,0,23,1,3,0,10
33,0,6,1,3,0,10
33,0,23,1,3,0,10
34,0,6,1,3,0,10
35,0,6,1,3,0,10
36,0,6,1,3,0,10
37,0,6,1,3,0,10
34,0,23,1,3,0,10
38,0,6,1,3,0,10

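#32,0,6,1,3,0,,  # at first I could not get rid of rows like this one, which contain two commas with no number between them

#42,0,6,1,3,0,,   # in the shell, awk finds the matching line numbers, arr turns them into an array, and sed then deletes from that line to the end of the file: sed -i $arr',$d' c.txt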
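
The same clean-up can be expressed row by row instead of by line number. The sketch below is an assumption-laden illustration rather than the method used above: the name `drop_incomplete_rows` is hypothetical, and unlike the `sed` range delete it keeps complete rows that come after an incomplete one.

```python
def drop_incomplete_rows(lines, expected_fields=7):
    """Keep only rows that have exactly `expected_fields` non-empty fields."""
    kept = []
    for line in lines:
        fields = [field.strip() for field in line.strip().split(",")]
        if len(fields) == expected_fields and all(fields):
            kept.append(",".join(fields))
    return kept

rows = [
    "32,0,6,1,3,0,10",
    "32,0,6,1,3,0,,",     # extra comma, nothing between the last two
    "42,0,6,1,3,0,",      # last field empty
    "33,0,23,1,3,0,10",
]
print(drop_incomplete_rows(rows))
```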

# cat final_Result   ## the result has to be in this hexadecimal form

00,20, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,20, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,21, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,21, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,22, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,23, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,24, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,25, 00,00, 00,06, 01, 03, 00,00, 00,0a,
00,22, 00,00, 00,17, 01, 03, 00,00, 00,0a,
00,26, 00,00, 00,06, 01, 03, 00,00, 00,0a,
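
Each row above is the 12-byte Modbus/TCP query: three 16-bit words (transaction ID, protocol ID, length), two single bytes (unit ID, function code), then two 16-bit words (reference number, word count), all big-endian. As a cross-check, and only as a sketch with a hypothetical function name `query_hex`, Python's `struct` module can produce the same bytes from one decimal row of c.txt:

```python
import struct

def query_hex(transaction_id, protocol_id, length, unit_id,
              function_code, reference, word_count):
    """Pack the seven decimal fields and return the bytes as comma-separated hex."""
    # ">HHHBBHH": big-endian; H = unsigned 16-bit, B = unsigned 8-bit.
    raw = struct.pack(">HHHBBHH", transaction_id, protocol_id, length,
                      unit_id, function_code, reference, word_count)
    # bytearray() yields integers on both Python 2 and Python 3.
    return ",".join("%02x" % byte for byte in bytearray(raw))

# First row of c.txt: 32,0,6,1,3,0,10
print(query_hex(32, 0, 6, 1, 3, 0, 10))
```

`struct.pack` raises `struct.error` when a value does not fit its field, so an out-of-range number is reported instead of being silently truncated.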

