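【Tutorial topic】: 4. awk and sed
【Main content】
【1】awk
AWK is a powerful text-processing tool created at Bell Labs in 1977. The name comes from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. Anyone learning AWK should know about the classic book The AWK Programming Language, which is rated 9.4 on Douban and is listed on Amazon at 1022.30 yuan.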
I extracted the following from the output of the netstat command to use as sample data:
$ cat netstat.txt
Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
tcp 0 0 :::22 :::* LISTEN
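Below is the simplest and most common awk example. It prints columns 1 and 4:
· The part enclosed in braces inside the single quotes is the awk statement. Keep the program in single quotes so that the shell does not expand $1 and $4 itself.
· $1..$n refer to the n-th column. Note: $0 is the whole line.
$ awk '{print $1, $4}' netstat.txt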
Proto Local-Address
tcp 0.0.0.0:3306
tcp 0.0.0.0:80
tcp 127.0.0.1:9000
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp coolshell.cn:80
tcp :::22
Next, awk's formatted output, which works just like printf in C:
$ awk '{printf "%-8s %-8s %-8s %-18s %-22s %-15s\n",$1,$2,$3,$4,$5,$6}' netstat.txt
Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
tcp 0 0 :::22 :::* LISTEN
Filtering records
Now let's see how to filter records (the condition below is: column 3 equals 0 && column 6 equals LISTEN):
$ awk '$3==0 && $6=="LISTEN"' netstat.txt
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
Here "==" is a comparison operator. The other comparison operators are: !=, >, <, >=, <=
Here are a few other ways to filter records:
$ awk '$3>0 {print $0}' netstat.txt
Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
If we want the header line as well, we can bring in the built-in variable NR:
$ awk '$3==0 && $6=="LISTEN" || NR==1' netstat.txt
Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
And with formatted output added:
$ awk '$3==0 && $6=="LISTEN" || NR==1 {printf "%-20s %-20s %s\n",$4,$5,$6}' netstat.txt
Local-Address Foreign-Address State
0.0.0.0:3306 0.0.0.0:* LISTEN
0.0.0.0:80 0.0.0.0:* LISTEN
127.0.0.1:9000 0.0.0.0:* LISTEN
:::22 :::* LISTEN
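Built-in variables
Speaking of built-in variables, here are some of the ones awk provides:
$0       | The current record (this variable holds the entire line)
$1~$n    | The n-th field of the current record; fields are separated by FS
FS       | Input field separator; the default is a space or Tab
NF       | Number of fields in the current record, i.e. how many columns it has
NR       | Number of records read so far, i.e. the line number, starting at 1; with several input files it keeps accumulating across files
FNR      | Record number within the current file; unlike NR, this is each file's own line number
RS       | Input record separator; the default is a newline
OFS      | Output field separator; the default is also a space
ORS      | Output record separator; the default is a newline
FILENAME | Name of the current input file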
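To make the difference between NR and FNR concrete, here is a minimal sketch that runs awk over two files. The file names one.txt and two.txt and their contents are made up for illustration; $NF (the last field) is standard awk but is not listed in the table above.

```shell
# Two small hypothetical input files, created in a temporary directory.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'x y z w\n'    > "$dir/two.txt"

# For every line of both files, print the file name, the global record
# number, the per-file record number, the field count, and the last field.
vars_out=$(cd "$dir" && awk '{print FILENAME, NR, FNR, NF, $NF}' one.txt two.txt)
echo "$vars_out"
rm -r "$dir"
```

On the first line of two.txt, NR has reached 3 while FNR has started again at 1.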
How are they used? For example, to print line numbers:
$ awk '$3==0 && $6=="ESTABLISHED" || NR==1 {printf "%02d %s %-20s %-20s %s\n",NR, FNR, $4,$5,$6}' netstat.txt
01 1 Local-Address Foreign-Address State
07 7 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
08 8 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
10 10 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
14 14 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
17 17 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
Specifying the separator
$ awk 'BEGIN{FS=":"} {print $1,$3,$6}' /etc/passwd
root 0 /root
bin 1 /bin
daemon 2 /sbin
adm 3 /var/adm
lp 4 /var/spool/lpd
sync 5 /sbin
shutdown 6 /sbin
halt 7 /sbin
The command above is equivalent to the following (-F specifies the separator):
$ awk -F: '{print $1,$3,$6}' /etc/passwd
Note: to specify several separators, you can write:
awk -F '[;:]'
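As a quick illustration of a multi-character separator set, the sketch below splits one made-up record that mixes ; and : as delimiters. The sample string is purely hypothetical.

```shell
# -F takes a regular expression; the bracket expression [;:] matches
# either ';' or ':', so both characters act as field separators.
fields_out=$(echo 'alice;30:admin' | awk -F '[;:]' '{print $1, $2, $3}')
echo "$fields_out"
```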
Here is another example that uses \t as the output separator (it reads /etc/passwd, which is separated by :):
$ awk -F: '{print $1,$3,$6}' OFS="\t" /etc/passwd
root 0 /root
bin 1 /bin
daemon 2 /sbin
adm 3 /var/adm
lp 4 /var/spool/lpd
sync 5 /sbin
String matching
Let's look at a few string-matching examples:
$ awk '$6 ~ /FIN/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1 Local-Address Foreign-Address State
6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
$ awk '$6 ~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1 Local-Address Foreign-Address State
5 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
11 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
15 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
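The first example above matches the FIN states, and the second matches states containing WAIT. Here ~ is the match operator and the pattern sits between the two slashes, so this is a regular-expression match.
In fact, awk can match against whole lines the way grep does, like this:
$ awk '/LISTEN/' netstat.txt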
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
We can use "/FIN|TIME/" to match FIN or TIME:
$ awk '$6 ~ /FIN|TIME/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1 Local-Address Foreign-Address State
5 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
6 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
9 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
11 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
13 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
15 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
18 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
Now an example of negating the pattern:
$ awk '$6 !~ /WAIT/ || NR==1 {print NR,$4,$5,$6}' OFS="\t" netstat.txt
1 Local-Address Foreign-Address State
2 0.0.0.0:3306 0.0.0.0:* LISTEN
3 0.0.0.0:80 0.0.0.0:* LISTEN
4 127.0.0.1:9000 0.0.0.0:* LISTEN
7 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
8 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
10 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
12 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
14 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
16 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
17 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
19 :::22 :::* LISTEN
Or:
$ awk '!/WAIT/' netstat.txt
Splitting files
Splitting a file with awk is simple: just use redirection. The example below splits the file by column 6 (NR!=1 skips the header).
$ awk 'NR!=1{print > $6}' netstat.txt
$ ls
ESTABLISHED FIN_WAIT1 FIN_WAIT2 LAST_ACK LISTEN netstat.txt TIME_WAIT
$ cat ESTABLISHED
tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
$ cat FIN_WAIT1
tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
$ cat FIN_WAIT2
tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
$ cat LAST_ACK
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
$ cat LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
$ cat TIME_WAIT
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
You can also write only selected columns to the files:
$ awk 'NR!=1{print $4,$5 > $6}' netstat.txt
A slightly more complex one (note the if-else-if statement; awk really is a script interpreter):
$ awk 'NR!=1{if($6 ~ /TIME|ESTABLISHED/) print > "1.txt";
else if($6 ~ /LISTEN/) print > "2.txt";
else print > "3.txt" }' netstat.txt
$ ls ?.txt
1.txt 2.txt 3.txt
$ cat 1.txt
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49829 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.215.36:36970 TIME_WAIT
tcp 0 4166 coolshell.cn:80 61.148.242.38:30901 ESTABLISHED
tcp 0 0 coolshell.cn:80 110.194.134.189:4796 ESTABLISHED
tcp 0 0 coolshell.cn:80 183.60.212.163:51082 TIME_WAIT
tcp 0 0 coolshell.cn:80 123.169.124.111:49840 ESTABLISHED
$ cat 2.txt
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
$ cat 3.txt
tcp 0 0 coolshell.cn:80 61.140.101.185:37538 FIN_WAIT2
tcp 0 0 coolshell.cn:80 116.234.127.77:11502 FIN_WAIT2
tcp 0 1 coolshell.cn:80 124.152.181.209:26825 FIN_WAIT1
tcp 0 1 coolshell.cn:80 208.115.113.92:50601 LAST_ACK
tcp 0 0 coolshell.cn:80 117.136.20.85:50025 FIN_WAIT2
Statistics
The command below computes the total size of all the C, CPP and H files.
$ ls -l *.cpp *.c *.h | awk '{sum+=$5} END {print sum}'
2511401
Next, counting connections by state (a bit of programming is starting to show; we are all programmers, so I won't explain it. Note the use of the array):
$ awk 'NR!=1{a[$6]++;} END {for (i in a) print i ", " a[i];}' netstat.txt
TIME_WAIT, 3
FIN_WAIT1, 1
ESTABLISHED, 6
FIN_WAIT2, 3
LAST_ACK, 1
LISTEN, 4
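The order in which for (i in a) visits the array keys is not specified, which is why the states above come out in no particular order. A minimal sketch of one way to get a stable order is to pipe the result through sort. The five-row sample below is a hypothetical cut-down stand-in for netstat.txt, and the sort keys are just one reasonable choice.

```shell
# A small hypothetical sample in the same layout as netstat.txt (header + 5 rows).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Proto Recv-Q Send-Q Local-Address Foreign-Address State
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 coolshell.cn:80 124.205.5.146:18245 TIME_WAIT
tcp 0 0 coolshell.cn:80 110.194.134.189:1032 ESTABLISHED
tcp 0 0 coolshell.cn:80 123.169.124.111:49809 ESTABLISHED
tcp 0 0 :::22 :::* LISTEN
EOF

# Count rows per state, then sort by count (descending) and by state name
# so that the output order no longer depends on awk's array traversal.
counts=$(awk 'NR!=1 {a[$6]++} END {for (i in a) print a[i], i}' "$sample" | sort -k1,1nr -k2,2)
echo "$counts"
rm -f "$sample"
```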
And here is how much memory each user's processes occupy (note: this sums the RSS column):
$ ps aux | awk 'NR!=1{a[$1]+=$6;} END { for(i in a) print i ", " a[i]"KB";}'
dbus, 540KB
mysql, 99928KB
www, 3264924KB
root, 63644KB
hchen, 6020KB
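awk scripts
Above we saw the END keyword. END marks the point where all lines have been processed. Having mentioned END, we should also introduce BEGIN. The two keywords mean before and after processing, and the syntax is:
· BEGIN{ statements to run before any input is processed }
· END { statements to run after all lines have been processed }
· { statements to run for each line }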
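The three kinds of block can be seen together in a tiny one-off program. This is only a sketch; the three input numbers and the variable name total are made up for illustration.

```shell
# BEGIN runs once before any input, the bare { ... } block runs for each
# line, and END runs once after the last line has been read.
begin_end_out=$(printf '3\n4\n5\n' | awk '
BEGIN { print "start"; total = 0 }
      { total += $1 }
END   { print "lines:", NR, "total:", total }')
echo "$begin_end_out"
```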
To make this clear, consider the following example:
Suppose we have this file (a table of student scores):
$ cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
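Our awk script is as follows (I didn't write it on the command line because that is hard to read, and this also introduces another way of using awk):
$ cat cal.awk
#!/bin/awk -f
# Before processing: initialize the totals and print the header
BEGIN {
    math = 0
    english = 0
    computer = 0
    printf "NAME NO. MATH ENGLISH COMPUTER TOTAL\n"
    printf "---------------------------------------------\n"
}
# For each line: accumulate the totals and print the student's row
{
    math += $3
    english += $4
    computer += $5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3, $4, $5, $3+$4+$5
}
# After processing: print the totals and averages
END {
    printf "---------------------------------------------\n"
    printf " TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
Here is the result (you can also run it as ./cal.awk score.txt):
$ awk -f cal.awk score.txt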
NAME NO. MATH ENGLISH COMPUTER TOTAL
---------------------------------------------
Marry 2143 78 84 77 239
Jack 2321 66 78 45 189
Tom 2122 48 77 71 196
Mike 2537 87 97 95 279
Bob 2415 40 57 62 159
---------------------------------------------
TOTAL: 319 393 350
AVERAGE: 63.80 78.60 70.00
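Environment variables
Since we're on the subject of scripts, here is how to interact with environment variables (using the -v option and ENVIRON; a variable read through ENVIRON must be exported):
$ x=5
$ y=10
$ export y
$ echo $x $y
5 10
$ awk -v val=$x '{print $1, $2, $3, $4+val, $5+ENVIRON["y"]}' OFS="\t" score.txt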
Marry 2143 78 89 87
Jack 2321 66 83 55
Tom 2122 48 82 81
Mike 2537 87 102 105
Bob 2415 40 62 72
Finally, a few more small examples:
# Find the lines in file that are longer than 80 characters
awk 'length>80' file
# List client IPs by number of connections
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
# Print the 9x9 multiplication table
seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
For more on some of these topics, see the gawk manual:
· Built-in variables: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din-Variables
· Control flow: http://www.gnu.org/software/gawk/manual/gawk.html#Statements
· Built-in functions: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
· Regular expressions: http://www.gnu.org/software/gawk/manual/gawk.html#Regexp
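【2】sed
sed stands for stream editor: it edits text programmatically, which is quite hacker-ish. sed is mostly about regular-expression pattern matching, so people who use sed tend to be good at regular expressions.
As before, this article does not cover all of sed; see the sed manual for the rest.
Substitution with the s command
I will use the following text for the demonstrations:
$ cat pets.txt
This is my cat
my cat's name is betty
This is my dog
my dog's name is frank
This is my fish
my fish's name is george
This is my goat
my goat's name is adam
Replace the string my with Hao Chen's. The statement below should be easy to follow: s is the substitute command, /my/ matches my, /Hao Chen's/ is the replacement text, and the trailing g replaces every match on a line:
$ sed "s/my/Hao Chen's/g" pets.txt
This is Hao Chen's cat
Hao Chen's cat's name is betty
This is Hao Chen's dog
Hao Chen's dog's name is frank
This is Hao Chen's fish
Hao Chen's fish's name is george
This is Hao Chen's goat
Hao Chen's goat's name is adam
Note: inside a single-quoted string you cannot escape a single quote with \', so use double quotes instead; inside double quotes you can escape a double quote with \".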
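If the script has to stay in single quotes (for instance because it contains $ that the shell should not expand), a common shell idiom is to write '\'' where the single quote is needed. This is a general shell quoting trick rather than something specific to sed, and the one-line input below is just a sample.

```shell
# Inside '...' the sequence '\'' closes the quoted string, adds an escaped
# single quote, and then reopens the quoted string.
quoted_out=$(echo 'This is my cat' | sed 's/my/Hao Chen'\''s/g')
echo "$quoted_out"
```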
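Note also: the sed command above does not change the file; it only prints the processed text. To write the result to a file, you can use redirection, for example:
$ sed "s/my/Hao Chen's/g" pets.txt > hao_pets.txt
Or use the -i option to modify the file directly:
$ sed -i "s/my/Hao Chen's/g" pets.txt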
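Because -i overwrites the file, it can be worth keeping a backup. The sketch below assumes a sed that accepts a backup suffix attached to -i (GNU sed and BSD sed both do, but -i is not part of POSIX, so check your own system). The temporary directory and file contents are made up for illustration.

```shell
# Work on a throwaway copy in a temporary directory.
work=$(mktemp -d)
printf 'This is my cat\n' > "$work/pets.txt"

# -i.bak edits pets.txt in place and keeps the original as pets.txt.bak.
sed -i.bak 's/my/your/g' "$work/pets.txt"

edited=$(cat "$work/pets.txt")
backup=$(cat "$work/pets.txt.bak")
echo "$edited"
echo "$backup"
rm -r "$work"
```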
Add something at the beginning of every line:
$ sed 's/^/#/g' pets.txt
#This is my cat
# my cat's name is betty
#This is my dog
# my dog's name is frank
#This is my fish
# my fish's name is george
#This is my goat
# my goat's name is adam
Add something at the end of every line:
$ sed 's/$/ --- /g' pets.txt
This is my cat ---
my cat's name is betty ---
This is my dog ---
my dog's name is frank ---
This is my fish ---
my fish's name is george ---
This is my goat ---
my goat's name is adam ---
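While we're here, a few of the most basic regular-expression elements:
· ^ matches the beginning of a line. For example, /^#/ matches lines starting with #.
· $ matches the end of a line. For example, /}$/ matches lines ending with }.
· \< matches the beginning of a word. For example, \<abc matches words starting with abc.
· \> matches the end of a word. For example, abc\> matches words ending with abc.
· . matches any single character.
· * matches zero or more occurrences of the preceding character.
· [ ] is a character set. For example, [abc] matches a, b or c, and [a-zA-Z] matches any of the 26 letters in either case. A leading ^ negates the set, so [^a] matches any character other than a.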
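Several of the elements in the list can be tried out with short sed one-liners. This is a sketch on made-up input strings; \< and \> are left out because they are an extension that not every sed supports.

```shell
lines='# comment
int main() {
}'

# ^# selects lines that start with '#'.
starts=$(printf '%s\n' "$lines" | sed -n '/^#/p')
# }$ selects lines that end with '}'.
ends=$(printf '%s\n' "$lines" | sed -n '/}$/p')
# c.t matches 'c', then any single character, then 't'.
dot=$(echo 'cat cut cot' | sed 's/c.t/X/g')
# [0-9][0-9]* matches one digit followed by zero or more digits.
star=$(echo 'a1b22c333' | sed 's/[0-9][0-9]*/#/g')
# [^a] matches any character other than 'a'.
negated=$(echo 'banana' | sed 's/[^a]/-/g')

printf '%s\n' "$starts" "$ends" "$dot" "$star" "$negated"
```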
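Regular expressions can do some impressive things. For example, suppose we want to strip the tags from some HTML:
html.txt:
<b>This</b> is what <span style="text-decoration: underline;">I</span> meant. Understand?
Here are our sed commands:
# Doing it this way causes a problem: .* is greedy and swallows everything from the first < to the last >
$ sed 's/<.*>//g' html.txt
 meant. Understand?
# To fix that problem, do it like this.
# Here '[^>]*' means any character other than > repeated zero or more times.
$ sed 's/<[^>]*>//g' html.txt
This is what I meant. Understand?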
Next, restricting the substitution to particular lines. The command below only replaces on line 3:
$ sed "3s/my/your/g" pets.txt
This is my cat
my cat's name is betty
This is your dog
my dog's name is frank
This is my fish
my fish's name is george
This is my goat
my goat's name is adam
The command below only replaces text on lines 3 through 6.
$ sed "3,6s/my/your/g" pets.txt
This is my cat
my cat's name is betty
This is your dog
your dog's name is frank
This is your fish
your fish's name is george
This is my goat
my goat's name is adam
$ cat my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
Replace only the first s on each line:
$ sed 's/s/S/1' my.txt
ThiS is my cat, my cat's name is betty
ThiS is my dog, my dog's name is frank
ThiS is my fish, my fish's name is george
ThiS is my goat, my goat's name is adam
Replace only the second s on each line:
$ sed 's/s/S/2' my.txt
This iS my cat, my cat's name is betty
This iS my dog, my dog's name is frank
This iS my fish, my fish's name is george
This iS my goat, my goat's name is adam
Replace the third and every later s on each line:
$ sed 's/s/S/3g' my.txt
This is my cat, my cat'S name iS betty
This is my dog, my dog'S name iS frank
This is my fiSh, my fiSh'S name iS george
This is my goat, my goat'S name iS adam
Multiple patterns
If we need to apply several substitutions in one pass, see the example below (the first pattern replaces my with your on lines 1 through 3, and the second replaces This with That from line 3 onward):
$ sed '1,3s/my/your/g; 3,$s/This/That/g' my.txt
This is your cat, your cat's name is betty
This is your dog, your dog's name is frank
That is your fish, your fish's name is george
That is my goat, my goat's name is adam
The command above is equivalent to the following (note: this uses sed's -e command-line option):
$ sed -e '1,3s/my/your/g' -e '3,$s/This/That/g' my.txt
We can use & to stand for the matched text and then add something on either side of it, as shown here:
$ sed 's/my/[&]/g' my.txt
This is [my] cat, [my] cat's name is betty
This is [my] dog, [my] dog's name is frank
This is [my] fish, [my] fish's name is george
This is [my] goat, [my] goat's name is adam
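Parenthesized groups
An example of matching with parentheses (the string matched by a parenthesized regular expression can be used like a variable; in sed these are \1, \2, ...):
$ sed 's/This is my \([^,]*\),.*is \(.*\)/\1:\2/g' my.txt
cat:betty
dog:frank
fish:george
goat:adam
The regular expression in this example is a bit complex. Broken down (with the escape characters removed):
Regex: This is my ([^,]*),.*is (.*)
Match: This is my (cat),……….is (betty)
So \1 is cat and \2 is betty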
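The same substitution can be written without the backslashes in front of the parentheses by switching to extended regular expressions. The sketch below assumes a sed that supports the -E option (GNU sed and BSD sed both do); the input line is copied from my.txt.

```shell
# With -E, plain ( ) form capture groups, so no backslash escapes are needed.
pair=$(echo "This is my cat, my cat's name is betty" \
  | sed -E 's/This is my ([^,]*),.*is (.*)/\1:\2/')
echo "$pair"
```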
sed commands
Let's go back to the very first example, pets.txt, and look at a few commands:
The N command
First the N command, which appends the next line of input to the pattern space before matching.
The example below pulls each even-numbered line into the preceding odd-numbered line, and since s without g replaces only the first match, the result is:
$ sed 'N;s/my/your/' pets.txt
This is your cat
my cat's name is betty
This is your dog
my dog's name is frank
This is your fish
my fish's name is george
This is your goat
my goat's name is adam
In other words, sed sees the original file as:
This is my cat\n my cat's name is betty
This is my dog\n my dog's name is frank
This is my fish\n my fish's name is george
This is my goat\n my goat's name is adam
With that in mind, the next example should make sense:
$ sed 'N;s/\n/,/' pets.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
The a and i commands
The a command means append and the i command means insert; they are used to add lines. For example:
# 1 i means insert a line before line 1
$ sed "1 i This is my monkey, my monkey's name is wukong" my.txt
This is my monkey, my monkey's name is wukong
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
# 1 a means append a line after line 1
$ sed "1 a This is my monkey, my monkey's name is wukong" my.txt
This is my cat, my cat's name is betty
This is my monkey, my monkey's name is wukong
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
We can also use a match to decide where to add text:
# Note the /fish/a: it appends a line after each line that matches /fish/
$ sed "/fish/a This is my monkey, my monkey's name is wukong" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my monkey, my monkey's name is wukong
This is my goat, my goat's name is adam
The example below appends a line after every line, since every line matches /my/:
$ sed "/my/a ----" my.txt
This is my cat, my cat's name is betty
----
This is my dog, my dog's name is frank
----
This is my fish, my fish's name is george
----
This is my goat, my goat's name is adam
----
The c command
The c command replaces the matched lines:
$ sed "2 c This is my monkey, my monkey's name is wukong" my.txt
This is my cat, my cat's name is betty
This is my monkey, my monkey's name is wukong
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
$ sed "/fish/c This is my monkey, my monkey's name is wukong" my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my monkey, my monkey's name is wukong
This is my goat, my goat's name is adam
The d command
Deletes the matched lines:
$ sed '/fish/d' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my goat, my goat's name is adam
$ sed '2d' my.txt
This is my cat, my cat's name is betty
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
$ sed '2,$d' my.txt
This is my cat, my cat's name is betty
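A common use of d is cleaning up a file by dropping comment lines and blank lines. The sketch below combines the ^ and $ anchors with two d commands; the four-line input is made up for illustration.

```shell
# /^#/d deletes lines that start with '#'; /^$/d deletes empty lines.
cleaned=$(printf '# header\nfirst\n\nsecond\n' | sed '/^#/d; /^$/d')
echo "$cleaned"
```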
The p command
The print command.
You can think of this command as a grep-like command:
# Match fish and print it; the fish line is printed twice,
# because sed also prints every line it processes by default
$ sed '/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
This is my fish, my fish's name is george
This is my goat, my goat's name is adam
# The -n option fixes that
$ sed -n '/fish/p' my.txt
This is my fish, my fish's name is george
# From one pattern to another
$ sed -n '/dog/,/fish/p' my.txt
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
# Print from line 1 through the first line that matches fish
$ sed -n '1,/fish/p' my.txt
This is my cat, my cat's name is betty
This is my dog, my dog's name is frank
This is my fish, my fish's name is george
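Line-number addresses work with p in the same way as the pattern addresses above. The sketch below prints a range of lines and then only the last line, using a made-up four-line input.

```shell
input=$(printf 'one\ntwo\nthree\nfour\n')

# 2,3p prints lines 2 through 3; -n suppresses the automatic printing.
middle=$(printf '%s\n' "$input" | sed -n '2,3p')
# $ addresses the last line of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')

echo "$middle"
echo "$last"
```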