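I. Regular expression basics
Basic metacharacters (basic regular expressions, BRE):
  Character matching:
    .        any single character except a newline
    []       character class; most metacharacters lose their special meaning inside [] and need no escaping
    [^]      any character not listed in the class
  Repetition:
    *        zero or more of the preceding character
    \?       zero or one
    \{m,n\}  at least m and at most n times
    \{m,\}   at least m times
    \{m\}    exactly m times
  Anchors:
    \<, \b   start of word
    \>, \b   end of word
    ^        start of line
    $        end of line
    ^$       empty line
    .*       any string
  Grouping:
    \(\)     group
    \1, \2   back-references; \1 is the text matched by the first pair of parentheses
Extended metacharacters (sed -r, extended regular expressions, ERE):
    ? {m,n} ()   written without the backslash; * and \1, \2 are unchanged
    +            one or more of the preceding character
    |            alternation (or)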
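To make the difference between the two syntaxes concrete, here is a small sketch that swaps two words with back-references, once in BRE and once in ERE, and then uses a BRE interval. The input strings and variable names are made up for illustration. It uses sed -E, which GNU sed treats the same as the -r mentioned above.

```shell
#!/bin/sh
# Illustrative input only; any two lowercase words work.
input='hello world'

# BRE: grouping parentheses must be escaped as \( \).
swapped_bre=$(printf '%s\n' "$input" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# ERE (sed -E; GNU sed also accepts -r): plain ( ) group, + means one or more.
swapped_ere=$(printf '%s\n' "$input" | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')

# BRE interval: \{2,\} means "at least two" of the preceding character (a space here).
squeezed=$(printf '%s\n' 'a   b' | sed 's/ \{2,\}/ /')

printf '%s\n' "$swapped_bre" "$swapped_ere" "$squeezed"
```

Both substitutions should print "world hello", and the last one collapses the run of spaces into one.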
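From my testing, sed does not support lazy (non-greedy) quantifiers such as *?; POSIX basic and extended regular expressions do not define them.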
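A common workaround, sketched here on a made-up input line, is to replace the lazy .* with a negated character class, so the match cannot run past the first closing delimiter. The variable names are only for illustration.

```shell
#!/bin/sh
# Illustrative input: two tag pairs on one line.
line='<b>one</b> and <b>two</b>'

# Greedy: .* runs to the LAST '>', so the whole line is matched and removed.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//')

# Lazy-like: [^>]* cannot cross a '>', so the match stops at the first one.
lazy_like=$(printf '%s\n' "$line" | sed 's/<[^>]*>//')

printf 'greedy:    [%s]\n' "$greedy"
printf 'lazy-like: [%s]\n' "$lazy_like"
```

This only works when the closing delimiter is a single character; for multi-character delimiters a tool with real lazy quantifiers (for example perl) is usually simpler.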
For more detail, see the "30-Minute Regular Expression Tutorial" (正则表达式30分钟入门):
http://www.cnblogs.com/deerchao/archive/2006/08/24/zhengzhe30fengzhongjiaocheng.html
II. Using awk
1. Referencing external variables
Syntax: awk -v var="$var" 'BEGIN{print var}'. A variable passed this way can be referenced in all three places: the BEGIN block, the main body, and the END block.
    # time=`date +%s`
    # echo $time
    1404716831
    # awk -v d=$time 'BEGIN{print d}'
    1404716831
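The transcript above only reads the variable in BEGIN. The following sketch shows the same -v variable being visible in the main body and in END as well; the shell variable prefix and the awk variable p are hypothetical names chosen for the example.

```shell
#!/bin/sh
prefix='row'   # hypothetical shell variable to hand over to awk

out=$(printf 'a\nb\n' | awk -v p="$prefix" '
    BEGIN { print "BEGIN sees " p }   # before any input is read
          { print p ": " $0 }         # once per input record
    END   { print "END sees " p }     # after the last record
')

printf '%s\n' "$out"
```

Quoting the value as "$prefix" keeps it intact if it contains spaces. Note that awk interprets backslash escapes in -v values, so a value containing backslashes may not arrive unchanged.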
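2. Built-in variables
FS: input field separator. The default is a single space, which splits on runs of spaces and tabs. FS can also be a regular expression, as in the example below, which uses a [] character class as the separator.
NF: number of fields in the current record. $0 is the whole record, $1 the first field, and $NF the last field.
    # cat b
    oracle:x:500:500::/home/oracle:/bin/bash
    # awk 'BEGIN{FS="[:/]"}{for (i=1;i<=NF;i++){print $i}}' b
    oracle
    x
    500
    500
    home
    oracle
    bin
    bash
(The empty fields produced by "::/" and ":/" print as blank lines, which are omitted here.)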
NR: number of the current record, counted across all input files.
FNR: number of the current record within the current file.
    # cat c d
    c1
    c2
    c3
    c4
    d5
    d6
    d7
    d8
    # awk 'BEGIN{printf("%5s%5s%5s\n","data"," NR"," FNR")}{printf("%5s%5s%5s\n",$0,NR,FNR)}' c d
     data   NR  FNR
       c1    1    1
       c2    2    2
       c3    3    3
       c4    4    4
       d5    5    1
       d6    6    2
       d7    7    3
       d8    8    4
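NR and FNR are often used together when processing multiple files. The condition NR==FNR is true only while the first file is being read, so combined with next it lets you finish processing the first file before handling the second.
next: compare the two commands below. Without next, after a record is read and the first {} has run, awk goes on to run the second {} as well, so every line of the first file is printed twice.
With next: when next is executed in the first {}, awk skips all remaining statements, reads the next record, and starts matching again from the first pattern.
    # awk 'NR==FNR{print $0;next}{print $0}' c d
    c1
    c2
    c3
    c4
    d5
    d6
    d7
    d8
    # awk 'NR==FNR{print $0}{print $0}' c d
    c1
    c1
    c2
    c2
    c3
    c3
    c4
    c4
    d5
    d6
    d7
    d8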
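A typical use of the NR==FNR plus next idiom is a two-file lookup: load the first file into an array, then use it while reading the second. The sketch below assumes two small made-up files (names and homes, both keyed by a numeric id); the file names, contents, and array name are illustrative only.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Hypothetical sample data: id -> name, and id -> home directory.
printf '500 oracle\n501 grid\n' > "$tmpdir/names"
printf '501 /home/grid\n500 /home/oracle\n502 /tmp\n' > "$tmpdir/homes"

joined=$(awk '
    NR == FNR  { name[$1] = $2; next }   # first file: fill the lookup table, skip the rule below
    $1 in name { print name[$1], $2 }    # second file: print only ids seen in the first file
' "$tmpdir/names" "$tmpdir/homes")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

One caveat: if the first file is empty, NR==FNR is also true for the second file, so the idiom silently treats the second file as the lookup table.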
OFS: output field separator; the default is a space.
ORS: output record separator; the default is a newline.
    # cat b
    oracle:x:500:500::/home/oracle:/bin/bash
    # awk -v FS=":" -v OFS="=" '{print $1,$2,$3}' b
    oracle=x=500
    # awk -v FS=":" '{print $1,$2,$3}' b
    oracle x 500
Here -v is used to pass the values of OFS and FS.
    # awk '1' b
    oracle:x:500:500::/home/oracle:/bin/bash
    root:x:500:500::/home/oracle:/bin/bash
    hxw168:x:500:500::/home/oracle:/bin/bash
    # awk 'BEGIN{ORS="***";}{print $0}' b
    oracle:x:500:500::/home/oracle:/bin/bash***root:x:500:500::/home/oracle:/bin/bash***hxw168:x:500:500::/home/oracle:/bin/bash***#
(With ORS set to ***, the output has no trailing newline, so the shell prompt follows on the same line.)
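One detail the OFS examples do not show: OFS is applied when fields are printed separately or when the record is rebuilt, not when an untouched $0 is printed. A minimal sketch, using a shortened made-up record:

```shell
#!/bin/sh
rec='oracle:x:500'   # shortened sample record, for illustration

# $0 has not been modified, so it is printed exactly as read.
unchanged=$(printf '%s\n' "$rec" | awk -F: -v OFS='=' '{ print $0 }')

# Assigning to any field (even $1 = $1) makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$rec" | awk -F: -v OFS='=' '{ $1 = $1; print $0 }')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first line printed keeps the colons; the second uses = between the fields.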
RS: input record separator; the default is a newline.
The separator *** used below does not occur in the file, so the entire file is read into $0 as a single record.
    # cat e
    [a]
    name=1
    sex=2
    age=3
    [b]
    address=gd sd
    # awk -v RS="***" '{print $0,"-----"NR}' e
    [a]
    name=1
    sex=2
    age=3
    [b]
    address=gd sd
     -----1
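3. Arrays
Arrays can be used directly without being declared; an element that has not been set evaluates to 0 or the empty string.
split function: splits a string into pieces on the given separator, stores the pieces in an array, and returns the number of pieces.
Common way to loop over an array:
for (index in array)   # elements come out in no particular order
{
    print index, array[index];
}
For example:
    # cat b
    oracle:500:home:oracle:/bin/bash
    # awk '{i=split($0,a,":");for(x in a){print x,a[x]};}' b
    4 oracle
    5 /bin/bash
    1 oracle
    2 500
    3 home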
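When the order matters, the return value of split can drive a numeric loop instead of for (x in a). This is a sketch using the same record as above; the variable names n and i are arbitrary.

```shell
#!/bin/sh
ordered=$(printf 'oracle:500:home:oracle:/bin/bash\n' | awk '{
    n = split($0, a, ":")        # n is the number of pieces (5 for this record)
    for (i = 1; i <= n; i++)     # indices from split start at 1, so this keeps the original order
        print i, a[i]
}')

printf '%s\n' "$ordered"
```

This works because split always fills the array with consecutive indices 1 to n.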
A common example:
Count the number of ssh logins per IP address.
    # cat secure.3
    Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
    Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 16 12:21:27 localhost sshd[5360]: pam_unix(sshd:session): session closed for user root
    Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.44 port 6651 ssh2
    Jun 17 16:23:53 localhost sshd[9174]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 17 19:00:09 localhost sshd[9174]: pam_unix(sshd:session): session closed for user root
    Jun 18 09:22:33 localhost sshd[11487]: Accepted password for root from 10.9.11.44 port 58455 ssh2
    Jun 18 09:22:33 localhost sshd[11487]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 18 18:30:56 localhost sshd[11487]: pam_unix(sshd:session): session closed for user root
    Jun 19 15:23:23 localhost sshd[16970]: Accepted password for root from 10.9.11.44 port 48345 ssh2
    Jun 19 15:23:23 localhost sshd[16970]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 19 18:59:00 localhost sshd[16970]: pam_unix(sshd:session): session closed for user root
    Jun 20 09:24:57 localhost sshd[19425]: Accepted password for root from 10.9.11.44 port 5519 ssh2
    Jun 20 09:24:57 localhost sshd[19425]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 20 19:14:30 localhost sshd[19425]: pam_unix(sshd:session): session closed for user root
    Jun 21 08:59:13 localhost sshd[22674]: Accepted password for root from 10.9.11.44 port 4640 ssh2
    Jun 21 08:59:13 localhost sshd[22674]: pam_unix(sshd:session): session opened for user root by (uid=0)
    Jun 21 15:23:28 localhost sshd[22674]: subsystem request for sftp
    Jun 21 18:51:04 localhost sshd[22674]: pam_unix(sshd:session): session closed for user root
    # cat secure.3 | awk '/Accepted/{a[$(NF-3)]++}END{for(x in a){print x,a[x]}}'
    10.9.11.44 6
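Since for (x in a) returns keys in no particular order, a log with several client addresses usually needs a sort step after awk. The sketch below reuses the counting one-liner on a shortened sample; the sample_log function is a stand-in for the real log file, and the line with 10.9.11.45 is a made-up entry added only so there are two addresses to rank.

```shell
#!/bin/sh
# Stand-in for the log file; the 10.9.11.45 line is hypothetical.
sample_log() {
    cat <<'EOF'
Jun 16 10:04:47 localhost sshd[5360]: Accepted password for root from 10.9.11.44 port 7520 ssh2
Jun 16 10:04:47 localhost sshd[5360]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 17 16:23:53 localhost sshd[9174]: Accepted password for root from 10.9.11.44 port 6651 ssh2
Jun 18 09:22:33 localhost sshd[11487]: Accepted password for root from 10.9.11.45 port 58455 ssh2
EOF
}

# $(NF-3) is the fourth field from the end, which is the client address on "Accepted" lines.
counts=$(sample_log |
    awk '/Accepted/ { hits[$(NF-3)]++ }
         END { for (ip in hits) print ip, hits[ip] }' |
    sort -k2,2nr)   # highest login count first

printf '%s\n' "$counts"
```

Counting from the end of the line with NF avoids depending on how many fields the timestamp and host name occupy at the start.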
[Text processing] Using awk and sed - in progress
Time: 2024-10-25 23:47:59