BEGIN和END关键字是来用来读取数据流之前或之后执行命令的特殊模式
1、内建变量
FIELDWIDTHS:由空格分隔开的的定义每个数据字段确切宽度的一列数字
FS:输入字段分隔符
RS:输入数据行分隔符
OFS:输出字段分隔符
ORS:输出数据行分隔符
[[email protected] conf]# cat data2 data1,data2,data3,data4,data5 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15
[[email protected] conf]# gawk ‘BEGIN{FS=",";OFS="-"} {print $1,$2,$3}‘ data2 data1-data2-data3 data6-data7-data8 data11-data12-data13
数据流占多行情况下
把RS变量设置成空白字符串,然后在数据行间留一个空白行。gawk会把每个空白行当作一个数据行分隔符
[[email protected] conf]# cat data3 Tom Mullen 123 Main Street Chicago.IL 6001 (312)555-1234 Frank Williams 456 Oak street Indianpolis.IN 46201 (317)55-9786
[[email protected] conf]# gawk ‘BEGIN{FS="\n";RS=""} {print $1,$4}‘ data3 Tom Mullen (312)555-1234 Frank Williams (317)55-9786
2、数据变量
FNR:当前数据文件中的数据行数
NF:数据文件中的字段总数
NR:已处理的输入数据行数目
[[email protected] conf]# gawk ‘BEGIN{FS=":";OFS=";"}{print $1,$NF}‘ /etc/passwd root;/bin/bash bin;/sbin/nologin daemon;/sbin/nologin adm;/sbin/nologin lp;/sbin/nologin
[[email protected] conf]# cat data2 data1,data2,data3,data4,data5 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15 [[email protected] conf]# gawk ‘ BEGIN{FS=","} {print $1,"FNR="FNR,"NR="NR} END{print "there were",NR,"records processed"}‘ data2 data2 data1 FNR=1 NR=1 data6 FNR=2 NR=2 data11 FNR=3 NR=3 data1 FNR=1 NR=4 data6 FNR=2 NR=5 data11 FNR=3 NR=6 there were 6 records processed
3、自定义变量
变量名不能以数字开头,
[[email protected] conf]# gawk ‘ > BEGIN{ > testing="this is a test" > print testing > }‘ this is a test
[[email protected] conf]# gawk ‘BEGIN{X=4;X=X*2+3;print X}‘ 11
在命令行上给变量赋值
[[email protected] tmp]# cat script1 #!/bin/bash # BEGIN{FS=","} {print $n} [[email protected] tmp]# cat data3 data1,data2,data3,data4,data5 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15 [[email protected] tmp]# gawk -f script1 n=2 data3 data2 data7 data12
在设置变量后,这个值在代码的BEGINU部分不可用
[[email protected] tmp]# vim script1 #!/bin/bash # BEGIN{print "the starting value is ",n;FS=","} {print $n}
[[email protected] tmp]# gawk -f script1 n=3 data3 the starting value is data3 data8 data13
加上-v,允许你指定BEGIN代码部分之前设定的变量
[[email protected] tmp]# gawk -f script1 -v n=3 data3 the starting value is 3 data3 data8 data13
4、遍历数组
for (var in array)
{
statements
}
for语句在每次将关联数组array的下一个索引值赋给变量var时,执行一遍statements。
记住这个变量是索引值而不是数组元素的值
[[email protected] tmp]# gawk ‘BEGIN{ var["a"]=1 var["g"]=2 var["m"]=3 var["u"]=4 for (test in var) { print "Index:",test," - Valuse:",var[test] } }‘ Index: u - Valuse: 4 Index: m - Valuse: 3 Index: a - Valuse: 1 Index: g - Valuse: 2
删除数组的变量
从关联数组中删除数组索引要用一个特别命令
delete array[index]
[[email protected] tmp]# gawk ‘BEGIN{ var["a"]=1 var["g"]=2 for (test in var) { print "Index:",test,"- Value",var[test] } delete var["g"] print "------------" for (test in var) { print "Index:",test,"- Value",var[test] } > }‘ Index: a - Value 1 Index: g - Value 2 ------------ Index: a - Value 1
5、匹配操作符
匹配操作符(~),允许将正则表达式限定在数据行中的特定数据字段。
$1 ~/expression/
用正则表达式/^data2/来匹配第二个数据段,过滤第二个字段以文本data2为开头的行
[[email protected] tmp]# cat data3 data1,data2,data3,data4,data5 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15 data21,data22,data23,data24,data25 [[email protected] tmp]# gawk ‘BEGIN{FS=","} $2 ~/[^data2]/{print $0}‘ data3 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15
gawk程序脚本常用在数据文件查找特定数据元素的强大工具
[[email protected] tmp]# gawk -F: ‘$1 ~/root/{print $1,$NF}‘ /etc/passwd root /bin/bash
用!符号来排除正则表达式的匹配
$1 !~/expression/
gawk -F: ‘$1 !~/root/{print $1,$NF}‘ /etc/passwd
6、数学表达式
x == y
x <= y
x < y
x >= y
x > y
如显示所有属于root用户组的(组ID为0)的系统用户
[[email protected] tmp]# gawk -F: ‘$4==0{print $1}‘ /etc/passwd root sync shutdown halt operator
对文本数据使用表达式,必须要小心,跟正则表达式不同,表达式必须完全匹配
[[email protected] tmp]# gawk -F, ‘$1== "data1" {print $1}‘ data3 data1
6、if语句
if (condition)
statement1
或者 if (condition) statement1
[[email protected] tmp]# cat data4 10 5 3 13 55 [[email protected] tmp]# gawk ‘{if ($1>20) print $1}‘ data4 55
[[email protected] tmp]# gawk ‘{ > if ($1>20) > { > x=$1 * 2 > print x > } > }‘ data4 110
gawk的if语句也支持else子句,允许在if语句条件不成立的情况下执行一条或多条语句。
[[email protected] tmp]# gawk ‘{ > if ($1>100) > { > x=$1 * 2 > print x > } else > { > x=$1 / 2 > print x > }}‘ data4 5 2.5 1.5 6.5 27.5
也可以这么写
if (condition) statement1:else statement2
[[email protected] tmp]# gawk ‘{if ($1>100) print $1*2;else print $1/2}‘ data4 5 2.5 1.5 6.5 27.5
7、while语句
while (condition)
{
statement1
}
[[email protected] tmp]# cat data5 120 130 135 160 113 140 145 170 215 [[email protected] tmp]# gawk ‘{ total=0 i=1 while (i<4) { total+=$i i++ } avg=total/3 print "Averaget:",avg }‘ data5 Averaget: 128.333 Averaget: 137.667 Averaget: 176.667 Averaget: 0
8、do-while语句
do
{
statement1
} while (condition)
[[email protected] tmp]# gawk ‘{ total=0 i=1 do { total+=$i i++ }while (total<150) print total}‘ data5 250 160 315
9、for语句
for (variable assignment;condition;interation process)
[[email protected] tmp]# gawk ‘{ > tatal=0 > for (i=1;i<4;i++) > { > total+=$i > } > avg=total/3 > print "Average:",avg > }‘ data5 Average: 128.333 Average: 266 Average: 442.667 Average: 442.667 [[email protected] tmp]# gawk ‘{for (i=1;i<4;i++) total+=$i;avg=total/3;print "Average:",avg}‘ data5 Average: 128.333 Average: 266 Average: 442.667 Average: 442.667 [[email protected] tmp]# cat data5 120 130 135 160 113 140 145 170 215
10、格式化打印
在printf命令末尾手动添加换行符来生成新行
如果需要几个单独的printf命令来在同一行上打印多个输出
[[email protected] tmp]# cat data3 data1,data2,data3,data4,data5 data6,data7,data8,data9,data10 data11,data12,data13,data14,data15 data21,data22,data23,data24,data25 [[email protected] tmp]# gawk ‘BEGIN{FS=","}{printf "%s ",$1}END{printf "\n"}‘ data3 data1 data6 data11 data21
[[email protected] tmp]# gawk ‘BEGIN{FS=","}{printf "%s ",$1}‘ data3 data1 data6 data11 data21 [[email protected] tmp]#