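Linux has three text-processing tools that are often called the "three swordsmen" of text processing: grep, sed, and awk. All three are powerful when working with text files. This article introduces grep and sed first.
The grep command:
grep prints to the screen every line of the input text that matches a user-specified pattern (a regular expression).
grep usage: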
grep [OPTIONS] PATTERN [FILE...]
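Common options:
-i, --ignore-case: ignore differences between upper and lower case;
-o, --only-matching: print only the matched string itself;
-E, --extended-regexp: interpret the pattern as an extended regular expression;
-v, --invert-match: print the lines that the pattern does not match;
-q, --quiet, --silent: quiet mode; print nothing and report the result only through the exit status;
-A NUM, --after-context=NUM: also print NUM lines after each matching line;
-B NUM, --before-context=NUM: also print NUM lines before each matching line;
-C NUM, --context=NUM: also print NUM lines before and after each matching line;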
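The options above can be tried on a small sample file. The following is a minimal sketch, assuming GNU grep; the contents of pets.txt are reconstructed from the outputs shown in this article, and the variable names are only illustrative.

```shell
# Scratch directory and sample file (contents inferred from the article's outputs).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > pets.txt <<'EOF'
This is my cat
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
EOF

# -i: "this" also matches "This"; -c counts the matching lines.
insensitive_count=$(grep -ic 'this' pets.txt)

# -o: print only the part of the line that matched.
only_match=$(grep -o 'b[a-z]*' pets.txt)

# -v: count the lines that do NOT contain "This".
inverted_count=$(grep -vc 'This' pets.txt)

# -q: no output at all; the exit status says whether a match was found.
if grep -q 'goat' pets.txt; then found=yes; else found=no; fi

# -A 1: the matching line plus one line of trailing context.
context=$(grep -A 1 'my fish$' pets.txt)

echo "lines matching 'this' (case-insensitive): $insensitive_count"
echo "text printed by -o: $only_match"
echo "lines without 'This': $inverted_count"
echo "goat found: $found"
printf '%s\n' "$context"
```

Because -q reports only through the exit status, it is the natural choice inside if statements in scripts.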
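POSIX character classes (written inside a bracket expression, for example [[:digit:]]):
[:alnum:] # letters and digits, equivalent to 0-9, a-z, A-Z;
[:alpha:] # all letters, a-z, A-Z;
[:digit:] # digits, 0-9;
[:upper:] # uppercase letters, A-Z;
[:lower:] # lowercase letters, a-z;
[:print:] # printable characters (including the space character)
[:punct:] # punctuation characters, such as " ' ? ! ; : # $
[:space:] # all whitespace characters (newline, space, tab, and so on)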
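A short sketch of the character classes in use, assuming GNU grep. The file classes.txt and the variable names are made up for this illustration. Note that each class sits inside an outer pair of brackets.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'abc' 'ABC' '123' 'a1 b2' '!?' > classes.txt

# Lines made up of digits only (-E so that + needs no backslash).
digits_only=$(grep -E '^[[:digit:]]+$' classes.txt)

# Lines containing at least one uppercase letter.
has_upper=$(grep '[[:upper:]]' classes.txt)

# Lines containing punctuation.
has_punct=$(grep '[[:punct:]]' classes.txt)

# Lines containing whitespace.
has_space=$(grep '[[:space:]]' classes.txt)

# -o with [[:alnum:]] splits a line into its alphanumeric runs.
alnum_runs=$(printf '%s\n' 'a1 b2' | grep -oE '[[:alnum:]]+')

echo "digits only: $digits_only"
echo "has uppercase: $has_upper"
echo "has punctuation: $has_punct"
echo "has whitespace: $has_space"
printf '%s\n' "$alnum_runs"
```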
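Regular expressions (basic syntax):
*: matches the preceding character any number of times: zero, one, or many;
.*: matches any characters, of any length;
\?: matches the preceding character zero or one time; that is, the preceding character is optional;
\+: matches the preceding character one or more times; that is, the preceding character must appear at least once;
\{m\}: matches the preceding character exactly m times;
\{m,n\}: matches the preceding character at least m and at most n times;
\{0,n\}: at most n times
\{m,\}: at least m times
^: anchors the match to the start of the line; placed at the far left of the pattern;
$: anchors the match to the end of the line; placed at the far right of the pattern;
^PATTERN$: PATTERN must match the entire line;
^$: an empty line;
^[[:space:]]*$: an empty line or a line that contains only whitespace;
\< or \b: anchors the match to the start of a word; placed to the left of the word pattern;
\> or \b: anchors the match to the end of a word; placed to the right of the word pattern;
\<PATTERN\>: matches PATTERN as a complete word;
\(..\): groups and captures the matched text; for example, after '\(txt\)', the text matched by the group can be referred to as \1;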
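The quantifiers, anchors, and back-reference listed above can be checked with a few one-liners. This is a sketch that assumes GNU grep (the \?, \+, \< and \> forms are GNU extensions to basic regular expressions); words.txt and blanks.txt are hypothetical files created only for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'color' 'colour' 'colouur' 'the cat' 'concatenate' \
    'He loves his lover.' 'He loves his wife.' > words.txt
printf 'a\n\n \nb\n' > blanks.txt   # one empty line, one line holding a single space

optional=$(grep 'colou\?r' words.txt)        # u is optional
one_or_more=$(grep 'colou\+r' words.txt)     # u at least once
exactly_two=$(grep 'colou\{2\}r' words.txt)  # u exactly twice
anchored=$(grep -c '^c.*r$' words.txt)       # starts with c and ends with r
whole_word=$(grep '\<cat\>' words.txt)       # cat as a complete word
repeated=$(grep '\(l..e\).*\1' words.txt)    # the captured l..e text occurs again
empty=$(grep -c '^$' blanks.txt)             # truly empty lines
blankish=$(grep -c '^[[:space:]]*$' blanks.txt)  # empty or whitespace-only lines

printf '%s\n' "$optional" "$one_or_more" "$exactly_two" "$whole_word" "$repeated"
echo "anchored: $anchored, empty: $empty, empty or blank: $blankish"
```

With -E the same quantifiers and groups are written without backslashes: ?, +, {m,n}, and ( ).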
Example 1: find every line in pets.txt that contains an uppercase letter
# grep '[A-Z]' pets.txt
This is my cat
This is my dog
This is my fish
This is my goat
Example 2: find every line that begins with whitespace
# grep '^[[:space:]]' pets.txt
 my cat's name is betty
 my dog's name is frank
 my fish's name is george
 my goat's name is adam
Example 3: read the file from standard input, search for a keyword, and show line numbers
# cat pets.txt | grep -n 'dog'
3:This is my dog
4: my dog's name is frank
Example 4: find the lines in pets.txt that contain betty or frank
# grep -E 'betty|frank' pets.txt
 my cat's name is betty
 my dog's name is frank
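Example 5: find the lines in lovers.txt in which a string of the form l..e (an l and an e with any two characters between them) occurs and is then repeated later on the same line
# grep "\(l..e\).*\1" lovers.txt
He loves his lover.
She likes her liker.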
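The sed command:
sed (stream editor) processes text line by line, applying editing commands to the lines selected by a line number or a regular expression, and writes the result to the screen. By default it does not change the original file.
The process is as follows. sed first copies the line it is currently processing into a temporary buffer called the pattern space, then applies the editing commands to the contents of that buffer. When the commands have finished, the pattern space is sent to the screen (unless -n is given). sed then clears the pattern space, reads the next line into it, and repeats the same steps. After the last line of the input file has been processed, sed exits. Because every edit is made to the copy held in the pattern space, the original file is not modified. A second buffer, the hold space, is used only when a command explicitly copies data into it.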
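The read, edit, print cycle described above can be observed directly. A minimal sketch, assuming GNU sed; demo.txt is a throwaway file invented for this example.

```shell
workdir=$(mktemp -d)
printf '%s\n' one two three > "$workdir/demo.txt"

before=$(cat "$workdir/demo.txt")

# Each line is copied into the pattern space, edited there, then auto-printed.
edited=$(sed 's/o/0/' "$workdir/demo.txt")

# -n turns off the automatic print, so the same edit produces no output.
silent=$(sed -n 's/o/0/' "$workdir/demo.txt")

after=$(cat "$workdir/demo.txt")

printf '%s\n' "$edited"
if [ "$before" = "$after" ]; then
    echo "demo.txt is unchanged"
fi
```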
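Addresses:
An empty address means the command is applied to every line
# (a number): apply the command to line number #
#,#: apply the command to the range of lines from the first number to the second
/pattern/: every line matched by this pattern
$: the last line
first~step: starting at line first, every step-th line (a GNU extension); for example, 1~2 selects every other line starting from line 1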
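Each address form can be paired with p and -n to see which lines it selects. This sketch assumes GNU sed (the first~step form is a GNU extension) and the pets.txt contents reconstructed from this article's outputs.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > pets.txt <<'EOF'
This is my cat
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
EOF

third=$(sed -n '3p' pets.txt)           # a single line number
pair=$(sed -n '2,3p' pets.txt)          # a range of line numbers
last=$(sed -n '$p' pets.txt)            # the last line
fish=$(sed -n '/fish/p' pets.txt)       # every line matching a pattern
odd=$(sed -n '1~2p' pets.txt)           # every second line starting at line 1

printf '%s\n' "$third" "$pair" "$last" "$fish" "$odd"
```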
For example: view line 4 of pets.txt
# sed -n '4p' pets.txt
 my dog's name is frank
View lines 1 through 5 of pets.txt
# sed -n '1,5p' pets.txt
This is my cat
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
View the entire contents of the file
# sed '' pets.txt
This is my cat
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
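Common options:
-n: do not automatically print the contents of the pattern space;
-e script, --expression=script: add script to the commands to run; repeat -e to run several commands in one invocation;
-f /PATH/TO/SED_SCRIPT_FILE: read the editing commands from a file, one command per line;
-r, --regexp-extended: use extended regular expressions;
-i[SUFFIX], --in-place[=SUFFIX]: edit the original file directly; if SUFFIX is given, a backup copy with that suffix is kept;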
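The options can be combined as in the following sketch, which assumes GNU sed. The file names fruits.txt and edit.sed are hypothetical and exist only for this demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'apple 1' 'banana 22' 'cherry 333' > fruits.txt

# -e given twice: both substitutions run on every line.
multi=$(sed -e 's/apple/APPLE/' -e 's/banana/BANANA/' fruits.txt)

# -r: extended syntax, so + and ( ) need no backslashes.
swapped=$(sed -r 's/^([a-z]+) ([0-9]+)$/\2 \1/' fruits.txt)

# -f: the same kind of commands, one per line, read from a script file.
printf '%s\n' '1d' 's/cherry/CHERRY/' > edit.sed
from_file=$(sed -f edit.sed fruits.txt)

# -i.bak: change fruits.txt itself and keep the old version as fruits.txt.bak.
sed -i.bak 's/333/3/' fruits.txt
new_last=$(sed -n '$p' fruits.txt)
old_last=$(sed -n '$p' fruits.txt.bak)

printf '%s\n' "$multi" "$swapped" "$from_file"
echo "after -i: $new_last (backup still has: $old_last)"
```

Giving -i a suffix is the cautious choice, since without one the previous contents of the file are gone once sed finishes.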
sed editing commands
d: delete the specified lines
# sed '2d' pets.txt
This is my cat
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
p: print the contents of the pattern space;
# sed -n '2p' pets.txt
 my cat's name is betty
Without -n, sed still prints every line automatically, and p prints the selected line a second time
# sed '2p' pets.txt
This is my cat
 my cat's name is betty
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
As you can see, the second line is displayed twice;
a\text: append the text "text" after the line; \n can be used to append several lines;
For example: append "hello" after the second line
# sed '2a\hello' pets.txt
This is my cat
 my cat's name is betty
hello
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
i\text: insert the text "text" before the line; \n can be used to insert several lines;
For example:
# sed '2i\hello' pets.txt
This is my cat
hello
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
c\text: replace the matched line with the text "text" given here;
For example:
# sed '2c\hello' pets.txt
This is my cat
hello
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
w /PATH/TO/SOMEFILE: save the lines selected by the address to the specified file;
For example: write the first line of pets.txt to pets.txt.new
# sed '1w pets.txt.new' pets.txt
This is my cat
 my cat's name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
# cat pets.txt.new
This is my cat
r /PATH/FROM/SOMEFILE: read the contents of the specified file and output them after each line selected by the address; this merges files;
# cat pets.txt.new
hello !
This is my cat
# sed '2r pets.txt.new' pets.txt
This is my cat
 my cat's name is betty
hello !
This is my cat
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
=: print the line number of each line selected by the address;
# sed '/This/=' pets.txt
1
This is my cat
 my cat's name is betty
3
This is my dog
 my dog's name is frank
5
This is my fish
 my fish's name is george
7
This is my goat
 my goat's name is adam
!: negate the address condition;
# sed -n '/This/!p' pets.txt
 my cat's name is betty
 my dog's name is frank
 my fish's name is george
 my goat's name is adam
s///: find and replace; the delimiter can be chosen freely, and common alternatives are s@@@ and s###;
Flags that can follow the substitution:
g: replace globally, that is, every match on the line rather than only the first;
w /PATH/TO/SOMEFILE: save the lines on which a substitution succeeded to the specified file;
p: print the lines on which a substitution succeeded;
For example: replace every T with t
# sed 's/T/t/g' pets.txt
this is my cat
 my cat's name is betty
this is my dog
 my dog's name is frank
this is my fish
 my fish's name is george
this is my goat
 my goat's name is adam
Replace T with t on the first line, and n with N on the second line
# sed '1s/T/t/; 2s/n/N/' pets.txt
this is my cat
 my cat's Name is betty
This is my dog
 my dog's name is frank
This is my fish
 my fish's name is george
This is my goat
 my goat's name is adam
# sed '1s/T/t/; 2s/n/N/' pets.txt is equivalent to # sed -e '1s/T/t/' -e '2s/n/N/' pets.txt
Combining the p flag with -n shows only the lines on which a substitution succeeded
# sed -n -e '1s/T/t/p' -e '2s/n/N/p' pets.txt
this is my cat
 my cat's Name is betty
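An alternative delimiter is most useful when the pattern itself contains slashes. The sketch below assumes GNU sed; paths.txt and changed.txt are hypothetical file names used only to show the g, p, and w flags side by side.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '/usr/bin:/usr/bin' 'no path here' > paths.txt

# With @ as the delimiter, the slashes in the path need no escaping.
# Without g, only the first match on each line is replaced.
first_only=$(sed 's@/usr/bin@/opt/bin@' paths.txt)

# g: every match on the line is replaced.
all_matches=$(sed 's@/usr/bin@/opt/bin@g' paths.txt)

# -n plus the p flag: print only the lines that were actually changed.
changed_only=$(sed -n 's@/usr/bin@/opt/bin@gp' paths.txt)

# w flag: write the changed lines to a file; the file name runs to the end
# of the command, so w has to be the last flag.
sed -n 's#/usr/bin#/opt/bin#gw changed.txt' paths.txt
saved=$(cat changed.txt)

printf '%s\n' "$first_only" "$all_matches" "$changed_only" "$saved"
```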
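In addition, sed has some commands for advanced editing
g: copy the contents of the hold space into the pattern space, discarding what the pattern space held before
G: append a newline and then the contents of the hold space to the pattern space
h: copy the contents of the pattern space into the hold space, discarding what the hold space held before
H: append a newline and then the contents of the pattern space to the hold space
x: exchange the contents of the pattern space and the hold space
n: read the line after the matched line into the pattern space, overwriting its contents;
N: read the line after the matched line and append it to the pattern space, after a newline;
d: delete the contents of the pattern space and start the next cycle;
D: when the pattern space holds several lines, delete it up to and including the first newline; if anything remains, restart the cycle without reading new input;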
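A few well-known one-liners show how these commands combine. This is a sketch that assumes GNU sed and feeds short numbered input through a pipe; the variable names are illustrative only.

```shell
# Reverse the order of the lines: on every line but the first, G appends the
# lines collected so far; h saves the result; $p prints it at the last line.
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')

# Join lines in pairs: N appends the next line, then the embedded newline
# is replaced with a space.
joined=$(printf '%s\n' 1 2 3 4 | sed 'N;s/\n/ /')

# Print only the even lines: n overwrites the pattern space with the next line.
evens=$(printf '%s\n' 1 2 3 4 | sed -n 'n;p')

# Print only the odd lines: n prints the current line and loads the next one,
# which d then deletes.
odds=$(printf '%s\n' 1 2 3 4 | sed 'n;d')

# x swaps the two buffers, so each line is printed one cycle late and the
# first line printed is the (empty) initial hold space.
delayed=$(printf '%s\n' 1 2 3 | sed 'x')

printf '%s\n' "$reversed" "$joined" "$evens" "$odds" "$delayed"
```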
Let's look at an example first:
# cat num.txt
1
2
3
# sed 'H;g' num.txt

1

1
2

1
2
3
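If that is hard to follow, one more concept helps before the explanation: the hold space. sed always reads each input line into the pattern space first. The hold space is a second buffer that starts out empty and changes only when a command such as h, H, or x copies the pattern space into it. In other words, a line has to pass through the pattern space before it can reach the hold space;
The following figure shows how the example above processes its input:
As the figure shows, the contents of the pattern space are first appended to the hold space (H), and then the contents of the hold space overwrite the pattern space (g). Because H puts a newline in front of the text it appends and the hold space starts out empty, each printed block begins with an empty line;
Once this principle is clear, the other commands, such as overwriting, deleting, and replacing, are easy to learn as well;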
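The same walk-through can be written as a small shell loop that imitates the two buffers. This is only a hand-written model of sed's behaviour for the script 'H;g', not how sed is implemented; the function name simulate_hold_then_get and the variable names are made up for the illustration, and GNU sed is assumed for the comparison.

```shell
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$workdir/num.txt"

newline='
'

# Model of one sed run with the script 'H;g'.
simulate_hold_then_get() {
    hold=""                                  # the hold space starts out empty
    while IFS= read -r pattern; do           # new cycle: next line -> pattern space
        hold="${hold}${newline}${pattern}"   # H: append newline + pattern space to hold space
        pattern=$hold                        # g: overwrite pattern space with hold space
        printf '%s\n' "$pattern"             # end of cycle: auto-print the pattern space
    done
}

simulated=$(simulate_hold_then_get < "$workdir/num.txt")
actual=$(sed 'H;g' "$workdir/num.txt")

printf '%s\n' "$actual"
if [ "$simulated" = "$actual" ]; then
    echo "the model produces the same output as sed"
fi
```

Swapping the order of the two modelled steps, or replacing the append with a plain assignment, gives a quick way to predict what 'g;H' or 'h;g' would print before running them.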