Text Processing Commands (Part 6) / 憋错料

Text processing commands: sort, uniq, join, cut, paste, split, tr, wc

6.1.sort

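Purpose: sort the lines of text files

Syntax: sort [OPTION]... [FILE]...

sort [OPTION]... --files0-from=F

Common options:

-b ignore leading blanks on each line

-c check whether the input is already sorted; do not sort

-f ignore case

-M compare as month names, e.g. JAN < FEB < ... < DEC

-h compare human-readable numbers with unit suffixes, e.g. 2K, 1G

-n compare as numbers

-o FILE write the result to FILE instead of standard output

-t SEP use SEP as the field separator

-k n,m sort on a key that starts at field n and ends at field m

-r reverse the sort order

-u remove duplicate lines

By default the whole line is the sort key and lines are compared as character strings (in ASCII byte order under the C locale).

Examples: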

[[email protected] test]# cat seq.txt 
banana
apple
pear
orange
[[email protected] test]# sort seq.txt 
apple
banana
orange
pear
[[email protected] test]# seq 5 |shuf
3
2
4
5
1
[[email protected] test]# seq 5 |shuf|sort
1
2
3
4
5
The -t, -k and -n options:
[[email protected] scripts]# cat oldboy.txt
48 Oct 3bc1997 lpas 68.00 lvx2a 138
484 Jan 380sdf1 usp 78.00 deiv 344
483 nov 7pl1998 usp 37.00 kvm9d 644
320 aug der9393 psh 83.00 wiel 293
231 jul sdf9dsf sdfs 99.00 werl 223
230 nov 19dfd9d abd 87.00 sdiv 230
219 sept 5ap1996 usp 65.00 lvx2c 189
216 Sept 3zl1998 usp 86.00 kvm9e 234
[[email protected] scripts]# sort -t " " -k 5 -n oldboy.txt 
483 nov 7pl1998 usp 37.00 kvm9d 644
219 sept 5ap1996 usp 65.00 lvx2c 189
48 Oct 3bc1997 lpas 68.00 lvx2a 138
484 Jan 380sdf1 usp 78.00 deiv 344
320 aug der9393 psh 83.00 wiel 293
216 Sept 3zl1998 usp 86.00 kvm9e 234
230 nov 19dfd9d abd 87.00 sdiv 230
231 jul sdf9dsf sdfs 99.00 werl 223
The -r option:
[[email protected] scripts]# seq 5 |sort -r
5
4
3
2
1
The -u option:
[[email protected] test]# cat seq.txt 
banana
apple
pear
orange
pear
orange
[[email protected] test]# sort -u seq.txt # sort and remove duplicate lines
apple
banana
orange
pear
The -o option:
[[email protected] test]# sort -r seq.txt  >> seq1.txt
[[email protected] test]# cat seq1.txt 
pear
pear
orange
orange
banana
apple
[[email protected] test]# sort -u seq.txt -o seq2.txt
[[email protected] test]# cat seq2.txt 
apple
banana
orange
pear
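The -n option:
sort sometimes places 10 before 2. By default it compares lines character by character: it first compares "1" with "2", and because "1" is smaller, 10 ends up before 2.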
[[email protected] test]# cat number.txt 
1
10
19
11
2
5
[[email protected] test]# sort number.txt  # sorted as character strings
1
10
11
19
2
5
[[email protected] test]# sort -n number.txt 
1
2
5
10
11
19
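
The -t and -k options can carry per-key modifiers, so a single sort call can order by several fields at once. The following is a minimal sketch; the colon-separated records are made-up sample data, not one of the files used above. It sorts by the second field as a number in descending order, then by the first field alphabetically:

```shell
# Hypothetical sample records: name:age:city
records='alice:30:paris
bob:25:rome
carol:30:oslo
dave:25:bern'

# -k 2,2nr : the key is field 2 only, compared numerically, in reverse
# -k 1,1   : ties are broken by field 1 in plain byte order (LC_ALL=C)
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t ':' -k 2,2nr -k 1,1)
printf '%s\n' "$sorted"
```

Writing -k 2,2 rather than -k 2 matters here: without an end field, the key runs from field 2 to the end of the line.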

6.2.uniq

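Purpose: remove adjacent duplicate lines

Common options:

-c prefix each line with the number of occurrences; only adjacent lines are counted together

-d print only duplicated lines, one for each group

-u print only lines that are not repeated

-D print all duplicated lines, including every copy

-f n skip the first n fields when comparing

-i ignore case

-s n skip the first n characters when comparing

-w n compare no more than the first n characters of each line

Examples: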

[[email protected] test]# cat seq.txt 
banana
apple
pear
orange
pear
orange
[[email protected] test]# sort seq.txt
apple
banana
orange
orange
pear
pear
[[email protected] test]# sort seq.txt|uniq
apple
banana
orange
pear
[[email protected] test]# sort seq.txt|uniq -c  # prefix each line with its number of occurrences
      1 apple
      1 banana
      2 orange
      2 pear
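[[email protected] test]# sort seq.txt|uniq  -u # print only the lines that are not repeated
apple
banana
[[email protected] test]# sort seq.txt|uniq  -d  # print only the repeated lines
orange
pear
[[email protected] test]# sort seq.txt|uniq  -d -c  # print the repeated lines with their counts
      2 orange
      2 pear
[[email protected] test]# sort seq.txt|uniq  -w 1  # compare only the first character when removing duplicates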
apple
banana
orange
pear
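
Because uniq only compares adjacent lines, it is almost always paired with sort, as in the transcripts above. The sketch below uses made-up sample input to show the difference, and adds the common frequency-count pipeline:

```shell
# Made-up sample input with duplicates that are not adjacent
fruits='pear
apple
pear
apple
apple'

# Without sorting, only the adjacent "apple apple" pair collapses
unsorted_result=$(printf '%s\n' "$fruits" | uniq)

# Sorting first makes all duplicates adjacent
sorted_result=$(printf '%s\n' "$fruits" | LC_ALL=C sort | uniq)

# Frequency table, most common line first
counts=$(printf '%s\n' "$fruits" | LC_ALL=C sort | uniq -c | sort -rn)

printf 'unsorted:\n%s\nsorted:\n%s\ncounts:\n%s\n' \
    "$unsorted_result" "$sorted_result" "$counts"
```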

6.3.join

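Purpose: join the lines of two files on a common field

Syntax: join [OPTION]... FILE1 FILE2

Common options:

-a <1 or 2> in addition to the normal output, also print the lines of the given file that have no match in the other file; these are not shown by default

-i ignore case when comparing fields

-o FORMAT build each output line from the listed fields, given as FILENUM.FIELD

-t CHAR use CHAR as the input and output field separator

-1 FIELD join on this field of file 1

-2 FIELD join on this field of file 2

Examples: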

[[email protected] test]# cat name.txt town.txt  # one file holds names and street addresses, the other names and towns
M.Golls 12 Hidd Rd
P.Heller The Acre
P.Willey 132 The Grove
T.Norms 84 Connaught Rd
K.Fletch 12 Woodlea
M.Golls Norwich NRD
P.Willey Galashiels GDD
T.Norms Brandon BSL
K.Fletch Mildenhall MAF
K.Firt Mitryl Mdt
[[email protected] test]# join name.txt town.txt  # join the two files on the name field
M.Golls 12 Hidd Rd Norwich NRD
P.Willey 132 The Grove Galashiels GDD
join: file 1 is not in sorted order
join: file 2 is not in sorted order
T.Norms 84 Connaught Rd Brandon BSL
K.Fletch 12 Woodlea Mildenhall MAF
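
The "not in sorted order" warnings appear because join expects both inputs to be sorted on the join field. A minimal sketch of one way to avoid them is to sort each file first, under the same collation order that join will use. The temporary directory and the .sorted file names are illustrative choices:

```shell
# Recreate the two sample files in a temporary directory
workdir=$(mktemp -d)
cat > "$workdir/name.txt" <<'EOF'
M.Golls 12 Hidd Rd
P.Heller The Acre
P.Willey 132 The Grove
T.Norms 84 Connaught Rd
K.Fletch 12 Woodlea
EOF
cat > "$workdir/town.txt" <<'EOF'
M.Golls Norwich NRD
P.Willey Galashiels GDD
T.Norms Brandon BSL
K.Fletch Mildenhall MAF
K.Firt Mitryl Mdt
EOF

# Sort both files on field 1, using one collation order for sort and join
export LC_ALL=C
sort -k 1,1 "$workdir/name.txt" > "$workdir/name.sorted"
sort -k 1,1 "$workdir/town.txt" > "$workdir/town.sorted"

# 2>&1 captures any warning along with the output, so a clean result is visible
joined=$(join "$workdir/name.sorted" "$workdir/town.sorted" 2>&1)
printf '%s\n' "$joined"
```

P.Heller and K.Firt are absent from the result because each appears in only one file; the -a option would include them.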

6.4.cut

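Purpose: extract selected parts of each line

Syntax: cut OPTION... [FILE]...

Common options:

-b select by byte position

-c select by character position

-d use a custom field delimiter; the default is the tab character

-f select the listed fields; usually combined with -d

-n used with -b: do not split multibyte characters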

[[email protected] test]# who
root     pts/1        2017-05-26 18:36 (192.168.19.1)
root     pts/2        2017-05-27 09:32 (192.168.19.1)
[[email protected] test]# who |cut -b 3 # select by byte: print the third byte
o
o
[[email protected] test]# who |cut -c 3 # select by character: print the third character; each ASCII letter is both one byte and one character
o
o
[[email protected] test]# who |cut -d" " -f 1
root
root
[[email protected] test]# who |cut -n -b 1-5
root 
root 
[[email protected] test]# ifconfig eth1 |grep Bcast|cut -d":" -f 2 |cut -d" " -f 1 # practical example: extract and print the IP address
192.168.19.20
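
One detail worth knowing when using -d with -f: cut treats every single delimiter as a field boundary, so runs of spaces produce empty fields. That is why `who |cut -d" " -f 1` works above while higher field numbers would not. A small sketch, reusing one line of the who output and squeezing the spaces with tr first:

```shell
# One line of who output, copied from the example above
line='root     pts/1        2017-05-26 18:36 (192.168.19.1)'

# Field 2 is the empty string between the first and second space
raw=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Squeezing runs of spaces first makes columns line up with field numbers
terminal=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

printf 'raw=[%s] terminal=[%s]\n' "$raw" "$terminal"
```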

6.5.paste

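Purpose: merge corresponding lines of files

Syntax: paste [OPTION]... [FILE]...

Common options:

-d specify the delimiter; the default is the tab character

-s paste one file at a time: all lines of each file are joined into a single tab-separated line

Examples: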

[[email protected] test]# cat name.txt town.txt 
M.Golls 12 Hidd Rd
P.Heller The Acre
P.Willey 132 The Grove
T.Norms 84 Connaught Rd
K.Fletch 12 Woodlea
M.Golls Norwich NRD
P.Willey Galashiels GDD
T.Norms Brandon BSL
K.Fletch Mildenhall MAF
K.Firt Mitryl Mdt
[[email protected] test]# paste name.txt town.txt  # unlike join, which prints the shared field once, paste prints the lines of both files side by side, separated by a tab
M.Golls 12 Hidd Rd	M.Golls Norwich NRD
P.Heller The Acre	P.Willey Galashiels GDD
P.Willey 132 The Grove	T.Norms Brandon BSL
T.Norms 84 Connaught Rd	K.Fletch Mildenhall MAF
K.Fletch 12 Woodlea	K.Firt Mitryl Mdt
[[email protected] test]# paste -s name.txt town.txt  # each file is joined into a single line
M.Golls 12 Hidd Rd	P.Heller The Acre	P.Willey 132 The Grove	T.Norms 84 Connaught Rd	K.Fletch 12 Woodlea
M.Golls Norwich NRD	P.Willey Galashiels GDD	T.Norms Brandon BSL	K.Fletch Mildenhall MAF	K.Firt Mitryl Mdt
[[email protected] test]# paste -d"\n" name.txt town.txt  # merge, using a newline as the delimiter so the lines interleave
M.Golls 12 Hidd Rd
M.Golls Norwich NRD
P.Heller The Acre
P.Willey Galashiels GDD
P.Willey 132 The Grove
T.Norms Brandon BSL
T.Norms 84 Connaught Rd
K.Fletch Mildenhall MAF
K.Fletch 12 Woodlea
K.Firt Mitryl Mdt
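
The -s and -d options also work on standard input when a file operand is written as "-". The sketch below uses four made-up input lines to show two common idioms: joining lines with a custom delimiter, and folding a single stream into columns.

```shell
# Join four lines into one, separated by "+"; "-" means standard input
joined=$(printf '%s\n' 1 2 3 4 | paste -s -d '+' -)

# Two "-" operands make paste take lines from standard input in turn,
# which folds the stream into two comma-separated columns
pairs=$(printf '%s\n' 1 2 3 4 | paste -d ',' - -)

printf '%s\n%s\n' "$joined" "$pairs"
```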

6.6.tr

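Purpose: translate or delete characters; it can be seen as a simplified sed for character-level substitution

Syntax: tr [OPTION]... SET1 [SET2]

Common options:

-c operate on the complement of SET1, i.e. on every character that is not in SET1

-d delete every character that appears in SET1

-s squeeze each run of a repeated character in SET1 into a single occurrence

-t truncate SET1 to the length of SET2 before translating (GNU tr)

Character ranges

SET1 and SET2 may contain only single characters, character ranges, or character lists.

[a-z] the characters a through z; GNU tr also accepts the range without brackets, as a-z

[A-Z] the characters A through Z

[0-9] the digits

\octal a character given by its octal value (one to three octal digits)

[O*n] the character O repeated n times, valid in SET2 only; [O*2] expands to OO

Notations for control characters in tr

Escape  Key     Meaning          Octal
\a      Ctrl-G  bell             \007
\b      Ctrl-H  backspace        \010
\f      Ctrl-L  form feed        \014
\n      Ctrl-J  newline          \012
\r      Ctrl-M  carriage return  \015
\t      Ctrl-I  tab              \011
\v      Ctrl-K  vertical tab     \013

Examples: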

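[[email protected] test]# cat xaa |tr -c s 2  # every character except s, including spaces and the newline, is replaced with 2
222s22s2222222
[[email protected] test]# cat xaa |tr -d s  # delete every s, then print
thi i line1
[[email protected] test]# echo 111111222223333565656 |tr -s '[0-9]' # only adjacent repeats are squeezed
123565656
[[email protected] test]# echo 111111222223333565656 |tr -s '[0-9]' '[a-z]'  # translate, then squeeze adjacent repeats
bcdfgfgfg
[[email protected] test]# echo 111111222223333565656 |tr -t '[0-9]' '[a-z]' 
bbbbbbcccccddddfgfgfg
# cat file | tr "abc" "xyz" > new_file  # replace a, b, c with x, y, z
Every "a" in file is replaced with "x", every "b" with "y", and every "c" with "z". The string "abc" is not replaced with the string "xyz" as a unit.
# cat file | tr 'a-z' 'A-Z' > new_file  # lowercase to uppercase
# cat file | tr 'A-Z' 'a-z' > new_file  # uppercase to lowercase
# cat file | tr '0-9' 'a-j' > new_file  # digits to a-j
# cat file | tr -d "Snail" > new_file  # delete the characters S, n, a, i, l
Every 'S', 'n', 'a', 'i' and 'l' in file is deleted, not just occurrences of the string "Snail".
# cat file | tr -d "\n\t" > new_file  # delete newlines and tabs
Non-printing characters must be written as escape sequences.
# cat file | tr -s 'a-zA-Z' > new_file  # squeeze each run of a repeated letter down to the first one
# cat file | tr -s "\n" > new_file  # squeeze consecutive newlines, i.e. remove blank lines
# cat file | tr -d "\r" > new_file  # delete the '^M' characters left by Windows line endings
# cat file | tr -s "\r" "\n" > new_file  # replace each '^M' with a newline, then squeeze repeated newlines
Note: here -s is followed by two arguments, "\r" and "\n"; the latter replaces the former.
# cat file | tr -s "\011" "\040" > new_file  # replace tabs (\011) with spaces (\040)
# echo $PATH | tr -s ":" "\n"  # replace each : with a newline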
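
Options can also be combined. As a small sketch with made-up input strings, -c together with -d deletes everything that is not in SET1, which is a convenient way to keep only one class of characters:

```shell
# Uppercase a string; quoting the sets protects them from the shell
upper=$(printf '%s\n' 'hello world' | tr 'a-z' 'A-Z')

# -c complements SET1 and -d deletes, so every non-digit is removed
# (this includes the spaces, punctuation and the trailing newline)
digits=$(printf '%s\n' 'tel: +86 (010) 1234-5678' | tr -cd '0-9')

printf '%s\n%s\n' "$upper" "$digits"
```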

6.7.wc

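Purpose: count the lines, bytes and characters in files

Common options:

-c print the byte count

-m print the character count

-l print the line count

Examples: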

[[email protected] test]# wc -l /etc/passwd
22 /etc/passwd
[[email protected] test]# wc -c /etc/passwd
973 /etc/passwd
[[email protected] test]# wc -m /etc/passwd
973 /etc/passwd
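
In the transcript above -c and -m print the same number because the file contains only single-byte characters. The two counts can differ for multibyte text. The sketch below feeds wc the UTF-8 encoding of two Chinese characters, written as octal escapes so that the example does not depend on how this page is encoded:

```shell
# The UTF-8 encoding of the two characters 中文, as octal escapes
sample=$(printf '\344\270\255\346\226\207')

# -c counts bytes: three bytes per character here, in any locale
bytes=$(printf '%s' "$sample" | wc -c)

# -m counts characters according to the current locale's encoding
chars=$(printf '%s' "$sample" | wc -m)

printf 'bytes=%s chars=%s\n' "$bytes" "$chars"
```

The byte count is 6. The character count depends on the locale: in a UTF-8 locale it should be 2, while in the C locale every byte counts as one character.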

6.8.split

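Purpose: split a file into pieces

Syntax: split [OPTION]... [INPUT [PREFIX]]

Common options:

-<lines> or -l <lines> put this many lines into each output file

-b <bytes> put this many bytes into each output file; unit suffixes such as k and m are supported

-C <bytes> like -b, but keep lines intact where possible

-d use numeric suffixes

-a length set the length of the suffix

--help display help

Examples: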

[[email protected] test]# cat split1 
this is line1
this is line2
this is line3
this is line4
this is line5
this is line6
Split into files of one line each, named in alphabetical order:
[[email protected] test]# split -1 split1 
[[email protected] test]# ll
total 36
-rw-r--r-- 1 root root  84 May 27 14:35 split1
-rw-r--r-- 1 root root  14 May 27 14:35 xaa
-rw-r--r-- 1 root root  14 May 27 14:35 xab
-rw-r--r-- 1 root root  14 May 27 14:35 xac
-rw-r--r-- 1 root root  14 May 27 14:35 xad
-rw-r--r-- 1 root root  14 May 27 14:35 xae
-rw-r--r-- 1 root root  14 May 27 14:35 xaf
[[email protected] test]# cat xaa
this is line1
[[email protected] test]# cat xab
this is line2
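
The -l, -d and -a options listed above can be combined, and the pieces can be put back together with cat. A minimal sketch, assuming GNU split (for -d); the part_ prefix and the temporary directory are arbitrary choices for illustration:

```shell
# Recreate the six-line sample file in a temporary directory
workdir=$(mktemp -d)
printf 'this is line%s\n' 1 2 3 4 5 6 > "$workdir/split1"

# -l 2 : two lines per piece
# -d   : numeric suffixes instead of aa, ab, ...
# -a 3 : three-character suffixes, giving part_000, part_001, part_002
split -l 2 -d -a 3 "$workdir/split1" "$workdir/part_"
ls "$workdir"

# The shell expands part_* in sorted order, so cat restores the original
cat "$workdir"/part_* > "$workdir/rejoined"
cmp "$workdir/split1" "$workdir/rejoined" && echo 'identical'
```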

Date: 2024-11-06 21:36:40
