Linux Shell Scripting Cookbook (notes) / 憋错料

# List all files whose names start with a or o
[root@host ~]# ls
anaconda-ks.cfg  nss-pam-ldapd-0.9.8-1.gf.el7.x86_64.rpm  openldap-clients-2.4.44-21.el7_6.x86_64.rpm  original-ks.cfg  tools
[root@host ~]# ls [ao]*
anaconda-ks.cfg  openldap-clients-2.4.44-21.el7_6.x86_64.rpm  original-ks.cfg

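# [0-9] matches any single digit
# [!0-9] matches any single character that is not a digit, so [!0-9]* matches names that do not start with a digit
[root@host ~]# ls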
1  3   anaconda-ks.cfg                          openldap-clients-2.4.44-21.el7_6.x86_64.rpm  tools
2  44  nss-pam-ldapd-0.9.8-1.gf.el7.x86_64.rpm  original-ks.cfg
[root@host ~]# ls [!0-9]*
anaconda-ks.cfg  nss-pam-ldapd-0.9.8-1.gf.el7.x86_64.rpm  openldap-clients-2.4.44-21.el7_6.x86_64.rpm  original-ks.cfg

tools:
libnss-cache  nsscache
[root@host ~]# ls [0-9]*
1  2  3  44

# Delete the files whose names start with a digit
rm -f [0-9]*

# Delete the files whose names do not start with a digit
[root@host test]# ls
1  2  3  4  a  aa  b  bb
[root@host test]# rm -f [!0-9]*
[root@host test]# ls
1  2  3  4
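To try these globs without touching real files, the sketch below creates a throwaway directory with a few hypothetical file names, expands each pattern into a variable, and then removes the directory. It assumes bash; the patterns themselves behave the same in any POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch: compare the [ao]*, [0-9]* and [!0-9]* globs in a scratch directory.
# The file names are hypothetical and exist only for this demo.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 1 2 44 anaconda.cfg original.cfg nss.rpm tools.txt

starts_a_or_o=$(echo [ao]*)   # first character is 'a' or 'o'
starts_digit=$(echo [0-9]*)   # first character is a digit
starts_other=$(echo [!0-9]*)  # first character is anything except a digit

echo "[ao]*   -> $starts_a_or_o"
echo "[0-9]*  -> $starts_digit"
echo "[!0-9]* -> $starts_other"

cd - > /dev/null && rm -rf "$demo_dir"
```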

echo

By default, echo appends a newline to the end of its output; use -n to suppress it.

[root@host src]# echo abc ddd
abc ddd
[root@host src]# echo -n abc ddd
abc ddd[root@host src]#

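echo -e

echo -e interprets special characters (backslash escapes).

If the string contains any of the following sequences, they are handled specially instead of being printed as ordinary text:

\a  alert (bell);
\b  backspace: moves back over the previous character;
\c  produce no further output (so no trailing newline either);
\f  form feed: new line, but the cursor stays in its current column;
\n  newline, with the cursor moving to the start of the line;
\r  carriage return: cursor moves to the start of the line without a new line;
\t  insert a tab;
\v  vertical tab (same visual effect as \f);
\\  insert a literal \ character;
\0nnn  insert the ASCII character whose octal value is nnn (bash's echo requires the leading 0);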

Some examples:

$ echo -e "a\bdddd"   # \b backs up over the a, which is then overwritten
dddd

$ echo -e "a\adddd"   # the terminal beeps while printing
adddd

$ echo -e "a\ndddd"   # \n starts a new line
a
dddd
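To see exactly what an escape produces, you can capture the output and compare it instead of eyeballing the terminal. This sketch assumes bash's builtin echo; if /bin/sh is another shell (e.g. dash), echo treats escapes differently, and printf is the portable choice.

```shell
#!/usr/bin/env bash
# Sketch: capture what bash's builtin echo -e produces for a few escapes.
tab_line=$(echo -e "a\tb")            # \t becomes a real TAB character
octal_char=$(echo -e "\0101")         # \0nnn: octal 101 is 'A' in ASCII
cut_short=$(echo -e "abc\cIGNORED")   # \c stops all further output

printf '%s' "$tab_line" | od -c       # shows the bytes: a \t b
echo "octal: $octal_char"
echo "after \\c: $cut_short"

# printf understands the same escapes and behaves the same in every POSIX shell
printf 'a\tb\n'
```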

Variables

String length: ${#var}

[root@host src]# echo ${NODE_HOME}
/usr/local/node
[root@host src]# echo ${#NODE_HOME}
15
# The value is 15 characters long

Arithmetic in the shell

When using let, variable names do not need a leading $.

[root@host src]# nod1=3
[root@host src]# nod2=5
[root@host src]# abc=$[nod1+nod2]
[root@host src]# echo $abc
8
[root@host src]# let def=nod1+nod2
[root@host src]# echo $def
8
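bash also provides the $(( ... )) arithmetic expansion and the (( ... )) arithmetic command, which are the forms usually recommended today; the $[ ... ] form used above still works in bash but is documented as deprecated. A minimal sketch reusing nod1 and nod2:

```shell
#!/usr/bin/env bash
# Sketch: integer arithmetic forms in bash (integers only, no decimals).
nod1=3
nod2=5

sum1=$((nod1 + nod2))        # arithmetic expansion; $ is optional inside
let sum2=nod1+nod2           # let: no spaces unless the expression is quoted
(( product = nod1 * nod2 ))  # arithmetic command; spaces are fine here
quotient=$((7 / 2))          # integer division truncates

echo "sum1=$sum1 sum2=$sum2 product=$product quotient=$quotient"
```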

bash arithmetic handles only integers; for floating-point calculations, pass the expression to bc:

[root@host 2056]# echo "4*0.56" |bc
2.24
[root@host 2056]# no=54
[root@host 2056]# res=`echo "$no*1.5"|bc`
[root@host 2056]# echo $res
81.0
[root@host 2056]#

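Extra settings can be placed before the actual expression, separated by semicolons, and passed to bc through stdin.

For example, setting the number of decimal places (scale):

[root@host 2056]# echo "scale=2;3/8" | bc
.37
# bc truncates rather than rounds: 0.375 becomes .37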

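File descriptors

0 - stdin, standard input
1 - stdout, standard output
2 - stderr, standard error

When a command fails it exits with a non-zero status; on success it returns 0. The exit status of the last command is available in $?, e.g. echo $?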

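Send normal output to out.txt and errors to the terminal
ls  > out.txt

Send errors to out.txt and normal output to the terminal
ls  2> out.txt

Redirect all output to out.txt
ls  &> out.txt

The redirections can be combined
find /etc -name passwd > find.txt 2> find.err

Discard errors and show only normal output on the screen
find /etc -name passwd 2> /dev/null

Discard all output
find /etc -name passwd &> /dev/null

By default only stdout goes through a pipe, stderr does not. When needed, merge stderr into stdout first:
i.e.: find /etc -name passwd 2>&1 |less

For example, find /etc -name passwd |wc -l
counts only the lines of normal output; error messages are not counted.

find /etc -name passwd 2>&1 |wc -l
counts the error lines as well, because they have been merged into stdout.

/sbin/service vsftpd stop > /dev/null 2>&1
This stops the service and discards all of its output: stdout goes to /dev/null, and 2>&1 then sends stderr to the same place as stdout, so errors are discarded too.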
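Redirections are applied from left to right, which is why > /dev/null 2>&1 silences everything while 2>&1 > /dev/null does not. A self-contained sketch; noisy is a hypothetical function that writes one line to each stream.

```shell
#!/usr/bin/env bash
# Sketch: redirections are applied left to right.
# noisy is a hypothetical function that writes one line to each stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

out=$(mktemp)

# stdout -> file first, then stderr -> wherever stdout now points (the file)
noisy > "$out" 2>&1
both_lines=$(wc -l < "$out")

# stderr -> the ORIGINAL stdout (the terminal) first, then stdout -> file,
# so only one line reaches the file and "to stderr" appears on screen
noisy 2>&1 > "$out"
one_line=$(wc -l < "$out")

# merge stderr into stdout so that both lines go through the pipe
piped=$(noisy 2>&1 | wc -l)

echo "file (>f 2>&1): $both_lines; file (2>&1 >f): $one_line; pipe: $piped"
rm -f "$out"
```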

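Arrays and associative arrays

There are several ways to define an array; the most common is to list its values on a single line:

[root@host ~]# array_var=(1 2 3 4 5 6 6 6)
[root@host ~]# echo ${array_var[*]}  # print all values, form 1
1 2 3 4 5 6 6 6
[root@host ~]# echo ${array_var[@]}  # print all values, form 2
1 2 3 4 5 6 6 6
[root@host ~]# echo ${#array_var[*]} # print the array length
8

An associative array is similar to a dictionary: the keys are arbitrary strings you choose, and you can list all of the keys.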
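A minimal sketch of an associative array, assuming bash 4.0 or later (bash 3, such as the default bash on older macOS, lacks the feature). The keys and prices are made-up sample data.

```shell
#!/usr/bin/env bash
# Sketch: associative arrays (bash 4.0+). Keys and values are sample data.
declare -A price            # must be declared with -A before use
price[apple]=100
price[orange]=150
price+=([banana]=80)        # add another key/value pair

echo "apple costs ${price[apple]}"
echo "number of keys: ${#price[@]}"

# ${!price[@]} expands to the keys; iteration order is NOT guaranteed
for fruit in "${!price[@]}"; do
    echo "$fruit -> ${price[$fruit]}"
done
```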

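Getting terminal information

tput sc # save the cursor position
tput rc # restore the saved cursor position
tput ed # clear everything from the cursor to the end of the screen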

Generating delays in a script

Countdown:

#!/bin/bash
echo -n Count:
tput sc

count=11;
while true;
do
  if [ $count -gt 0 ];
  then
    let count--;
    sleep 1;
    tput rc
    tput ed
    echo -n $count;
  else exit 0;
  fi
done

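# In this example, count starts at 11 and is decreased by 1 on every iteration. tput sc saves the cursor position; on every iteration tput rc restores it, so the new value of count is printed in the same place. tput ed clears everything from the cursor onward, so the old value is erased before the new one is written.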

Functions and arguments

Defining a function

function fname()
{
statements;
}

or:

fname()
{
statements;
}

Calling: just use the function's name
```
fname; # run the function
```
Arguments can be passed to the function and accessed inside it
```
fname arg1 arg2;
```

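Ways to access function arguments

fname()
{
echo $1,$2; # access arguments 1 and 2
echo "$@"; # print all arguments, each kept as a separate word
echo "$*"; # like "$@", but all arguments are joined into a single string
echo "$#"; # $# is the number of arguments passed to the script or function
return 0;  # return value (exit status)
}
# "$@" is used far more often than "$*": "$*" merges all arguments into one string, which is rarely what you want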
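The difference only shows up when an argument contains spaces. In the sketch below, count_args is a hypothetical helper that just prints how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch: "$@" versus "$*" when an argument contains a space.
# count_args is a hypothetical helper that prints its number of arguments.
count_args() { echo "$#"; }

show() {
    echo "first=$1 second=$2 total=$#"
    count_args "$@"   # each original argument stays a separate word
    count_args "$*"   # all arguments joined into ONE word (using the first IFS char)
}

show "hello world" foo bar
```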

Recursive functions
Functions in bash also support recursion (a function can call itself), for example:
```
F() { echo $1; F hello; sleep 1; }
```
Fork bomb
```
:(){ :|:& };:

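# This recursive function keeps calling itself and spawning new processes until the system stops responding (a denial-of-service attack). The & after the call puts the child processes in the background. Because this dangerous code forks an ever-growing number of processes, it is called a fork bomb.

[root@host ~]#  :(){ :|:& };:
[1] 2526
[root@host ~]#
[1]+  Done

(the system hangs)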
```
This is hard to read, so let's reformat it:
```
:()
{
:|:&
};
:
```
Or, easier still to understand:
```
bomb()
{
bomb|bomb&
};
bomb
```
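Because the function keyword is optional in the shell, these thirteen characters define a function and then call it. The function is named :, and its core is :|:&, a recursive call of the function to itself: & runs the new processes in the background, and the pipe starts two copies on every call, so the number of processes grows exponentially. The final : calls the function and sets off the bomb. Within seconds the system is overwhelmed by processes and hangs, and usually the only way out is a reboot.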

Prevention

That said, a fork bomb is nothing special; you can write one in a minute in other languages too, for example in Python:

  import os
  while True:
      os.fork()

A fork bomb simply exhausts system resources by creating processes. On Linux, the ulimit command can restrict what a user may do; run ulimit -a to see which limits can be set:

[root@host ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7675
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 655350
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 100
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

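As shown, the -u option limits how many processes a user can create, so ulimit -u 100 allows the user at most 100 processes, which protects against a fork bomb. However, this is not a complete fix: the limit applies only to the current shell session and is lost when you close the terminal. For a persistent limit, edit /etc/security/limits.conf and add the following lines: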

*       soft    nproc   100
*       hard    nproc   100

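Reading a command's return value (status)

$? holds the return value of the last command

The return value is called the exit status and tells you whether the command succeeded: 0 on success, non-zero otherwise

Reading the output of a command sequence into a variable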
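A minimal sketch of this: command substitution with $( ... ), or the older backquotes used elsewhere in these notes, stores a command's output in a variable, with trailing newlines removed. The path /usr/local/bin/tool is hypothetical; dirname and basename only manipulate the string, so the file need not exist.

```shell
#!/usr/bin/env bash
# Sketch: store command output in variables with command substitution.
line_count=$(printf 'a\nb\nc\n' | wc -l)    # output of a whole pipeline
old_style=`echo backquotes`                 # older form, same effect
# $( ... ) nests without extra escaping: dirname runs first, then basename
parent=$(basename "$(dirname /usr/local/bin/tool)")
# trailing newlines are stripped from the captured output
trimmed=$(printf 'text\n\n\n')

echo "lines=$line_count old=$old_style parent=$parent trimmed=[$trimmed]"
```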

Using a subshell to create a separate process

A subshell is a separate process in its own right. You can define one with the () operator:

pwd;
(cd /bin; ls);
pwd;

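# When commands run in a subshell, they have no effect on the current shell; all changes stay inside the subshell. For example, when cd changes the subshell's working directory, the change is not reflected in the parent shell.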

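Reading input with read

read reads text from the keyboard or standard input, which lets a script take interactive input from the user.
The input libraries of most programming languages read from the keyboard and treat the input as finished only when Enter is pressed.
read also offers ways to finish reading without pressing Enter.

Read with a prompt:

read -p "Enter input:" var

Read n characters into the variable name:

read -n number_of_chars name

read -n 3 var
echo $var

Use a specific delimiter to end the input

read -d ":" var
echo $var

# A colon ends the input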
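These options can be tried without a keyboard by feeding stdin from here-strings; a sketch assuming bash (here-strings and read -n/-d are bash features). The values are arbitrary sample input.

```shell
#!/usr/bin/env bash
# Sketch: exercise read -n, read -d and IFS splitting non-interactively.
read -r -n 3 first3 <<< "abcdef"            # stops after 3 characters
read -r -d ":" field <<< "root:x:0:0"       # stops at the first ':'
IFS=, read -r a b c <<< "1,2,3"             # one line split into 3 variables
read -r -p "Enter input:" answer <<< "yes"  # the prompt is only shown on a terminal

echo "first3=$first3 field=$field abc=$a$b$c answer=$answer"
```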

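Running a command until it succeeds

Define the function as follows:

repeat() { while true; do "$@" && return; done }

# The repeat function contains an infinite loop that runs the command passed in as arguments (accessed via "$@"). As soon as the command succeeds, the function returns, which ends the loop.

A faster variant:

On most systems true also exists as a binary (/bin/true). If the shell had to run that binary, it would spawn a new process on every iteration of the while loop. To avoid this, use the shell builtin :, which always returns exit status 0. (In bash, true is itself a builtin, so the gain there is small.)

repeat() { while :; do "$@" && return; done }

# Less readable, but never slower than the previous version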

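Adding a delay

For example, suppose you want to download a file from the Internet that is temporarily unavailable but will become available after a while. You can do this:

repeat wget -c http://abc.test.com/software.tar.gz

# Written this way, the loop floods the server with requests, which may affect it; we can change the function to wait briefly between attempts

repeat() { while :; do "$@" && return; sleep 30; done }

# This retries the command every 30 seconds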
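A variation on repeat() that gives up after a fixed number of attempts, so a permanently broken download does not loop forever. retry, max_tries and delay are hypothetical names chosen for this sketch, not part of the original.

```shell
#!/usr/bin/env bash
# Sketch: repeat() with a retry limit and a delay between attempts.
# retry and its arguments (max_tries, delay) are hypothetical names.
retry() {
    local max_tries=$1 delay=$2 n
    shift 2
    for (( n = 1; n <= max_tries; n++ )); do
        "$@" && return 0              # the command succeeded: stop
        if (( n < max_tries )); then
            sleep "$delay"            # wait before the next attempt
        fi
    done
    return 1                          # all attempts failed
}

# Usage (the URL is the example from above):
# retry 10 30 wget -c http://abc.test.com/software.tar.gz
```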

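Field separators and iterators

The internal field separator (IFS) is an important concept in shell scripting. It is a shell variable holding the delimiter characters that the current shell uses to split text into words.

The default IFS is whitespace (space, tab and newline), so by default the shell splits on spaces:

[root@host ~]# data="abc eee ddd fff"
[root@host ~]# for item in $data; do echo ITEM: $item; done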
ITEM: abc
ITEM: eee
ITEM: ddd
ITEM: fff

Run:
list1="1 2 3 3 4 4"
for line in $list1
do
echo $line;
done

Output:
1
2
3
3
4
4

Run:
for line in 1 2 3 3 4 4 # if the list after in is quoted, it is treated as a single string
do
echo $line;
done

Same output:
1
2
3
3
4
4

Next, let's change IFS to a comma:

# IFS has not been changed yet, so it is still whitespace and data1 is printed as a single string
[root@host ~]# data1="eee,eee,111,222"
[root@host ~]# for item in $data1; do echo ITEM: $item; done
ITEM: eee,eee,111,222
[root@host ~]# oldIFS=$IFS  # back up the current IFS as oldIFS so it can be restored later
[root@host ~]# IFS=,  # after the backup, set IFS to a comma; now the comma acts as the separator
[root@host ~]# for item in $data1; do echo ITEM: $item; done
ITEM: eee
ITEM: eee
ITEM: 111
ITEM: 222
[root@host ~]# IFS=$oldIFS # restore the original IFS
[root@host ~]# for item in $data1; do echo ITEM: $item; done
ITEM: eee,eee,111,222

So remember to restore IFS once you are done with the modified value.
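An alternative that avoids the backup/restore dance: prefix a single read command with IFS=, so the change applies only to that command. A sketch assuming bash (read -a and here-strings are bash features).

```shell
#!/usr/bin/env bash
# Sketch: split a comma-separated string without changing the global IFS.
# IFS=, prefixes the read command only, so there is nothing to restore.
data1="eee,eee,111,222"
IFS=, read -r -a items <<< "$data1"   # -a stores the fields in an array

for item in "${items[@]}"; do
    echo "ITEM: $item"
done
echo "fields: ${#items[@]}"
```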

for loop

for var in list;
do
  commands;
done

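list can be a string or a sequence
{1..50} generates a list of the numbers 1 to 50
{a..z}, {A..Z} or {a..h} generate letter sequences

for can also use the C-style loop form

for ((i=0; i<10; i++))
do
  commands; # use the variable $i
done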
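The three loop forms can be checked side by side; this sketch accumulates each loop's results in variables so they are easy to inspect.

```shell
#!/usr/bin/env bash
# Sketch: the three for-loop forms described above, collecting their results.
nums=""
for n in {1..5}; do nums+="$n "; done      # brace expansion: 1 2 3 4 5

letters=""
for c in {a..e}; do letters+=$c; done       # letter sequence: abcde

total=0
for ((i = 0; i < 10; i++)); do              # C-style loop
    (( total += i ))
done

echo "nums=$nums letters=$letters total=$total"
```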

while loop

while condition
do
  commands;
done

until loop

It keeps looping until the given condition becomes true

x=0;
until [ $x -eq 9 ];
do
  let x++; echo $x;
done

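Comparisons and tests

Flow control in a program is handled by comparison and test statements. We can test with if, if/else and logical operators, and compare data with comparison operators. There is also the test command for testing.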

if condition;
then
  commands;
fi

if condition;
then
  commands;
elif condition; then
  commands;
else
  commands;
fi

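if and else statements can be nested, which makes them long; logical operators make them more concise:

[ condition ] && action; # run action if the condition is true
[ condition ] || action; # run action if the condition is false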

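Arithmetic comparisons

Conditions are usually placed inside square brackets. Note that there must be a space between [ or ] and the operands; if you leave it out, you get an error.

Arithmetic tests:

[ $var -eq 0 ] # true when $var equals 0
[ $var -ne 0 ] # true when $var is not 0

Others:
-gt: greater than
-lt: less than
-ge: greater than or equal to
-le: less than or equal to

Multiple conditions:
[ $var1 -ne 0 -a $var2 -gt 2 ] # logical AND: -a
[ $var1 -ne 0 -o $var2 -gt 2 ] # logical OR: -o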

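Filesystem tests

Different flags test different filesystem-related attributes:

[ -f file ] true if the variable holds the path or name of a regular file
[ -x file ] true if the file is executable
[ -d file ] true if it is a directory
[ -e file ] true if the file exists
[ -w file ] true if it is writable
[ -r file ] true if it is readable
[ -L file ] true if it is a symbolic link

Usage:

fpath="/etc/passwd"
if [ -e "$fpath" ];then
  echo File exists;
else
  echo Does not exist;
fi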
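The flags can be combined into a small classifier. describe_path is a hypothetical helper name; note that -L is checked first, because -d and -f follow symbolic links.

```shell
#!/usr/bin/env bash
# Sketch: classify a path with the file tests listed above.
# describe_path is a hypothetical helper name.
describe_path() {
    local p=$1
    if [ -L "$p" ]; then echo "symlink"      # check first: -d/-f follow links
    elif [ -d "$p" ]; then echo "directory"
    elif [ -f "$p" ]; then echo "regular file"
    elif [ -e "$p" ]; then echo "other (device, socket, ...)"
    else echo "does not exist"
    fi
}

describe_path /etc/passwd
describe_path /etc
describe_path /no/such/path
```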

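String comparison

For string comparisons it is best to use double brackets, because single brackets can sometimes cause errors (for example with empty or unquoted variables), so avoid them.

Test whether two strings are identical:

[[ $str1 = $str2 ]]
or:
[[ $str1 == $str2 ]]
The opposite:
[[ $str1 != $str2 ]]

[[ -z $str1 ]] true if str1 is the empty string
[[ -n $str1 ]] true if str1 is a non-empty string

Note the space on each side of =. Without the spaces it is no longer a comparison: [[ $str1=$str2 ]] is one non-empty word, so the test is always true.
The logical operators && and || make it easy to combine several conditions.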
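One more pitfall worth knowing: inside [[ ]], an unquoted right-hand side of == is treated as a glob pattern, so quote it when you want a literal comparison. A sketch with made-up values:

```shell
#!/usr/bin/env bash
# Sketch: string tests with [[ ]].
str1="abc"
str2="a*"

# unquoted right-hand side of == is a glob pattern: abc matches a*
[[ $str1 == $str2 ]] && echo "pattern match"
# quoted right-hand side is compared literally: abc is not the string "a*"
[[ $str1 == "$str2" ]] || echo "no literal match"

empty=""
[[ -z $empty ]] && echo "empty has length zero"
[[ -n $str1 ]] && echo "str1 is non-empty"

# missing spaces: this is one non-empty word, so the test is always true
[[ $str1=$str2 ]] && echo "always true (probably a bug)"

# combine conditions with && and ||
[[ -n $str1 && $str1 != "$str2" ]] && echo "both conditions hold"
```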

cat

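Usual form:

cat file1 file2 file3 ...
This concatenates the contents of the files given as arguments.

tac is the reverse of cat: it prints a file's lines in reverse order.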

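Similarly, cat can join the contents of input files with standard input, combining data from stdin with the data in another file:

echo "111111" |cat  - /etc/passwd
In this command, - is used as the file name for stdin.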

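Showing tabs as ^I

When writing Python, for example, indenting with tabs is not the same as indenting with spaces; a tab where spaces are expected causes an indentation error, and such mistakes are hard to spot in a text editor.

The -T option displays tabs, marking them as ^I:

[root@host ~]# cat bbb.sh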
for line in "1 2 3 3 4 4"
do
    echo $line;
done

[root@host ~]# cat -T bbb.sh
for line in "1 2 3 3 4 4"
do
^Iecho $line;
done

Line numbers: cat -n

# -n numbers blank lines too; to skip blank lines, use -b

[root@host ~]# cat bbb.sh
for line in "1 2 3 3 4 4"

do
    echo $line;
done
[root@host ~]# cat -n bbb.sh
     1  for line in "1 2 3 3 4 4"
     2
     3  do
     4      echo $line;
     5  done
[root@host ~]# cat -b bbb.sh
     1  for line in "1 2 3 3 4 4"

     2  do
     3      echo $line;
     4  done

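find

find walks down the file hierarchy, matches the files that meet the given criteria, and performs the requested actions on them

find /etc # list all files and directories under /etc, including hidden ones

find -iname  # like -name, but case-insensitive

# To match one of several patterns, use the OR operator -o, e.g. find all .txt and .conf files under /etc
find /etc  \( -name "*.txt" -o -name "*.conf" \)
find /etc  \( -name "*.txt" -o -name "*.conf" \) -print
# \( and \) group -name "*.txt" -o -name "*.conf" into a single expression

# -name matches the file name, while -path matches the file's path; both accept wildcards
find  / -path "*/etc/*" -print
# prints every file whose path contains /etc/

# -regex is even more powerful. For example, an email address usually has the form name@host.root, which can be generalized to:
# [a-z0-9]+@[a-z0-9]+\.[a-z0-9]+
# The + means that a character from the preceding character class may occur one or more times.
find /etc -regex ".*\(\.py\|\.sh\)$"
# finds all files ending in .py or .sh (GNU find's default regex syntax needs \| for alternation)
# likewise, -iregex ignores case, just as -iname does

# -regex is also a test. Note: -regex does not match just the file name, but the whole path. For example, if the current directory contains a file "abar9", the pattern "ab.*9" finds nothing; use ".*ab.*9" or ".*/ab.*9" instead.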

find . -regex ".*/[0-9]*/.c" -print

Negating a test

find /etc ! -name "*.conf" -print

Searching by directory depth

The depth options -maxdepth and -mindepth limit how deep find descends into the directory tree

[root@host ~]# find /etc -maxdepth 1 -name "*.conf" -print
/etc/resolv.conf
/etc/dracut.conf
/etc/host.conf

[root@host ~]# find /etc -maxdepth 2 -name "*.conf" -print
/etc/resolv.conf
/etc/depmod.d/dist.conf

[root@host ~]# find /etc -mindepth 4 -name "*.conf" -print
/etc/openldap/slapd.d/openldap/ldap.conf
/etc/openldap/slapd.d/openldap/schema/schema_convert.conf
/etc/openldap/slapd.d/openldap/slapd.conf

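Searching by time

-atime: time of last access
-mtime: time of last modification
-ctime: time the file's metadata (e.g. permissions or ownership) last changed
All of these are measured in days.
There are also minute-based versions:
-amin
-mmin
-cmin

-newer takes a reference file and compares timestamps, matching files modified more recently than the reference file
[root@host ~]# find /etc -type f -newer /etc/passwd -print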
/etc/resolv.conf
/etc/shadow
/etc/ld.so.cache
/etc/cni/net.d/calico-kubeconfig

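Searching by file size

find /etc -type f -size +2k  # larger than 2k
find /etc -type f -size -2k  # smaller than 2k
find /etc -type f -size 2k  # 2k (sizes are rounded up to whole 1k units)

Deleting matched files

find ./ -type f -name "*.txt" -delete

Searching by permissions and ownership

find /etc -type f -perm 644

find /etc -type f -name "*.conf"  ! -perm 644

Searching by user

find /etc -type f -user USER

Executing commands or actions

find /etc -type f -user root -exec chown mysql {} \;
# changes the owner of every file owned by root to mysql

# {} is a special string used with -exec: for each matched file, {} is replaced by that file's name.

Another example is concatenating the contents of all files in a directory into a single file: find all .conf files, then use -exec with cat:

find /etc/ -type f -name "*.conf" -exec cat {} \;>all.txt
# writes the contents of all .conf files into all.txt
# >> is not needed because the whole find command produces a single output stream (stdout); appending only matters when several separate streams are written to the same file

# The following copies .txt files last modified more than 10 days ago into the OLD directory:
find /etc -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;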

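Making find skip directories

Sometimes you need to skip certain directories to improve performance. For example, when searching a tree of git repositories, you usually want to skip each repository's .git directory.

find /etc \( -name ".git" -prune \) -o \( -type f -print  \)

# \( -name ".git" -prune \) excludes the matching directories, while \( -type f -print  \) specifies the action to perform on everything else.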

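Mastering xargs

xargs reformats the data it receives on stdin and passes it as arguments to another command

xargs can serve as an alternative to the -exec option of find

Converting multi-line input to single-line output: just remove the newlines and replace them with spaces. xargs does exactly that, replacing newlines with spaces, so multiple lines become one

[root@host ~]# cat 123.txt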
1 2 3 4 5
6 7 8 9
10 11 12 13 14

[root@host ~]# cat 123.txt |xargs
1 2 3 4 5 6 7 8 9 10 11 12 13 14

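Converting single-line input to multi-line output: by specifying the maximum number of arguments per line, n, we can split any text from stdin into lines of n arguments each. An argument is a string delimited by spaces; the space is the default delimiter.

[root@host ~]# cat 123.txt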
1 2 3 4 5
6 7 8 9
10 11 12 13 14

[root@host ~]# cat 123.txt |xargs -n 3
1 2 3
4 5 6
7 8 9
10 11 12
13 14

[root@host ~]# echo  1 3 4 5 6 7 8 |xargs -n 3
1 3 4
5 6 7
8

Splitting arguments with a custom delimiter: use the -d option to specify a delimiter for the input

[root@host ~]# echo "abcTdslfjTdshfsT1111Tfd222" |xargs -d T
abc dslfj dshfs 1111 fd222
# the letter T is used as the delimiter

# We can set a custom delimiter and the number of arguments per line at the same time
[root@host ~]# echo "abcTdslfjTdshfsT1111Tfd222" |xargs -d T -n 2
abc dslfj
dshfs 1111
fd222

# one argument per line
[root@host ~]# echo "abcTdslfjTdshfsT1111Tfd222" |xargs -d T -n 1
abc
dslfj
dshfs
1111
fd222

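Subshells

cmd0 | (cmd1;cmd2;cmd3) | cmd4

The part in the middle runs in a subshell; anything the commands inside it change (variables, working directory) takes effect only within that subshell

The difference between -print and -print0

-print appends a newline after each file name, while -print0 appends a null character (\0) instead, which is why the names below run together.
[root@host shell_test]# find /home/AaronWong/ABC/ -type f -print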
/home/AaronWong/ABC/libcvaux.so
/home/AaronWong/ABC/libgomp.so.1
/home/AaronWong/ABC/libcvaux.so.4
/home/AaronWong/ABC/libcv.so
/home/AaronWong/ABC/libhighgui.so.4
/home/AaronWong/ABC/libcxcore.so
/home/AaronWong/ABC/libhighgui.so
/home/AaronWong/ABC/libcxcore.so.4
/home/AaronWong/ABC/libcv.so.4
/home/AaronWong/ABC/libgomp.so
/home/AaronWong/ABC/libz.so
/home/AaronWong/ABC/libz.so.1
[root@host shell_test]# find /home/AaronWong/ABC/ -type f -print0
/home/AaronWong/ABC/libcvaux.so/home/AaronWong/ABC/libgomp.so.1/home/AaronWong/ABC/libcvaux.so.4/home/AaronWong/ABC/libcv.so/home/AaronWong/ABC/libhighgui.so.4/home/AaronWong/ABC/libcxcore.so/home/AaronWong/ABC/libhighgui.so/home/AaronWong/ABC/libcxcore.so.4/home/AaronWong/ABC/libcv.so.4/home/AaronWong/ABC/libgomp.so/home/AaronWong/ABC/libz.so/home/AaronWong/ABC/libz.so.1
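The point of -print0 is to pair it with xargs -0: file names containing spaces break a plain whitespace-separated pipeline, while NUL-separated names survive intact. The scratch directory and file names below are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: why -print0 exists. The file names are made up for the demo.
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/c.txt"

# default xargs splits on blanks: "a b.txt" turns into two bogus arguments
naive=$(find "$demo" -type f -name '*.txt' | xargs -n1 echo | wc -l)

# -print0 + xargs -0: exactly one argument per file, whatever it contains
safe=$(find "$demo" -type f -name '*.txt' -print0 | xargs -0 -n1 echo | wc -l)

echo "naive=$naive safe=$safe"
rm -rf "$demo"
```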

tr

tr reads only from standard input; it cannot take its input as command-line arguments. Its syntax is:
tr [option] set1 set2

Convert tabs to spaces: tr '\t' ' ' < file.txt

[root@host ~]# cat -T 123.txt
1 2 3 4 5
6 7 8 9
^I10 11 12 13 14

[root@host ~]# tr '\t' '    ' < 123.txt
1 2 3 4 5
6 7 8 9
 10 11 12 13 14

Deleting characters with tr

tr has a -d option that removes every character of the specified set from stdin:

cat  file.txt |tr -d '[set1]'
# only set1 is used, there is no set2

# delete digits
[root@host ~]# echo "Hello 123 world 456" |tr -d '0-9'
Hello  world 

# delete letters
[root@host ~]# echo "Hello 123 world 456" |tr -d 'A-Za-z'
 123  456

# delete H
[root@host ~]# echo "Hello 123 world 456" |tr -d 'H'
ello 123 world 456
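A few other everyday tr operations, for comparison with -d: mapping one set onto another, -s to squeeze repeats, and -c to complement a set.

```shell
#!/usr/bin/env bash
# Sketch: three more everyday tr operations.
upper=$(echo "hello world" | tr 'a-z' 'A-Z')              # map lower to upper case
squeezed=$(echo "a    b   c" | tr -s ' ')                 # -s squeezes repeated spaces
digits_only=$(echo "Hello 123 world 456" | tr -cd '0-9')  # -c complements: delete all NON-digits

echo "$upper | $squeezed | $digits_only"
```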

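Sorting, unique and duplicate lines

sort sorts text files and stdin. It is usually combined with other commands to produce the desired output. uniq is often used together with sort; it extracts the unique lines from a text or from stdin.

# A set of files (e.g. file1.txt file2.txt) can easily be sorted together like this:
[root@host ~]# sort /etc/passwd /etc/group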
adm:x:3:4:adm:/var/adm:/sbin/nologin
adm:x:4:
apache:x:48:
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
audio:x:63:
bin:x:1:
bin:x:1:1:bin:/bin:/sbin/nologin
caddy:x:996:
caddy:x:997:996:Caddy web server:/var/lib/caddy:/sbin/nologin
...

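# the merged, sorted output can also be redirected to a new file
sort /etc/passwd /etc/group > abc.txt

# sort numerically
sort -n

# sort in reverse order
sort -r

# sort by month name
sort -M month.txt

# merge two files that are already sorted
sort -m sorted1 sorted2

# print each distinct line of the combined, sorted input once
sort file1.txt file2.txt |uniq

Checking whether a file is already sorted:

To check whether a file is sorted, use the following: if it is sorted, sort -C exits with status 0 ($?), otherwise non-zero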

#!/bin/bash
sort -C filename;
if [ $? -eq 0 ]; then
  echo Sorted;
else
  echo Unsorted;
fi

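sort has a large number of options. It becomes indispensable when you use uniq, because uniq requires sorted input.

Using sort for more complex tasks

# -k selects the column to sort by, -r reverses the order, -n sorts numerically
sort -nrk 1 data.txt
sort -k 2 data.txt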
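-k is most useful together with -t to set the field separator. The sketch sorts a made-up excerpt in /etc/passwd format by its 3rd field (the UID); without n, "1000" would sort before "38" because the comparison would be textual.

```shell
#!/usr/bin/env bash
# Sketch: sort colon-separated records numerically by the 3rd field (UID).
# The records are a made-up excerpt in /etc/passwd format.
records='sshd:x:74:74
elk:x:1000:1000
ntp:x:38:38
postfix:x:89:89'

# -t: field separator; -k3,3n: key is field 3 only, compared numerically
by_uid=$(printf '%s\n' "$records" | sort -t: -k3,3n)
lowest=$(printf '%s\n' "$by_uid" | head -n 1 | cut -d: -f1)
highest=$(printf '%s\n' "$by_uid" | tail -n 1 | cut -d: -f1)

printf '%s\n' "$by_uid"
echo "lowest UID: $lowest, highest UID: $highest"
```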

uniq

uniq only collapses adjacent duplicate lines, so its input must be sorted first

[root@host ~]# cat data.txt
1010hellothis
3333
  2189ababbba
333
 7464dfddfdfd
333

# remove duplicates
[root@host ~]# sort data.txt |uniq
1010hellothis
  2189ababbba
333
3333
 7464dfddfdfd

# remove duplicates and count occurrences
[root@host ~]# sort data.txt |uniq -c
      1 1010hellothis
      1   2189ababbba
      2 333
      1 3333
      1  7464dfddfdfd

# show only the lines that are not repeated
[root@host ~]# sort data.txt |uniq -u
1010hellothis
  2189ababbba
3333
 7464dfddfdfd

# show only the repeated lines
[root@host ~]# sort data.txt |uniq -d
333

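Temporary file names and random numbers

When writing shell scripts we often need to store temporary data. The best place for it is /tmp (on many systems its contents are cleared when the system reboots). There are two ways to generate standard names for temporary data

[root@host ~]# file1=`mktemp`
[root@host ~]# echo $file1
/tmp/tmp.P9var0Jjdw
[root@host ~]# cd /tmp/
[root@host tmp]# ls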
add_user_ldapsync.ldif     create_module_config.ldif.bak   globalconfig.ldif       overlay.ldif
create_module_config.ldif  databaseconfig_nosyncrepl.ldif  initial_structure.ldif  tmp.P9var0Jjdw
# The code above creates a temporary file and prints its name

[root@host tmp]# dir1=`mktemp -d`
[root@host tmp]# echo $dir1
/tmp/tmp.UqEfHa389N
[root@host tmp]# ll
total 28
-r--------. 1 root root  130 Feb 12  2019 add_user_ldapsync.ldif
-r--------. 1 root root  329 Feb 14  2019 create_module_config.ldif
-r--------. 1 root root  329 Feb 12  2019 create_module_config.ldif.bak
-r--------. 1 root root 2458 Feb 14  2019 databaseconfig_nosyncrepl.ldif
-r--------. 1 root root  239 Feb 12  2019 globalconfig.ldif
-r--------. 1 root root  795 Feb 12  2019 initial_structure.ldif
-r--------. 1 root root  143 Feb 12  2019 overlay.ldif
-rw-------  1 root root    0 Sep 27 13:06 tmp.P9var0Jjdw
drwx------  2 root root    6 Sep 27 13:09 tmp.UqEfHa389N
# The code above creates a temporary directory and prints its name

[root@host tmp]# mktemp test1.XXX
test1.mBX
[root@host tmp]# mktemp test1.XXX
test1.wj1
[root@host tmp]# ll
total 28
-r--------. 1 root root  130 Feb 12  2019 add_user_ldapsync.ldif
-r--------. 1 root root  329 Feb 14  2019 create_module_config.ldif
-r--------. 1 root root  329 Feb 12  2019 create_module_config.ldif.bak
-r--------. 1 root root 2458 Feb 14  2019 databaseconfig_nosyncrepl.ldif
-r--------. 1 root root  239 Feb 12  2019 globalconfig.ldif
-r--------. 1 root root  795 Feb 12  2019 initial_structure.ldif
-r--------. 1 root root  143 Feb 12  2019 overlay.ldif
-rw-------  1 root root    0 Sep 27 13:12 test1.mBX
-rw-------  1 root root    0 Sep 27 13:12 test1.wj1
-rw-------  1 root root    0 Sep 27 13:06 tmp.P9var0Jjdw
drwx------  2 root root    6 Sep 27 13:09 tmp.UqEfHa389N
# The commands above create temporary files from a template name. The X's must be uppercase; each X is replaced by a random letter or digit. Note that mktemp only works if the template contains at least 3 X's

Splitting files and data with split

Suppose you have a test file data.txt of 100 KB; you can split it into several files of 10 KB each

[root@host src]# ls
nginx-1.14.2  nginx-1.14.2.tar.gz
[root@host src]# du -sh nginx-1.14.2.tar.gz
992K    nginx-1.14.2.tar.gz
[root@host src]# split -b 100k nginx-1.14.2.tar.gz
[root@host src]# ll
total 1984
drwxr-xr-x 9 postgres mysql     186 Aug 15 19:50 nginx-1.14.2
-rw-r--r-- 1 root     root  1015384 Aug 16 10:44 nginx-1.14.2.tar.gz
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xaa
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xab
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xac
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xad
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xae
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xaf
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xag
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xah
-rw-r--r-- 1 root     root   102400 Sep 29 12:36 xai
-rw-r--r-- 1 root     root    93784 Sep 29 12:36 xaj
[root@host src]# ls
nginx-1.14.2  nginx-1.14.2.tar.gz  xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj
[root@host src]# du -sh *
32M nginx-1.14.2
992K    nginx-1.14.2.tar.gz
100K    xaa
100K    xab
100K    xac
100K    xad
100K    xae
100K    xaf
100K    xag
100K    xah
100K    xai
92K xaj
# As shown, the 992K nginx tarball was split into 100k pieces; the last piece is smaller than 100k, only 92k

As you can see, the suffixes are letters by default. To use numeric suffixes, add -d; -a length sets the suffix length

[root@host src]# ls
nginx-1.14.2  nginx-1.14.2.tar.gz
[root@host src]# split -b 100k nginx-1.14.2.tar.gz -d -a 5
[root@host src]# ls
nginx-1.14.2  nginx-1.14.2.tar.gz  x00000  x00001  x00002  x00003  x00004  x00005  x00006  x00007  x00008  x00009
[root@host src]# du -sh *
32M nginx-1.14.2
992K    nginx-1.14.2.tar.gz
100K    x00000
100K    x00001
100K    x00002
100K    x00003
100K    x00004
100K    x00005
100K    x00006
100K    x00007
100K    x00008
92K x00009
# The file names are x followed by a 5-digit numeric suffix

Specifying a file name prefix

The files split above all start with the name x; we can use our own prefix instead. The last argument of split is PREFIX

[root@host src]# ls
nginx-1.14.2  nginx-1.14.2.tar.gz
[root@host src]# split -b 100k nginx-1.14.2.tar.gz -d -a 4 nginxfuck
[root@host src]# ls
nginx-1.14.2         nginxfuck0000  nginxfuck0002  nginxfuck0004  nginxfuck0006  nginxfuck0008
nginx-1.14.2.tar.gz  nginxfuck0001  nginxfuck0003  nginxfuck0005  nginxfuck0007  nginxfuck0009
# As shown, the last argument sets the prefix

If you don't want to split by size, you can split by number of lines with -l

[root@host test]# ls
data.txt
[root@host test]# wc -l data.txt
7474 data.txt
[root@host test]# split -l 1000 data.txt -d -a 4 conf
[root@host test]# ls
conf0000  conf0001  conf0002  conf0003  conf0004  conf0005  conf0006  conf0007  data.txt
[root@host test]# du -sh *
40K conf0000
48K conf0001
48K conf0002
36K conf0003
36K conf0004
36K conf0005
36K conf0006
20K conf0007
288K    data.txt
# This splits a 7474-line file into pieces of 1000 lines each (8 files); the names start with conf followed by 4 digits
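The pieces can be joined back with cat, because the suffixes sort in order. This sketch checks the round trip with cmp; it assumes GNU split (for -d and -a) and generates a 250000-byte random test file instead of using a real tarball.

```shell
#!/usr/bin/env bash
# Sketch: split a file into 100k chunks and rebuild it with cat.
# Assumes GNU split (-d, -a); the test file is random data.
work=$(mktemp -d)
head -c 250000 /dev/urandom > "$work/data.bin"

split -b 100k -d -a 3 "$work/data.bin" "$work/part"   # part000 part001 part002
parts=("$work"/part*)                                 # the glob sorts names in order
cat "${parts[@]}" > "$work/joined.bin"

if cmp -s "$work/data.bin" "$work/joined.bin"; then identical=yes; else identical=no; fi
echo "chunks=${#parts[@]} identical=$identical"
rm -rf "$work"
```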

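Splitting files with csplit

csplit is a variant of split that can split log files according to specified conditions and string-matching options

split can only split by data size or number of lines, whereas csplit splits according to the content of the file itself: the presence of a particular word or piece of text can serve as the split condition

[root@host test]# ls
data.txt
[root@host test]# cat data.txt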
SERVER-1
[conection] 192.168.0.1 success
[conection] 192.168.0.2 failed
[conection] 192.168.0.3 success
[conection] 192.168.0.4 success
SERVER-2
[conection] 192.168.0.5 success
[conection] 192.168.0.5 failed
[conection] 192.168.0.5 success
[conection] 192.168.0.5 success
SERVER-3
[conection] 192.168.0.6 success
[conection] 192.168.0.7 failed
[conection] 192.168.0.8 success
[conection] 192.168.0.9 success
[root@host test]# csplit data.txt /SERVER/ -n 2 -s {*} -f server -b "%02d.log";rm server00.log
rm: remove regular empty file 'server00.log'? y
[root@host test]# ls
data.txt  server01.log  server02.log  server03.log

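Explanation:

/SERVER/ matches the line at which each split begins
/[REGEX]/ is a text pattern; each piece runs from the current line (initially the first line) up to, but not including, the next line that contains "SERVER"
{*} repeats the split at every matching line until the end of the file; use {integer} to set the number of repetitions
-s silent mode: do not print any other messages
-n sets the number of digits in the suffix of the output files
-f sets the prefix of the output file names (here server)
-b sets the suffix format, e.g. %02d.log, similar to printf in C

Because the first output file is empty (the matching word is on the very first line of the file), we delete server00.log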

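Splitting file names by extension

Some scripts process files according to their names. We may need to rename a file while keeping its extension, convert a file format (keeping the name but changing the extension), or extract part of a file name. Built-in shell features can split file names in different ways.

The % operator easily extracts the name from the "name.extension" format.

[root@host ~]# file_jpg="test.jpg"
[root@host ~]# name=${file_jpg%.*}
[root@host ~]# echo $name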
test
# extracts the name part

The # operator extracts the extension part of the file name.

[root@host ~]# file_jpg="test.jpg"
[root@host ~]# exten=${file_jpg#*.}
[root@host ~]# echo $exten
jpg
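# extracts the extension: the name above was extracted with .*, the extension here uses *.

How the syntax works

${VAR%.*} means:

Remove from $VAR the string matched by the wildcard to the right of %; the wildcard is matched from the right end.
With VAR=test.jpg, the wildcard matches .jpg from the right, so removing the match from $VAR leaves test

% is a non-greedy operator: it finds the shortest match for the wildcard, scanning from right to left. The related operator %% behaves similarly but greedily, matching the longest possible string. For example, with VAR=hack.fun.book.txt:

Using %:
[root@host ~]# VAR=hack.fun.book.txt
[root@host ~]# echo ${VAR%.*}
hack.fun.book

Using %%:
[root@host ~]# echo ${VAR%%.*}
hack

Likewise, # has the greedy counterpart ##

Using #:
[root@host ~]# echo ${VAR#*.}
fun.book.txt

Using ##:
[root@host ~]# echo ${VAR##*.}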
txt
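The same four operators also split a full path into directory, base name, stem and extensions. A sketch on a hypothetical path (the file does not need to exist; these are pure string operations):

```shell
#!/usr/bin/env bash
# Sketch: %, %%, # and ## applied to a hypothetical path.
file="/home/user/archive.tar.gz"

base=${file##*/}      # archive.tar.gz : drop the longest prefix ending in /
dir=${file%/*}        # /home/user     : drop the shortest suffix starting with /
name=${base%%.*}      # archive        : greedy, removes everything from the first dot
stem=${base%.*}       # archive.tar    : non-greedy, removes only the last extension
ext=${base#*.}        # tar.gz         : non-greedy, removes up to the first dot
last_ext=${base##*.}  # gz             : greedy, removes up to the last dot

echo "$base | $dir | $name | $stem | $ext | $last_ext"
```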

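Batch renaming and moving

By combining find, rename and mv we can do a lot

The simplest way to rename the image files in the current directory to a particular format is the following script

#!/bin/bash
count=1;
for img in `find . -iname '*.png' -o -iname '*.jpg' -type f -maxdepth 1`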
do
  new=image-$count.${img##*.}
  echo "Rename $img to $new"
  mv $img $new
  let count++
done

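Running the script above

[root@host ~]# ll
total 24
-rw-r--r--  1 root root    0 Oct  8 14:22 aaaaaa.jpg
-rw-r--r--  1 root root  190 Aug  9 13:51 aaa.sh
-rw-r--r--  1 root root 2168 Sep 24 10:15 abc.txt
-rw-r--r--  1 root root 3352 Sep 20 09:58 all.txt
-rw-------. 1 root root 1228 Jan  8  2019 anaconda-ks.cfg
-rw-r--r--  1 root root    0 Oct  8 14:22 bbbb.jpg
-rw-r--r--  1 root root   48 Sep 18 10:27 bbb.sh
-rw-r--r--  1 root root    0 Oct  8 14:22 cccc.png
drwxr-xr-x  2 root root  333 Apr 11 19:21 conf
-rw-r--r--  1 root root    0 Oct  8 14:22 dddd.png
-rw-r--r--  1 root root  190 Oct  8 14:22 rename.sh
[root@host ~]# sh rename.sh
find: warning: you have specified the -maxdepth option after a non-option argument -iname, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it).  Please specify options before other arguments.
# (The warning appears because -maxdepth comes after -iname in the script. Putting the options first, e.g. find . -maxdepth 1 -type f \( -iname '*.png' -o -iname '*.jpg' \), avoids it and also makes -type f apply to both patterns.)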

Rename ./aaaaaa.jpg to image-1.jpg
Rename ./bbbb.jpg to image-2.jpg
Rename ./cccc.png to image-3.png
Rename ./dddd.png to image-4.png
[root@host ~]# ls
aaa.sh  abc.txt  all.txt  anaconda-ks.cfg  bbb.sh  conf  image-1.jpg  image-2.jpg  image-3.png  image-4.png  rename.sh
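A more defensive variant of the rename script, written as a function that takes the target directory. It assumes bash 4.4+ (for mapfile -d) and a find with -print0 (GNU and BSD both have it); rename_images is a hypothetical helper name. The list is collected before renaming so that freshly renamed image-N files cannot be picked up again, and NUL-delimited names survive spaces.

```shell
#!/usr/bin/env bash
# Sketch: a more robust rename script (bash 4.4+ for mapfile -d).
# rename_images is a hypothetical helper name; pass the target directory.
rename_images() {
    local dir=$1 count=1 img new
    local -a images
    # options first (no -maxdepth warning); \( \) makes -type f apply to both
    # patterns; the whole list is read before any file is renamed
    mapfile -d '' -t images < <(find "$dir" -maxdepth 1 -type f \
        \( -iname '*.png' -o -iname '*.jpg' \) -print0)
    for img in "${images[@]}"; do
        new="$dir/image-$count.${img##*.}"
        echo "Rename $img to $new"
        mv -n -- "$img" "$new"    # -n: never overwrite an existing file
        (( count++ ))
    done
}

# Usage: rename_images .
```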

Automating interactive input

First, write a script that reads interactive input

#!/bin/bash
# file name: test.sh
read -p "Enter number:" no
read -p "Enter name:" name
echo $no,$name

Send input to the script automatically as follows:

[root@host ~]# ./test.sh
Enter number:2
Enter name:rong
2,rong
[root@host ~]# echo -e "2\nrong\n" |./test.sh
2,rong

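# \n stands for Enter. We generate the input sequence with echo -e, where -e tells echo to interpret escape sequences. If there is a lot of input, put it in a separate file and supply it with the redirection operator, as follows:
[root@host ~]# echo -e "2\nrong\n" > input.data
[root@host ~]# cat input.data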
2
rong

[root@host ~]# ./test.sh < input.data
2,rong

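# This method imports the interactive input data from a file

If you do reverse engineering, you have probably dealt with buffer overflows. To exploit one, you need to redirect shellcode in hexadecimal form (e.g. "\xeb\x1a\x5e\x31\xc0\x88\x46"). These characters cannot be typed directly, because the keyboard has no keys for them. So we use:

echo -e "\xeb\x1a\x5e\x31\xc0\x88\x46"

This command redirects the shellcode into the vulnerable executable. To handle dynamic input, supplying input by checking what the program asks for at run time, we need an excellent tool: expect.

expect can supply the appropriate input according to what the program asks for

Automation with expect

Most Linux distributions do not include expect by default, so you have to install it yourself, e.g. yum -y install expect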

#!/usr/bin/expect
# file name: expect.sh
spawn ./test.sh
expect "Enter number:"
send "2\n"
expect "Enter name:"
send "rong\n"
expect eof

# Run
[root@host ~]# ./expect.sh
spawn ./test.sh
Enter number:2
Enter name:rong
2,rong

spawn specifies which command or script to run
expect specifies the message to wait for
send specifies the message to send
expect eof marks the end of the interaction

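Speeding up commands with parallel processes

Take md5sum as an example. Because it involves computation, it is a CPU-bound command. If checksums are needed for many files, we can run the following script.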

#!/bin/bash
PIDARRAY=()
for file in `find /etc/ -name "*.conf"`
do
  md5sum $file &
  PIDARRAY+=("$!")
done
wait ${PIDARRAY[@]}

Run:
[root@host ~]# sh expect.sh
72688131394bcce818f818e2bae98846  /etc/modprobe.d/tuned.conf
77304062b81bc20cffce814ff6bf8ed5  /etc/modprobe.d/firewalld-sysctls.conf
649f5bf7c0c766969e40b54949a06866  /etc/dracut.conf
d0f5f705846350b43033834f51c9135c  /etc/prelink.conf.d/nss-softokn-prelink.conf
0335aabf8106f29f6857d74c98697542  /etc/prelink.conf.d/fipscheck.conf
0b501d6d547fa5bb989b9cb877fee8cb  /etc/modprobe.d/dccp-blacklist.conf
d779db0cc6135e09b4d146ca69d39c2b  /etc/rsyslog.d/listen.conf
4eaff8c463f8c4b6d68d7a7237ba862c  /etc/resolv.conf
321ec6fd36bce09ed68b854270b9136c  /etc/prelink.conf.d/grub2.conf
3a6a059e04b951923f6d83b7ed327e0e  /etc/depmod.d/dist.conf
7cb6c9cab8ec511882e0e05fceb87e45  /etc/systemd/bootchart.conf
2ad769b57d77224f7a460141e3f94258  /etc/systemd/coredump.conf
f55c94d000b5d62b5f06d38852977dd1  /etc/dbus-1/system.d/org.freedesktop.hostname1.conf
7e2c094c5009f9ec2748dce92f2209bd  /etc/dbus-1/system.d/org.freedesktop.import1.conf
5893ab03e7e96aa3759baceb4dd04190  /etc/dbus-1/system.d/org.freedesktop.locale1.conf
f0c4b315298d5d687e04183ca2e36079  /etc/dbus-1/system.d/org.freedesktop.login1.conf
···

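# Because several md5sum commands run at the same time, a multi-core processor gets the results faster (the output order can differ from run to run)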

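How it works:

We use bash's & operator, which makes the shell put a command in the background and continue with the script. This means the script would exit as soon as the loop ends while the md5sum commands were still running in the background. To avoid this, we get each process's PID from $!: in bash, $! holds the PID of the most recent background process. We put these PIDs into an array and then use the wait command to wait for those processes to finish.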
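Starting one background job per file can launch hundreds of processes at once. An alternative worth knowing is xargs -P, which caps how many run concurrently; -P is supported by GNU and BSD xargs but is not in POSIX. The sample files are generated in a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: checksum files with at most 4 md5sum processes at a time.
# The sample files are generated; -P is a GNU/BSD xargs extension.
work=$(mktemp -d)
for i in 1 2 3 4 5 6; do
    echo "sample $i" > "$work/f$i.conf"
done

# -print0/-0 keep odd file names intact; -P 4 runs up to 4 jobs concurrently;
# -n 1 passes one file per md5sum invocation
results=$(find "$work" -type f -name '*.conf' -print0 | xargs -0 -P 4 -n 1 md5sum)
printf '%s\n' "$results"

count=$(printf '%s\n' "$results" | wc -l)
rm -rf "$work"
```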

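Intersection and difference of text files

comm compares two files

Intersection: print the lines the two files have in common
Symmetric difference: print the lines that differ between the given files
Difference: print the lines contained in file a but not in the other files

Note that comm requires sorted files as input

[root@host ~]# cat a.txt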
apple
orange
gold
silver
steel
iron
[root@host ~]# cat b.txt
orange
gold
cookies
carrot
[root@host ~]# sort a.txt -o A.txt
[root@host ~]# vim A.txt
[root@host ~]# sort b.txt -o B.txt
[root@host ~]# comm A.txt B.txt
apple
      carrot
      cookies
              gold
iron
              orange
silver
steel
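# The output has 3 columns: the first holds lines found only in A.txt, the second lines found only in B.txt, and the third lines present in both; the columns are separated by tabs (\t)

# To print the intersection, suppress columns 1 and 2 and show only column 3
[root@host ~]# comm A.txt B.txt -1 -2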
gold
orange

# print only the differences
[root@host ~]# comm A.txt B.txt -3
apple
      carrot
      cookies
iron
silver
steel

# to make the result easier to read, remove the leading \t
[root@host ~]# comm A.txt B.txt -3 |sed 's/^\t//'
apple
carrot
cookies
iron
silver
steel
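Instead of writing sorted copies (A.txt, B.txt) first, bash's process substitution can sort the inputs on the fly. The word lists below are the contents of a.txt and b.txt shown above.

```shell
#!/usr/bin/env bash
# Sketch: comm on unsorted input, sorted on the fly with <( ... ) (bash-specific).
a=$'apple\norange\ngold\nsilver\nsteel\niron'
b=$'orange\ngold\ncookies\ncarrot'

common=$(comm -12 <(sort <<< "$a") <(sort <<< "$b"))  # intersection
only_a=$(comm -23 <(sort <<< "$a") <(sort <<< "$b"))  # in a but not in b
only_b=$(comm -13 <(sort <<< "$a") <(sort <<< "$b"))  # in b but not in a

printf 'common:\n%s\nonly in a:\n%s\nonly in b:\n%s\n' "$common" "$only_a" "$only_b"
```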

Creating immutable files

Make a file immutable: chattr +i file

[root@host ~]# chattr +i passwd
[root@host ~]# rm -rf passwd
rm: cannot remove 'passwd': Operation not permitted
[root@host ~]# chattr -i passwd
[root@host ~]# rm -rf passwd
[root@host ~]#

grep

grep can search several files at once

[root@host ~]# grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/passwd:dockerroot:x:996:994:Docker User:/var/lib/docker:/sbin/nologin
/etc/group:root:x:0:
/etc/group:dockerroot:x:994:

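By default grep interprets only some of the special characters in match_text (basic regular expressions). To use extended regular expressions, add the -E option, or use the egrep command, which enables them by default. (Testing shows that simple patterns work without -E; it is needed for extended syntax such as unescaped +, ?, | and {}.)

# count the lines that contain the match
[root@host ~]# grep -c root /etc/passwd
3

# print line numbers
[root@host ~]# grep -n root /etc/passwd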
1:root:x:0:0:root:/root:/bin/bash
10:operator:x:11:0:operator:/root:/sbin/nologin
27:dockerroot:x:996:994:Docker User:/var/lib/docker:/sbin/nologin

# search several files and list which files contain the match: -l
[root@host ~]# grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/passwd:dockerroot:x:996:994:Docker User:/var/lib/docker:/sbin/nologin
/etc/group:root:x:0:
/etc/group:dockerroot:x:994:

[root@host ~]# grep root /etc/passwd /etc/group -l
/etc/passwd
/etc/group

# -L does the opposite and lists the files that do not match

# ignore case: -i
# multiple patterns: -e
grep -e "pattern1" -e "pattern2"  # matches lines containing pattern1 or pattern2

[root@host ~]# grep -e root -e docker /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/passwd:dockerroot:x:996:994:Docker User:/var/lib/docker:/sbin/nologin
/etc/group:root:x:0:
/etc/group:dockerroot:x:994:
/etc/group:docker:x:992:

# Another way to specify several patterns is to read them from a pattern file given with -f. Make sure pat.file has no trailing blank line: an empty pattern matches every line
[root@host ~]# cat pat.file
root
docker
[root@host ~]# grep -f pat.file /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/passwd:dockerroot:x:996:994:Docker User:/var/lib/docker:/sbin/nologin
/etc/group:root:x:0:
/etc/group:dockerroot:x:994:
/etc/group:docker:x:992:

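Including or excluding files in a grep search

grep can include or exclude certain files in a search; the included or excluded files are specified with wildcards

# recursively search all .c and .cpp files in the directory (quote the globs so the shell does not expand them first)
grep root . -r --include='*.c' --include='*.cpp'

[root@host ~]# grep root /etc/ -r -l --include '*.conf'  # -l lists only the file names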
/etc/systemd/logind.conf
/etc/dbus-1/system.d/org.freedesktop.hostname1.conf
/etc/dbus-1/system.d/org.freedesktop.import1.conf
/etc/dbus-1/system.d/org.freedesktop.locale1.conf
/etc/dbus-1/system.d/org.freedesktop.login1.conf
/etc/dbus-1/system.d/org.freedesktop.machine1.conf
/etc/dbus-1/system.d/org.freedesktop.systemd1.conf
/etc/dbus-1/system.d/org.freedesktop.timedate1.conf
/etc/dbus-1/system.d/wpa_supplicant.conf

# exclude all README files from the search
grep root . -r --exclude "README"
# To exclude directories, use --exclude-dir; to read the list of files to exclude from a file, use --exclude-from FILE

cut (omitted)

sed

#移除空白行
sed ‘/^$/d‘ file  # /pattern/d会移除匹配的样式的行

#直接在文件中进行替换,使用指定的数字替换文件中所有的3位数的数字
[[email protected] ~]# cat sed.data
11 abc 111 this 9 file contains 111 11 888 numbers 0000

[[email protected] ~]# sed -i ‘s/\b[0-9]\{3\}\b/NUMBER/g‘ sed.data
[[email protected] ~]# cat sed.data
11 abc NUMBER this 9 file contains NUMBER 11 NUMBER numbers 0000
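#The command above replaced every 3-digit number. The regular expression \b[0-9]\{3\}\b matches a 3-digit number: [0-9] is the range of allowed digits, 0 through 9
# \{3\} matches the preceding item exactly 3 times; the backslashes are the escapes that basic regular expressions require
# \b marks a word boundary, so 11 and 0000 are left untouched

sed -i.bak 's/abc/def/' file
#Besides replacing the content, sed creates file.bak, which holds a copy of the original file. The suffix must follow -i directly, with no space in between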
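A quick check of the backup behaviour, assuming GNU sed (BSD sed also accepts a suffix attached directly to -i). Without the g flag only the first match on each line is replaced. The file name is hypothetical.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf 'abc abc\n' > file

# -i.bak: edit in place and keep the original as file.bak
sed -i.bak 's/abc/def/' file

cat file       # def abc  (only the first abc on the line is replaced)
cat file.bak   # abc abc  (the untouched original)
```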

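The matched-string marker &

In sed, & stands for the string matched by the pattern, so the matched text can be reused in the replacement string

[[email protected] ~]# echo this is my sister |sed 's/\w\+/<&>/g' #wrap every word in angle brackets
<this> <is> <my> <sister>
[[email protected] ~]# echo this is my sister |sed 's/\w\+/[&]/g' #wrap every word in square brackets
[this] [is] [my] [sister]

#The regular expression \w\+ matches each word, which is then replaced by [&]; & refers to the word just matched

Quoting

sed expressions are usually enclosed in single quotes, but double quotes work too. Double quotes come in handy when you want to use shell variables in a sed expression, because the shell expands them before sed sees the script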

[[email protected] ~]# text=hello
[[email protected] ~]# echo hello world |sed "s/$text/HELLO/"
HELLO world
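Double quotes have a catch: the variable's value becomes part of the s command, so a value containing / ends the pattern early and breaks the command. sed accepts any character as the delimiter of s, so a sketch using | avoids the problem (path is a hypothetical variable):

```shell
#!/usr/bin/env bash
path=/usr/local

# "s/$path/..." would expand to s//usr/local/... and fail;
# with | as the delimiter, the slashes in $path stay literal.
result=$(echo "prefix=/usr/local/bin" | sed "s|$path|/opt|")
echo "$result"   # prefix=/opt/bin
```

Regex metacharacters such as . or * in the value are still interpreted by sed, so values like that need escaping first.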

awk

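Special variables:

NR: the number of records; during execution, the current line number
NF: the number of fields; during execution, the number of fields in the current line
$0: the text of the current line during execution

Guidelines:

Enclose the whole awk program in single quotes
Make sure every quote inside the program is paired
Enclose action statements in curly braces and conditions in parentheses

awk -F: '{print NR}' /etc/passwd #print the line number of each line
awk -F: '{print NF}' /etc/passwd #print the number of fields (columns) in each line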

[[email protected] ~]# cat passwd
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
elk:x:1000:1000::/home/elk:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin
saslauth:x:998:76:Saslauthd user:/run/saslauthd:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
nscd:x:28:28:NSCD Daemon:/:/sbin/nologin
[[email protected] ~]# awk -F: '{print NR}' passwd
1
2
3
4
5
6
7
8
[[email protected] ~]# awk -F: '{print NF}' passwd
7
7
7
7
7
7
7
7
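[[email protected] ~]# awk -F: 'END{print NF}' passwd
7
[[email protected] ~]# awk -F: 'END{print NR}' passwd
8
#Only an END block is used here. As each line is read, awk updates NR to that line's number, so once the last line has been read NR holds the last line number, which is the number of lines in the file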
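Because NR in the END block holds the number of the last record, awk 'END{print NR}' counts lines just like wc -l. A sketch on a made-up colon-separated input that also uses $NF, the last field of the current line:

```shell
#!/usr/bin/env bash
# Hypothetical input: two records with a different number of fields.
input='a:b:c
d:e'

# For each line: line number, field count, and the last field ($NF).
per_line=$(printf '%s\n' "$input" | awk -F: '{ print NR ": " NF " fields, last=" $NF }')
echo "$per_line"
# 1: 3 fields, last=c
# 2: 2 fields, last=e

# In END, NR is the total number of lines, the same count wc -l reports.
total=$(printf '%s\n' "$input" | awk 'END { print NR }')
echo "$total"   # 2
```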

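awk 'BEGIN{ print "start" } pattern { commands } END{ print "end" }'
The awk program can be quoted with single or double quotes, but single quotes are safer: inside double quotes the shell expands $1, $0 and the like before awk ever sees them
awk 'BEGIN{ statements } { statements } END{ statements }'
An awk script usually consists of three parts: the BEGIN block, the END block, and the main statement block with an optional pattern. All three parts are optional

[[email protected] ~]# awk 'BEGIN{ i=0 } { i++ } END{ print i }' passwd
8

Concatenating strings in awk: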

[[email protected] deployment]# docker images|grep veh
192.168.1.74:5000/veh/zuul                           0.0.1-SNAPSHOT.34        41e9c323b825        26 hours ago        172MB
192.168.1.74:5000/veh/vehicleanalysis                0.0.1-SNAPSHOT.38        bca9981ac781        26 hours ago        210MB
192.168.1.74:5000/veh/masterveh                      0.0.1-SNAPSHOT.88        265e448020f3        26 hours ago        209MB
192.168.1.74:5000/veh/obugateway                     0.0.1-SNAPSHOT.18        a4b3309beccd        8 days ago          182MB
192.168.1.74:5000/veh/frontend                       1.0.33                   357b20afec08        11 days ago         131MB
192.168.1.74:5000/veh/rtkconsumer                    0.0.1-SNAPSHOT.12        4c2e63b5b2f6        2 weeks ago         200MB
192.168.1.74:5000/veh/user                           0.0.1-SNAPSHOT.14        015fc6516533        2 weeks ago         186MB
192.168.1.74:5000/veh/rtkgw                          0.0.1-SNAPSHOT.12        a17a3eed4d28        2 months ago        173MB
192.168.1.74:5000/veh/websocket                      0.0.1-SNAPSHOT.7         a1af778846e6        2 months ago        179MB
192.168.1.74:5000/veh/vehconsumer                    0.0.1-SNAPSHOT.20        4a763860a5c5        2 months ago        200MB
192.168.1.74:5000/veh/dfconsumer                     0.0.1-SNAPSHOT.41        2e3471d6ca27        2 months ago        200MB
192.168.1.74:5000/veh/auth                           0.0.1-SNAPSHOT.4         be5c86dd285b        3 months ago        185MB
[[email protected] deployment]# docker images |grep veh |awk '{a=$1;b=$2;c=(a":"b);print c}'
192.168.1.74:5000/veh/zuul:0.0.1-SNAPSHOT.34
192.168.1.74:5000/veh/vehicleanalysis:0.0.1-SNAPSHOT.38
192.168.1.74:5000/veh/masterveh:0.0.1-SNAPSHOT.88
192.168.1.74:5000/veh/obugateway:0.0.1-SNAPSHOT.18
192.168.1.74:5000/veh/frontend:1.0.33
192.168.1.74:5000/veh/rtkconsumer:0.0.1-SNAPSHOT.12
192.168.1.74:5000/veh/user:0.0.1-SNAPSHOT.14
192.168.1.74:5000/veh/rtkgw:0.0.1-SNAPSHOT.12
192.168.1.74:5000/veh/websocket:0.0.1-SNAPSHOT.7
192.168.1.74:5000/veh/vehconsumer:0.0.1-SNAPSHOT.20
192.168.1.74:5000/veh/dfconsumer:0.0.1-SNAPSHOT.41
192.168.1.74:5000/veh/auth:0.0.1-SNAPSHOT.4
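The concatenation can be written more compactly: in awk, strings placed next to each other are concatenated, so print $1 ":" $2 does the same job. If the input still contains the docker images header line, NR > 1 skips it. A sketch on hypothetical image names (registry.example is made up):

```shell
#!/usr/bin/env bash
# Hypothetical `docker images`-style output, header line included.
images='REPOSITORY            TAG     IMAGE ID
registry.example/app  1.0.2   abc123
registry.example/db   5.7     def456'

# Skip the header (NR > 1) and join repository and tag with ":".
refs=$(printf '%s\n' "$images" | awk 'NR > 1 { print $1 ":" $2 }')
echo "$refs"
# registry.example/app:1.0.2
# registry.example/db:5.7
```

Docker can also produce this list directly with docker images --format '{{.Repository}}:{{.Tag}}', which avoids parsing the column layout at all.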

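awk works as follows:

1. Execute the BEGIN { commands } block.
2. Execute the main block pattern { commands } for each input line, repeating until the whole file has been read.
3. When the end of the input stream is reached, execute the END { commands } block.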
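The order of the three phases is easy to see with a tiny input. A sketch:

```shell
#!/usr/bin/env bash
# BEGIN runs once before any input, the main block once per line,
# and END once after the last line has been read.
out=$(printf 'x\ny\n' | awk '
  BEGIN { print "start" }
        { print "line " NR ": " $0 }
  END   { print "end, " NR " lines" }')
echo "$out"
# start
# line 1: x
# line 2: y
# end, 2 lines
```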

We can add up the first field of every line, that is, sum a column:

[[email protected] ~]# cat sum.data
1 2 3 4 5 6
2 2 2 2 2 2
3 3 3 3 3 3
5 5 5 6 6 6
[[email protected] ~]# cat sum.data |awk 'BEGIN{ sum=0 }  { print $1; sum+=$1 } END { print sum }'
1
2
3
5
11
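The same idea extends to every column at once by keeping one running total per field in an array. A sketch using the sum.data contents shown above, recreated in a temporary file:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf '1 2 3 4 5 6\n2 2 2 2 2 2\n3 3 3 3 3 3\n5 5 5 6 6 6\n' > sum.data

# sum[i] accumulates column i; ncol remembers the widest line seen.
totals=$(awk '
  { for (i = 1; i <= NF; i++) sum[i] += $i
    if (NF > ncol) ncol = NF }
  END { for (i = 1; i <= ncol; i++) printf "%s%s", sum[i], (i < ncol ? " " : "\n") }
' sum.data)
echo "$totals"   # 11 12 13 15 16 17
```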

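[[email protected] ~]# awk '{if($2==3)print $0}' sum.data
3 3 3 3 3 3
[[email protected] ~]# awk '{if($2==5)print $0}' sum.data
5 5 5 6 6 6

#Add 1 to every value. Assigning to a field makes awk rebuild $0 joined with OFS (a space by default), so modified lines are printed with spaces instead of colons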
[[email protected] ~]# cat passwd
1:2:3:4
5:5:5:5
3:2:3:5
[[email protected] ~]# cat passwd |awk -F: '{for(i=1;i<=NF;i++){$i+=1}}{print $0}'
2 3 4 5
6 6 6 6
4 3 4 6

[[email protected] ~]# cat passwd |awk -F: '{$2=$2+1;print $0}'
1 3 3 4
5 6 5 5
3 3 3 5
[[email protected] ~]# cat passwd |awk -F: '{if($2==2) $2=$2+1;print $0}'
1 3 3 4
5:5:5:5
3 3 3 5

#Replace the second field with "jack fuck" when it equals 2; for stricter style, the assignment can also be wrapped in parentheses
[[email protected] ~]#  cat passwd |awk -F: '{if($2==2) $2="jack fuck";print $0}'
1 jack fuck 3 4
5:5:5:5
3 jack fuck 3 5
[[email protected] ~]#  cat passwd |awk -F: '{if($2==2) ($2="jack fuck");print $0}'
1 jack fuck 3 4
5:5:5:5
3 jack fuck 3 5
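To keep the colons in the output, set the output field separator OFS to the same value as FS. A sketch with the same three input lines, recreated in a temporary file:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf '1:2:3:4\n5:5:5:5\n3:2:3:5\n' > passwd

# FS = OFS = ":" so a rebuilt record is joined with ":" instead of a space.
out=$(awk 'BEGIN { FS = OFS = ":" } { if ($2 == 2) $2 = $2 + 1; print }' passwd)
echo "$out"
# 1:3:3:4
# 5:5:5:5
# 3:3:3:5
```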

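Passing external variables to awk

#The -v option passes an external value into awk
[[email protected] ~]# VAR1=10000
[[email protected] ~]# echo |awk -v VAR="$VAR1" '{print VAR}'
10000
#awk reads its input from standard input, so echo supplies one (empty) input line for the main block to run on

#Another flexible way to pass several external variables is to append var=value assignments after the program
[[email protected] ~]# VAR1=10000
[[email protected] ~]# VAR2=20000
[[email protected] ~]# echo |awk '{ print v1,v2 }' v1=$VAR1 v2=$VAR2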
10000 20000
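The two methods differ in when the value becomes available. -v assigns before the BEGIN block runs, while a var=value operand is only processed when awk reaches it among its input operands; a program made only of BEGIN never reads operands at all. A sketch that shows the difference without needing echo:

```shell
#!/usr/bin/env bash
VAR1=10000

# -v: the variable is already set inside BEGIN; no input is needed.
a=$(awk -v v="$VAR1" 'BEGIN { print "v=[" v "]" }')

# Operand assignment: a BEGIN-only program exits without reading
# its operands, so v is still empty here.
b=$(awk 'BEGIN { print "v=[" v "]" }' v="$VAR1")

echo "$a"   # v=[10000]
echo "$b"   # v=[]
```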

Filtering the lines awk processes with patterns

[[email protected] ~]# cat sum.data
1 2 3 4 5 6
2 2 2 2 2 2
3 3 3 3 3 3
5 5 5 6 6 6

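#Lines whose line number is less than 3
[[email protected] ~]# awk 'NR<3' sum.data
1 2 3 4 5 6
2 2 2 2 2 2

#Lines 1 through 3 (a range pattern: from the line where NR==1 holds to the line where NR==3 holds)
[[email protected] ~]# awk 'NR==1,NR==3' sum.data
1 2 3 4 5 6
2 2 2 2 2 2
3 3 3 3 3 3

#Lines containing the pattern linux
awk '/linux/'

#Lines not containing the pattern linux
awk '!/linux/'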
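Patterns can be combined with && and ||, mixed with NR, and used as ranges. A sketch on a hypothetical os.txt:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf 'linux kernel\nbsd kernel\nlinux shell\nwindows\n' > os.txt

with=$(awk '/linux/' os.txt)              # linux kernel, linux shell
without=$(awk '!/linux/' os.txt)          # bsd kernel, windows
combo=$(awk 'NR > 1 && /kernel/' os.txt)  # bsd kernel
range=$(awk '/bsd/,/shell/' os.txt)       # bsd kernel, linux shell

printf '%s\n---\n' "$with" "$without" "$combo" "$range"
```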

Merging multiple files by column

paste
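paste joins files line by line, placing corresponding lines side by side separated by a tab; -d chooses a different delimiter. A minimal sketch with two hypothetical one-column files:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf '1\n2\n3\n' > col1.txt
printf 'a\nb\nc\n' > col2.txt

tabbed=$(paste col1.txt col2.txt)      # 1<TAB>a, 2<TAB>b, 3<TAB>c
comma=$(paste -d, col1.txt col2.txt)   # 1,a  2,b  3,c
echo "$comma"
```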

wget

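Download several files: wget URL1 URL2 URL3
-t sets the number of retries, and -t 0 retries indefinitely: wget -t 0 URL
Limit the download rate: wget --limit-rate 20k http://www.baidu.com
Set a quota when downloading several files; downloading stops once the quota is used up: wget --quota 100M URL1 URL2 URL3
Resume an interrupted download: wget -c URL
Access HTTP or FTP pages that require authentication: wget --user username --password pass URL. To keep the password off the command line, replace --password with --ask-password and type the password when prompted

Copying an entire website (crawling)

wget has an option that makes it work like a web crawler: it recursively follows every URL link on the site's pages and downloads them one by one, so we end up with every page of the site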

wget --mirror --convert-links www.chinanews.com
[[email protected] tmp]# ls
www.chinanews.com
[[email protected] tmp]# cd www.chinanews.com/
[[email protected] www.chinanews.com]# ls
allspecial  auto   cj             common   gangao  hb  huaren      js          m     piaowu  robots.txt   sh      society  taiwan  tp
app         china  cns2012.shtml  fileftp  gn      hr  index.html  live.shtml  part  pv      scroll-news  shipin  stock    theory
[[email protected] www.chinanews.com]# ll
total 260
drwxr-xr-x 2 root root     25 Oct 12 14:11 allspecial
drwxr-xr-x 3 root root     23 Oct 12 14:11 app
drwxr-xr-x 3 root root     18 Oct 12 14:11 auto
drwxr-xr-x 2 root root     24 Oct 12 14:11 china
drwxr-xr-x 3 root root     18 Oct 12 14:11 cj
-rw-r--r-- 1 root root  15799 Oct 12 14:11 cns2012.shtml
drwxr-xr-x 3 root root     46 Oct 12 14:11 common
drwxr-xr-x 6 root root     54 Oct 12 14:11 fileftp
drwxr-xr-x 2 root root     24 Oct 12 14:11 gangao
drwxr-xr-x 4 root root     27 Oct 12 14:11 gn
drwxr-xr-x 2 root root     24 Oct 12 14:11 hb
drwxr-xr-x 3 root root     18 Oct 12 14:11 hr
drwxr-xr-x 2 root root     24 Oct 12 14:11 huaren
-rw-r--r-- 1 root root 184362 Oct 12 14:11 index.html
drwxr-xr-x 2 root root     26 Oct 12 14:11 js

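#--convert-links tells wget to rewrite the links in the downloaded pages so they point to the local copies

Downloading a web page as plain text

A downloaded web page is HTML by default and needs a browser to view it. lynx, a rather fun command-line browser, can fetch a page as plain text

#The -dump option of lynx writes the page content as plain ASCII text, which is redirected to a file here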
[[email protected] tmp]# yum -y install lynx
[[email protected] tmp]# lynx www.chinanews.com -dump > abc.txt
[[email protected] tmp]# cat abc.txt
 ...
 1.   http://www.chinanews.com/kong/2019/10-12/8976714.shtml
 2.   http://www.chinanews.com/kong/2019/10-12/8976812.shtml
 3.   http://www.chinanews.com/kong/2019/10-12/8976721.shtml
 4.   http://www.chinanews.com/kong/2019/10-12/8976690.shtml
 5.   http://www.chinanews.com/kong/2019/10-12/8976817.shtml
 6.   http://www.chinanews.com/kong/2019/10-12/8976794.shtml
 7.   http://www.chinanews.com/kong/2019/10-12/8976853.shtml
 8.   http://www.chinanews.com/kong/2019/10-12/8976803.shtml
 9.   http://www.chinanews.com/sh/2019/10-12/8976754.shtml
 10.  http://www.chinanews.com/tp/chart/index.shtml
 11.  http://www.chinanews.com/tp/hd2011/2019/10-12/907641.shtml
 12.  http://www.chinanews.com/tp/hd2011/2019/10-12/907637.shtml
 13.  http://www.chinanews.com/tp/hd2011/2019/10-12/907651.shtml
 14.  http://www.chinanews.com/tp/hd2011/2019/10-12/907644.shtml
 15.  http://www.chinanews.com/tp/hd2011/2019/10-12/907675.shtml
 16.  http://www.chinanews.com/tp/hd2011/2019/10-12/907683.shtml
 17.  http://www.chinanews.com/tp/hd2011/2019/10-12/907656.shtml
 18.  http://www.ecns.cn/video/2019-10-12/detail-ifzpuyxh5816910.shtml
 19.  http://www.ecns.cn/video/2019-10-11/detail-ifzpuyxh5815962.shtml
 20.  http://www.ecns.cn/video/2019-10-11/detail-ifzpuyxh5815122.shtml
 21.  http://www.ecns.cn/video/2019-10-11/detail-ifzpuyxh5815100.shtml
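The numbered reference list at the end of a -dump is easy to post-process. A sketch, assuming the " N.   URL" layout shown above, that keeps just the URLs and counts them per host; the sample lines are copied from that output.

```shell
#!/usr/bin/env bash
# A few reference lines in the layout produced by lynx -dump.
dump=' 1.   http://www.chinanews.com/kong/2019/10-12/8976714.shtml
 9.   http://www.chinanews.com/sh/2019/10-12/8976754.shtml
 18.  http://www.ecns.cn/video/2019-10-12/detail-ifzpuyxh5816910.shtml'

# Keep field 2 of lines whose first field looks like "N." (a reference number).
urls=$(printf '%s\n' "$dump" | awk '$1 ~ /^[0-9]+\.$/ { print $2 }')

# Split each URL on "/" (field 3 is the host name) and count links per host.
per_host=$(printf '%s\n' "$urls" | awk -F/ '{ print $3 }' | sort | uniq -c)
echo "$per_host"
```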

curl

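Setting cookies

To specify cookies, use the --cookie "COOKIES" option

Cookies are given as name=value pairs, with multiple cookies separated by semicolons. For example: --cookie "user=slynux;pass=hack"

To save cookies to a file, use the --cookie-jar option, for example --cookie-jar cookie_file

Setting the user agent string

Without a user agent, some pages that check it will not display. You have probably come across old websites that work only in IE: open them in another browser and they tell you they can only be viewed in IE. That happens because these sites check the user agent, and curl lets you set it yourself

--user-agent or -A sets the user agent: curl URL --user-agent "Mozilla/5.0"
-H passes an extra header; repeat it to send several headers: curl -H "Host: www.baidu.com" -H "Accept-language: en" URL

Printing only the response headers

-I or --head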

[[email protected] tmp]# curl -I www.chinanews.com
HTTP/1.1 200 OK
Date: Sat, 12 Oct 2019 08:47:31 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 12 Oct 2019 08:48:22 GMT
Server: nginx/1.12.2
Cache-Control: max-age=120
Age: 69
X-Via: 1.1 PSbjwjBGP2ih137:5 (Cdn Cache Server V2.0), 1.1 shx92:3 (Cdn Cache Server V2.0), 1.1 PSjsczsxrq176:3 (Cdn Cache Server V2.0), 1.1 iyidong70:11 (Cdn Cache Server V2.0)
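When a script only needs the status code, it can be taken from the first header line. curl -I ends header lines with CRLF, so the carriage return is stripped first; over HTTP/2 the status line reads HTTP/2 200, with no reason phrase such as OK. A sketch that parses a stored copy of the headers so it runs offline; against a live server, curl -s -o /dev/null -w '%{http_code}' URL prints just the code.

```shell
#!/usr/bin/env bash
# Stored copy of response headers, with CRLF line endings as curl prints them.
headers=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: nginx/1.12.2\r\n')

# Drop the carriage returns, then take the second field of the status line.
status=$(printf '%s\n' "$headers" | tr -d '\r' | awk 'NR == 1 { print $2 }')
echo "$status"   # 200

# The same extraction works on an HTTP/2 status line, which has no "OK".
status2=$(printf 'HTTP/2 404\r\n' | tr -d '\r' | awk 'NR == 1 { print $2 }')
echo "$status2"  # 404
```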

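Parsing website data

lynx is a command-line web browser. Instead of printing a pile of raw HTML, it displays a text version of the site that closely follows what we see in a graphical browser, which saves us the work of stripping HTML tags. Here we use lynx's -nolist option because we do not want every link to be numbered automatically.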

[[email protected] tmp]# lynx  www.chinanews.com -dump -nolist
 ...
 友情链接
   外交部|国侨办|中纪委监察部|国台办|中国法院网|人民网|新华网|中国网|央视网|国际在线|中国青年网|中国经济网|中国台湾网|央广网|
   中国西藏网|中青在线|光明网|中国军网|法制网|中华网|新京报|京报网|京华网|四川广播电视台|千龙网|华龙网|红 网|舜 网|胶东在线|
   东北新闻网|东北网|齐鲁热线|四川新闻网|长城网|南方网|北方网|东方网|新浪|搜狐|网易|腾讯|华夏经纬|东方财富网|金融界|慧科|房天下

   关于我们| About us| 联系我们| 广告服务| 供稿服务| 法律声明| 招聘信息| 网站地图
   | 留言反馈

   本网站所刊载信息，不代表中新社和中新网观点。 刊用本网站稿件，务经书面授权。

   未经授权禁止转载、摘编、复制及建立镜像，违者将依法追究法律责任。

   [网上传播视听节目许可证（0106168)] [京ICP证040655号] [ [ghs.png] 京公网安备
   11000002003042号] [京ICP备05004340号-1] 总机：86-10-87826688
   违法和不良信息举报电话：15699788000 举报邮箱：[email protected] 举报受理和处置管理办法

   Copyright ©1999- 2019 chinanews.com. All Rights Reserved

                             [_1077593327_3.gif]

                  [U194P4T47D45262F978DT20190920162854.jpg]

                             [_1077593327_3.gif]

                  [U194P4T47D45262F979DT20190920162854.jpg]

case

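case $variable in
"value1")
    if the variable equals value1, run program 1
    ;;
"value2")
    if the variable equals value2, run program 2
    ;;
...other branches omitted...
*)
    if the variable matches none of the values above, run this program
    ;;
esac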

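#!/bin/bash
#Check the user's input
read -p "Please choose yes/no: " -t 30 cho
#Print "Please choose yes/no: " on the screen and store the user's answer in the variable cho (waiting at most 30 seconds)
case "$cho" in
#Check the value of cho
    "yes")
    #if it is yes
        echo "Your choice is yes!"
        #run program 1
        ;;
    "no")
    #if it is no
        echo "Your choice is no!"
        #run program 2
        ;;
    *)
    #if it is neither yes nor no
    echo "Your choice is invalid!"
    #run this program
    ;;
esac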
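case patterns are shell glob patterns, and | separates alternatives, so one branch can accept several spellings. A sketch that wraps the same logic in a function so it can be exercised without typing; classify_answer is a hypothetical name.

```shell
#!/usr/bin/env bash
# Map a free-form answer to yes/no/empty/invalid.
classify_answer() {
  case "$1" in
    [Yy] | [Yy][Ee][Ss]) echo yes ;;      # y, Y, yes, YES, Yes, ...
    [Nn] | [Nn][Oo])     echo no ;;       # n, N, no, NO, ...
    "")                  echo empty ;;    # e.g. nothing entered before the timeout
    *)                   echo invalid ;;  # anything else
  esac
}

classify_answer Yes     # yes
classify_answer n       # no
classify_answer maybe   # invalid
```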

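Finding broken links on a website

Checking every page of a website by hand to find broken links is tedious. The script below automates the job: it collects every link on the site and then checks each one.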


[[email protected] tmp]# cat find_broken.sh
#!/bin/bash
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

echo Broken links:

# $$ is the PID of the running script
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx || exit 1

lynx -traversal "$1" > /dev/null
count=0;

# lynx writes the links it found to reject.dat
sort -u reject.dat > links.txt

while read link;
do
  output=`curl -I "$link" -s | grep "HTTP/.*OK"`
  if [[ -z $output ]];
  then
    echo "$link"
    let count++
  fi
done < links.txt

[ $count -eq 0 ] && echo No broken links found.

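#lynx -traversal URL generates several files in the working directory, including reject.dat, which contains every link on the site. sort -u builds a list without duplicates, and then curl only has to check each link's response headers

#sort -u removes duplicates, like piping sort into uniq

Judging by its name, reject.dat from lynx -traversal should contain a list of invalid URLs, but it does not: every URL goes into this file
lynx also generates a traverse.error file listing the URLs that caused problems during browsing. However, lynx only records URLs that return HTTP 404 and misses URLs with other kinds of errors, which is why the script checks the response status itself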
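Grepping for "HTTP/.*OK" also reports working links as broken when the server answers over HTTP/2 (status line HTTP/2 200, no OK) or with a redirect. A variant sketch that compares numeric status codes instead; http_status, is_broken and report_broken are hypothetical helper names, and treating 2xx and 3xx as healthy is an assumption you may want to tighten.

```shell
#!/usr/bin/env bash
# Print only the final HTTP status code for a URL (-L follows redirects;
# curl prints 000 when no response arrives at all).
http_status() {
  curl -s -o /dev/null -L --max-time 10 -w '%{http_code}' "$1"
}

# Succeeds (returns 0) when the status code counts as broken.
is_broken() {
  case "$1" in
    2?? | 3??) return 1 ;;   # success or redirect: not broken
    *)         return 0 ;;   # 4xx, 5xx, 000 (no response), anything else
  esac
}

# Read URLs from a file, one per line, and print the broken ones.
report_broken() {
  local link count=0
  while IFS= read -r link; do
    [ -z "$link" ] && continue
    if is_broken "$(http_status "$link")"; then
      echo "$link"
      count=$((count + 1))
    fi
  done < "$1"
  if [ "$count" -eq 0 ]; then
    echo "No broken links found."
  fi
}

# Usage (needs network access): report_broken links.txt
```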

Original article: https://blog.51cto.com/4169523/2465435

Time: 2024-11-09 01:01:03
