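/* To save time, this article was originally written in Chinese */
~Preface~
Studying regular expressions in depth is a good way to train rigorous logical thinking. Because regular expressions are available in almost every high-level programming language, their importance goes without saying: they are a core skill that every practitioner needs.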
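Regular expression overview (subject: PCRE)
「1」 Matching direction
- Horizontal view, i.e. within a line: from left to right
- Vertical view, i.e. across lines: from top to bottom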
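「2」 Basic unit of movement
By default the engine advances one character at a time. Writing a pattern as \bcontents\b makes it match “contents” only as a whole word rather than anywhere inside a longer string. \b itself matches a position, not a character: the position between a word character (\w: a letter, digit or underscore) and a non-word character, or the start or end of the text. A space, a Tab, \n (the Linux newline), \r (the MS carriage return) or a punctuation mark next to a word character therefore all create a boundary.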
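The effect of \b can be checked with a small sketch. It uses Python's `re` module rather than PCRE itself (an assumption made for convenience; the two agree on \b for ASCII text), and the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, not taken from the article.
text = "cat concat cat, cat\tcats"

# Without \b the pattern matches "cat" anywhere, including inside longer words.
loose = re.findall(r"cat", text)

# With \b on both sides only the standalone word "cat" matches.
# A space, a comma and a tab all act as boundaries here.
whole = [m.start() for m in re.finditer(r"\bcat\b", text)]

print(len(loose))  # 5
print(whole)       # [0, 11, 16]
```

The occurrences inside "concat" and "cats" are skipped because at least one side of them touches another word character.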
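「3」 Matching range
Quantifiers are greedy by default, i.e. they match the longest text that satisfies the pattern. Appending a “?” to a quantifier switches it to lazy mode, which matches the shortest text instead.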
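A minimal sketch of the difference, again using Python's `re` as a stand-in for PCRE. The trailing "<extra>" in the sample line is a hypothetical addition so that the two modes give different results.

```python
import re

# The "<extra>" part is invented for this demonstration.
line = "<LOOPBACK,UP,LOWER_UP> mtu 65536 <extra>"

# Greedy: .+ runs as far as it can, so the match ends at the LAST ">".
greedy = re.search(r"<.+>", line).group()

# Lazy: .+? stops as early as it can, so the match ends at the FIRST ">".
lazy = re.search(r"<.+?>", line).group()

print(greedy)  # <LOOPBACK,UP,LOWER_UP> mtu 65536 <extra>
print(lazy)    # <LOOPBACK,UP,LOWER_UP>
```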
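~Main topic~ Zero-Length Assertions
Usually rendered in Chinese as “零宽断言” (zero-width assertions), they originated in Perl 5 and are very powerful and flexible! To make them easier to understand, group them with ^, $ and \b: they are virtual boundaries that do not occupy any character position, which is exactly what “Zero-Length” in the English name says.
By the direction in which they look relative to the current position they are divided into Lookahead and Lookbehind; by the logic they apply (True/False) they are divided into positive and negative.
That is: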
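- Positive Lookahead Zero-Length Assertion: advancing one basic unit at a time, the engine checks at each position whether exp matches the text immediately to the right; where it does, the virtual boundary lies just before that text. Syntax: (?=exp)
- Positive Lookbehind Zero-Length Assertion: advancing one basic unit at a time, the engine checks at each position whether exp matches the text ending immediately to the left; where it does, the virtual boundary lies just after that text. Syntax: (?<=exp). PCRE has traditionally required the exp of a lookbehind to match a fixed length, so variable quantifiers such as {1,}, * and + are rejected. Top-level alternatives such as (ab)|(bcde) are accepted, provided each alternative has a fixed length of its own.
- Negative Lookahead Zero-Length Assertion: advancing one basic unit at a time, the engine checks at each position that exp does NOT match the text immediately to the right; where that holds, the virtual boundary lies at that position. Syntax: (?!exp)
- Negative Lookbehind Zero-Length Assertion: advancing one basic unit at a time, the engine checks at each position that exp does NOT match the text ending immediately to the left; where that holds, the virtual boundary lies at that position. Syntax: (?<!exp). The same fixed-length rule applies: variable quantifiers such as {3,100}, + and * are rejected, while top-level alternatives such as (\d)|(\s\w) are accepted because each has a fixed length.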
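The four forms listed above can be tried side by side. This is only a sketch: it uses Python's `re` instead of PCRE, the sample string is invented, and Python's fixed-width rule for lookbehind is stricter than PCRE's (Python also rejects alternatives of different lengths), so the last part shows Python's behaviour rather than PCRE's.

```python
import re

# Hypothetical sample, loosely modelled on fields of "ip addr" output.
text = "mtu 1500 qlen 1000 eth0: lo:"

# Positive lookahead: a word only if ":" follows immediately.
ahead_pos = re.findall(r"\w+(?=:)", text)

# Positive lookbehind: digits only if "mtu " precedes them.
behind_pos = re.findall(r"(?<=mtu )\d+", text)

# Negative lookahead: whole words NOT followed by ":".
ahead_neg = re.findall(r"\b\w+\b(?!:)", text)

# Negative lookbehind: whole numbers NOT preceded by "mtu ".
behind_neg = re.findall(r"(?<!mtu )\b\d+\b", text)

# A variable-length lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=mtu\s+)\d+")
    variable_ok = True
except re.error:
    variable_ok = False

print(ahead_pos)    # ['eth0', 'lo']
print(behind_pos)   # ['1500']
print(ahead_neg)    # ['mtu', '1500', 'qlen', '1000']
print(behind_neg)   # ['1000']
print(variable_ok)  # False
```

In every case the ":" or "mtu " that the assertion inspects is absent from the result, since the assertion only fixes the position.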
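Important: the condition “exp” inside a Zero-Length Assertion is used only to determine where the “virtual boundary” lies. It neither selects nor excludes any characters; its purpose is to narrow down where a match may occur. The text that is finally returned is determined by the parts of the pattern outside the assertion.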
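One way to see this is to look at the span of a match. The sketch below (Python `re`, assumed here as a stand-in for PCRE) takes one line from the "ip addr" output and extracts the IPv4 address between "inet " and "/".

```python
import re

line = "inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0"

# Lookbehind and lookahead pin down the position; only [\d.]+ consumes text.
m = re.search(r"(?<=inet )[\d.]+(?=/)", line)

print(m.group())  # 172.18.21.244
print(m.span())   # (5, 18)

# An assertion on its own matches a position of width zero.
boundary = re.search(r"(?=/)", line)
print(boundary.span())  # (18, 18)
```

The match starts after "inet " and stops before "/", so neither piece of context appears in the result.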
The examples below use the output of “ip addr” to walk through each case.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::428d:5cff:fee2:872f/64 scope link
       valid_lft forever preferred_lft forever
Experiment 1: extract the name of every interface and its MTU value
$ ip addr | grep -oP '(\w+(?=:+\s+<+))|(?<=\smtu\s)\d+'
lo
65536
eth0
1500
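For readers without GNU grep at hand, Experiment 1 can be approximated in Python. This is a sketch, not the author's method: `re` is not PCRE, the capture groups are dropped because they are not needed, and `SAMPLE` is an abbreviated copy of the "ip addr" output.

```python
import re

# Abbreviated copy of the sample output (header and link lines only).
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
"""

# Left branch: a word followed by ":", whitespace and "<" (the interface name).
# Right branch: digits preceded by " mtu " (the MTU value).
PATTERN = re.compile(r"\w+(?=:+\s+<+)|(?<=\smtu\s)\d+")

found = [m.group() for m in PATTERN.finditer(SAMPLE)]
print(found)  # ['lo', '65536', 'eth0', '1500']
```

The leading index "1" is not returned because the text after its ":" is "lo", not "<", and the MAC address fields fail because no whitespace follows their ":".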
Experiment 2 「00」: exclude the lines of the “ip addr” output that contain “lft”
$ ip addr | grep -oP '^(?!.*lft).*$'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
    inet6 fe80::428d:5cff:fee2:872f/64 scope link
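The same filter can be sketched in Python. To stay close to how grep works, the sample is split into lines and the pattern is applied to each line separately; `SAMPLE` is an abbreviated copy of the eth0 block, and `re` is assumed to behave like PCRE for this pattern.

```python
import re

# Abbreviated copy of the eth0 block from the sample output.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
       valid_lft forever preferred_lft forever
"""

# "^" pins the negative lookahead to the start of the line, so a line is kept
# only if "lft" appears nowhere in it.
KEEP = re.compile(r"^(?!.*lft).*$")

kept = [line for line in SAMPLE.splitlines() if KEEP.search(line)]

for line in kept:
    print(line)
```

Three of the four lines survive; the "valid_lft" line yields no match at all and is dropped whole.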
Experiment 2 「01」: incorrect attempts
$ ip addr | grep -oP '(?!.*lft).*'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
ft forever
    inet6 ::1/128 scope host
ft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
ft forever
    inet6 fe80::428d:5cff:fee2:872f/64 scope link
ft forever

$ ip addr | grep -oP '\b(?!.*lft).*\b'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
 forever
inet6 ::1/128 scope host
 forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 40:8d:5c:e2:87:2f brd ff:ff:ff:ff:ff:ff
inet 172.18.21.244/24 brd 172.18.21.255 scope global eth0
 forever
inet6 fe80::428d:5cff:fee2:872f/64 scope link
 forever
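Error analysis: the pattern must be anchored with “^” and “$” so that the whole line becomes the unit being tested. With “^” in place the lookahead is evaluated only at the start of the line, so a single false result rejects the entire line. Without the anchor the engine retries at every later position and succeeds at the first one that no longer has “lft” to its right, which is why the tails “ft forever” and “forever” are still printed.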
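The retry behaviour can be reproduced on a single line. A minimal sketch with Python's `re` (assumed to agree with PCRE on these patterns), using one of the "valid_lft" lines from the sample output:

```python
import re

line = "       valid_lft forever preferred_lft forever"

# Unanchored: the engine slides right until no "lft" remains ahead of it,
# which first happens just after the "l" of the last "lft".
unanchored = re.search(r"(?!.*lft).*", line)

# With \b the match may only start at a word boundary; the first boundary
# with no "lft" ahead is the one right after "preferred_lft".
bounded = re.search(r"\b(?!.*lft).*\b", line)

# Anchored: the lookahead is tried once, at the start of the line, and fails.
anchored = re.search(r"^(?!.*lft).*$", line)

print(repr(unanchored.group()))  # 'ft forever'
print(repr(bounded.group()))     # ' forever'
print(anchored)                  # None
```

Only the anchored pattern rejects the line outright; the other two merely move the starting point of the match.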
REFERENCE:
http://www.regular-expressions.info/lookaround.html
Hadex's brief analysis of "Lookahead and Lookbehind Zero-Length Assertions"
Time: 2024-10-27 12:37:33