regular expression, grep (python, linux)

https://docs.python.org/2/library/re.html

re.match(pattern, string, flags=0)  尝试从字符串的起始位置匹配一个模式

re.search(pattern, string, flags=0)  扫描整个字符串并返回第一个成功的匹配

re.sub(pattern, repl, string, max=0)  替换字符串中的匹配项

>>> import re

>>> s=‘112.90.239.137 112.90.239.137 1526446118 [26/Nov/2015:00:00:47 +0800] 23 "GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1" "api.map.baidu.com" 200 76 gzip:116pct. "-" "BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1" "jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)" map apimap 16555290153476373216 10.46.234.22 "9904758605881922946"‘

>>> res=re.compile(r"(.*) (.*) (.*) \[(.*)\] (.*) \"(.*)\" \"(.*)\" (.*) (.*) (.*) \"(.*)\" \"(.*)\" \"(.*)\" (.*) (.*) (.*) (.*) \"(.*)\"")

>>> res is None

False

>>> res.search(s).groups()

(‘112.90.239.137‘, ‘112.90.239.137‘, ‘1526446118‘, ‘26/Nov/2015:00:00:47 +0800‘, ‘23‘, ‘GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1‘, ‘api.map.baidu.com‘, ‘200‘, ‘76‘, ‘gzip:116pct.‘, ‘-‘, ‘BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1‘, ‘jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)‘, ‘map‘, ‘apimap‘, ‘16555290153476373216‘, ‘10.46.234.22‘, ‘9904758605881922946’)

>>> re.sub(‘(<b>)|(</b>)‘, ‘‘, s)

grep:

http://ryanstutorials.net/linuxtutorial/grep.php

-v, --invert-match        select non-matching lines

-i, --ignore-case         ignore case distinctions

-f, --file=FILE           obtain PATTERN from FILE

-w, --word-regexp         force PATTERN to match only whole words

-o, --only-matching       show only the part of a line matching PATTERN

-P, --perl-regexp         PATTERN is a Perl regular expression

-n, --line-number         print line number with output lines

-H, --with-filename       print the file name for each match

-B, --before-context=NUM  print NUM lines of leading context

-A, --after-context=NUM   print NUM lines of trailing context

-C, --context=NUM         print NUM lines of output context

-a, --text                equivalent to --binary-files=text

-s, --no-messages         suppress error messages

regexp:

http://ryanstutorials.net/regular-expressions-tutorial/

https://www.debuggex.com/

  • . (dot) - a single character.
  • ? - the preceding character matches 0 or 1 times only.
  • * - the preceding character matches 0 or more times.
  • + - the preceding character matches 1 or more times.
  • {n} - the preceding character matches exactly n times.
  • {n,m} - the preceding character matches at least n times and not more than m times.
  • [agd] - the character is one of those included within the square brackets.
  • [^agd] - the character is not one of those included within the square brackets.
  • [c-f] - the dash within the square brackets operates as a range. In this case it means either the letters c, d, e or f.
  • () - allows us to group several characters to behave as one.
  • | (pipe symbol) - the logical OR operation.
  • ^ - matches the beginning of the line.
  • $ - matches the end of the line.
  • \s - matches anything which is considered whitespace. This could be a space, tab, line break etc.
  • \S - matches the opposite of \s, that is anything which is not considered whitespace.
  • \d - matches anything which is considered a digit. ie 0 - 9 (It is effectively a shortcut for [0-9]).
  • \D - matches the opposite of \d, that is anything which is not considered a digit.
  • \w - matches anything which is considered a word character. That is [A-Za-z0-9_]. Note the inclusion of the underscore character ‘_‘. This is because in programming and other areas we regulaly use the underscore as part of, say, a variable or function name.
  • \W - matches the opposite of \w, that is anything which is not considered a word character.
  • Tab - represented in regular expressions as \t
  • Carriage return - represented in regular expressions as \r
  • Line feed (or newline) - represented in regular expressions as \n
  • Windows - uses the sequence \r\n (in that order)
  • Mac OS (version 9 and below) - uses the sequence \r
  • Unix/Linux and OSX - uses the sequence \n
  • \< - represents the beginning of a word.
  • \> - represents the end of a word.
  • \b - represents either the beginning or end of a word.
  • ( )Group part of the regular expression.\1 \2 etcRefer to something matched by a previous grouping.|Match what is on either the left or right of the pipe symbol.(?=x)Positive lookahead.(?!x)Negative lookahead.(?<=x)Positive lookbehind.(?<!x)Negative lookbehind.

原文地址:https://www.cnblogs.com/yaoyaohust/p/10363200.html

时间: 2024-10-12 04:46:12

regular expression, grep (python, linux)的相关文章

Python正则表达式 re(regular expression)

1. 点. .: 代表一个字符 (这个跟linux的正则表达式是不同的,那里.代表的是后面字符的一次或0次出现) 2. 转义 \\ 或者 r'\': 如 r'python\.org' (对.符号的转义) 3. ^ 非或叫做排除 如[^abc]: 任何以非a,b,c的字符 4. | 选择符 如python|perl (从python和perl选择一个) 也可以: p(ython|erl) 5. ? 可选项 如: r'(http://)?(www\.)?python\.org' (http://和w

Python正则表达式Regular Expression基本用法

资料来源:http://blog.csdn.net/whycadi/article/details/2011046   直接从网上资料转载过来,作为自己的参考.这个写的很清楚.先拿来看看. 1.正则表达式re模块的基本函数. (1)findall函数的用法 findall(rule,target[,flag])是在目标字符串中找到符合规则的字符串.参数说明:rule表示规则,target表示目标字符串,[,flag]表示的是规则选项.返回的结果是一个列表.若没找到符合的,是一个空列表. 如: 因

LeetCode 10 Regular Expression Matching (C,C++,Java,Python)

Problem: Implement regular expression matching with support for '.' and '*'. '.' Matches any single character. '*' Matches zero or more of the preceding element. The matching should cover the entire input string (not partial). The function prototype

linux 下Regular Expression

正则表达式:Regular Expression,REGEXP元字符:.:表示任意单个字符[]: 匹配指定范围内的任意单个字符[^]: 匹配指定范围外的任单个字符 字符集合: [:digit:], [:lower:],[:upper:],[:punct:] [:alpha:] [:space:],[:alnum:] 字符个数:(贪婪模式)*:匹配其前面的字符任意次 a, b, ab, aab, acb, adb, amnb a*b b ab aab a.*b .*:任意长度的任意字符 \?: 匹

[LeetCode][Python]Regular Expression Matching

# -*- coding: utf8 -*-'''https://oj.leetcode.com/problems/regular-expression-matching/ Implement regular expression matching with support for '.' and '*'. '.' Matches any single character.'*' Matches zero or more of the preceding element. The matchin

[LEETCODE][PYTHON][DP]REGULAR EXPRESSION MATCHING

# -*- coding: utf8 -*-'''https://oj.leetcode.com/problems/regular-expression-matching/ Implement regular expression matching with support for '.' and '*'. '.' Matches any single character.'*' Matches zero or more of the preceding element. The matchin

Regular Expression(正则表达式)之邮箱验证

正则表达式(regular expression, 常常缩写为RegExp) 是一种用特殊符号编写的模式,描述一个或多个文本字符串.使用正则表达式匹配文本的模式,这样脚本就可以轻松的识别和操作文本.其实,正则表达式是值得大家花时间学习的.正则表达式不仅在javaScript 中有用,在其他许多地方也可以使用正则表达式,例如其他编程语言(比如Perl,Java,C#,Python 和PHP ),Apache 配置文件以及BBEdit 和TextMate 等文本编辑器.甚至Adobe Dreamwe

Regular Expression 基础

Regular Expression:正则表达式 又称 正规表达式 计算机科学的一个概念.正则表达式使用单个字符串来描述.匹配一系列符合某个句法规则的字符串.在很多文本编辑器里,正则表达式通常被用来检索.替换那些符合某个模式的文本. 学习前提示换行符‘\n’和回车符‘\r’ 顾名思义,换行符就是另起一行,回车符就是回到一行的开头,所以我们平时编写文件的回车符应该确切来说叫做回车换行符 '\n' 10 换行(newline)'\r' 13 回车(return) 也可以表示为'\x0a'和'\x0d

LeetCode 10. Regular Expression Matching

https://leetcode.com/problems/regular-expression-matching/description/ Implement regular expression matching with support for '.' and '*'. '.' Matches any single character. '*' Matches zero or more of the preceding element. The matching should cover