Python: using a regular expression to match the content between an opening and a closing tag

import re

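con = "string to be filtered"   # placeholder: replace with the text you want to search

it = re.finditer(r"<url>.*?</url>", con)   # match each <url>...</url> element
# it = re.finditer(r"<command>.*?</command>", con)   # fails if the content contains a newline (\n), because . does not match \n by default
it2 = re.finditer(r"<command>[\s\S]*?</command>", con)   # working approach 1: [\s\S] matches any character, including \n
# it = re.finditer(r"<command>[\d\D]*?</command>", con)  # working approach 2: [\d\D] also matches any character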

for match in it:
    ret = match.group()   # the whole match, including the surrounding tags
    print(ret)
for match in it2:
    ret = match.group()
    print(ret)
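
The patterns above return each match together with its tags. As a minimal sketch, not taken from the original post, a capture group can return only the text between the tags, and the standard `re.DOTALL` flag is another way to let `.` cross newlines. The `sample` string and the variable names below are made up for illustration.

```python
import re

# Hypothetical sample input; the <command> element spans two lines.
sample = "<url>http://example.com/a</url>\n<command>ls -l\npwd</command>"

# The capture group (.*?) returns only the text between the tags.
urls = re.findall(r"<url>(.*?)</url>", sample)

# re.DOTALL makes . match newlines too, an alternative to [\s\S].
commands = re.findall(r"<command>(.*?)</command>", sample, flags=re.DOTALL)

# Without re.DOTALL the same pattern cannot cross the newline.
no_flag = re.findall(r"<command>(.*?)</command>", sample)

print(urls)
print(commands)
print(no_flag)
```

With a single capture group, `re.findall` returns the captured text rather than the whole match, so no extra stripping of the tags is needed.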

Original article: https://www.cnblogs.com/lutt/p/12207855.html

Time: 2024-11-13 02:36:37
