9 Apr 18: shelve module, xml module, re module

9 Apr 18

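Review of the previous lesson:

I. shelve module

shelve (good to know, not essential) is a higher-level wrapper around pickle. When using it, refer only to the file name you passed to shelve.open(); the extra files that different platforms generate automatically can be ignored.

JSON's intermediate format is a string (str), so it is written to a file opened in text mode ('w').

Pickle's intermediate format is bytes, so it is written to a file opened in binary mode ('wb').

JSON is the more common choice for serialization.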
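The difference in intermediate formats can be checked directly. The following is a minimal sketch, not part of the original notes; the file names info.json and info.pkl and the temporary directory are assumptions made for illustration.

```python
import json
import os
import pickle
import tempfile

info = {'age': 18, 'height': 180, 'weight': 80}

json_text = json.dumps(info)      # intermediate format: str
pickle_blob = pickle.dumps(info)  # intermediate format: bytes

# Hypothetical file names inside a throwaway directory.
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, 'info.json')
pickle_path = os.path.join(workdir, 'info.pkl')

# str goes to a text-mode file, bytes go to a binary-mode file.
with open(json_path, 'w', encoding='utf-8') as f:
    f.write(json_text)
with open(pickle_path, 'wb') as f:
    f.write(pickle_blob)

# Read both back with the matching modes.
with open(json_path, 'r', encoding='utf-8') as f:
    from_json = json.load(f)
with open(pickle_path, 'rb') as f:
    from_pickle = pickle.load(f)

print(type(json_text), type(pickle_blob))
print(from_json == info, from_pickle == info)
```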

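import shelve

info1 = {'age': 18, 'height': 180, 'weight': 80}
info2 = {'age': 73, 'height': 150, 'weight': 80}

# Write: a shelf behaves like a dict whose keys are strings
d = shelve.open('db.shv')
d['egon'] = info1
d['alex'] = info2
d.close()

# Read
d = shelve.open('db.shv')
print(d['egon'])
print(d['alex'])
d.close()

# To modify a stored value in place, open the shelf with writeback=True
d = shelve.open('db.shv', writeback=True)
d['alex']['age'] = 10000
print(d['alex'])
d.close()

# Reopen to confirm that the change was saved (reading does not need writeback)
d = shelve.open('db.shv')
print(d['alex'])
d.close()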
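Why writeback=True matters can be seen by trying the same in-place change with and without it. This is a minimal sketch under stated assumptions: the helper name demo_writeback and the temporary directory are hypothetical and only there so that the example does not touch a real db.shv.

```python
import os
import shelve
import tempfile


def demo_writeback():
    """Return the stored age after an in-place change without and with writeback."""
    path = os.path.join(tempfile.mkdtemp(), 'db.shv')

    with shelve.open(path) as d:
        d['alex'] = {'age': 73}

    # Without writeback, d['alex'] hands back a fresh copy each time,
    # so changing that copy is never written to the file.
    with shelve.open(path) as d:
        d['alex']['age'] = 10000
    with shelve.open(path) as d:
        age_without = d['alex']['age']

    # With writeback=True, accessed entries are cached and saved on close.
    with shelve.open(path, writeback=True) as d:
        d['alex']['age'] = 10000
    with shelve.open(path) as d:
        age_with = d['alex']['age']

    return age_without, age_with


print(demo_writeback())
```

Without writeback, the alternative is to read the value, change it, and assign it back to the key.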

II. xml module

XML is a way of organizing data.

Every element in an XML document has three properties: tag, attrib, and text.

#==========================================> Query

import xml.etree.ElementTree as ET

tree = ET.parse('a.xml')
root = tree.getroot()

# Three ways to look up nodes

res = root.iter('rank')  # searches the whole tree and yields every match
for item in res:
    print('=' * 50)
    print(item.tag)     # tag name
    print(item.attrib)  # attributes
    print(item.text)    # text content

res = root.find('country')  # searches only the direct children of the current element and stops at the first match
print(res.tag)
print(res.attrib)
print(res.text)

nh = res.find('neighbor')
print(nh.attrib)

cy = root.findall('country')  # also searches only the direct children, but returns every match
print([item.attrib for item in cy])
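The notes do not show the contents of a.xml, so the three lookup methods are easier to compare on an inline document. The sketch below assumes a small sample with country, rank, year and neighbor elements; the sample data is an assumption and not the author's file.

```python
import xml.etree.ElementTree as ET

# Assumed sample document standing in for a.xml.
SAMPLE = """<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2008</year>
        <neighbor name="Austria"/>
    </country>
    <country name="Singapore">
        <rank>5</rank>
        <year>2011</year>
        <neighbor name="Malaysia"/>
    </country>
</data>"""

root = ET.fromstring(SAMPLE)

# iter() walks the whole tree, so it reaches the nested <rank> elements.
ranks = [e.text for e in root.iter('rank')]

# find() looks only at direct children: <rank> is a grandchild of the root.
direct_rank = root.find('rank')

# find() returns the first matching direct child, findall() returns all of them.
first_country = root.find('country').attrib['name']
country_names = [c.attrib['name'] for c in root.findall('country')]

print(ranks, direct_rank, first_country, country_names)
```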

#==========================================> Modify

import xml.etree.ElementTree as ET

tree = ET.parse('a.xml')
root = tree.getroot()

for year in root.iter('year'):
    year.text = str(int(year.text) + 10)  # text is always a string
    year.attrib = {'updated': 'yes'}      # the tag itself is usually left unchanged

tree.write('a.xml')

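#==========================================> Add (and remove)

import xml.etree.ElementTree as ET

tree = ET.parse('a.xml')
root = tree.getroot()

# list() takes a snapshot first, because the loop body changes the tree
for country in list(root.iter('country')):
    year = country.find('year')
    if int(year.text) > 2020:
        print(country.attrib)
        ele = ET.Element('egon')   # build a new element
        ele.attrib = {'nb': 'yes'}
        ele.text = '非常帅'
        country.append(ele)        # add it as a child of country
        country.remove(year)       # remove the year child

tree.write('b.xml')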
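The modify, add and remove steps can be tried in memory without touching a.xml or b.xml. This is a minimal sketch on an assumed two-country sample; the note tag and its text are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; not the author's a.xml.
SAMPLE = """<data>
    <country name="Liechtenstein"><year>2008</year></country>
    <country name="Singapore"><year>2015</year></country>
</data>"""

root = ET.fromstring(SAMPLE)

# Modify: bump every year by 10 and mark it as updated.
for year in root.iter('year'):
    year.text = str(int(year.text) + 10)
    year.set('updated', 'yes')  # set() changes one attribute and keeps the others

# Add and remove: findall() returns a list, so editing children is safe here.
for country in root.findall('country'):
    year = country.find('year')
    if int(year.text) > 2020:
        note = ET.SubElement(country, 'note')  # hypothetical tag name
        note.text = 'recent'
        country.remove(year)

# Serialize to a str instead of writing a file.
result = ET.tostring(root, encoding='unicode')
print(result)
```

ET.SubElement creates the element and appends it in one call, which is equivalent to ET.Element followed by append.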

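III. re module (regular expressions)

Regular expressions are used most often in web scraping. Scrapers can also import other modules to help clean the data, and regular expressions are useful well beyond scraping.

import re

# Patterns are written as raw strings (r'...') so that Python leaves the backslashes for the regex engine.
print(re.findall(r'\w', 'ab 12\\+- *&_'))   # \w: letter, digit, or underscore
print(re.findall(r'\W', 'ab 12\\+- *&_'))   # \W: anything \w does not match

print(re.findall(r'\s', 'ab \r1\n2\t\\+- *&_'))   # \s: whitespace
print(re.findall(r'\S', 'ab \r1\n2\t\\+- *&_'))   # \S: non-whitespace

print(re.findall(r'\d', 'ab \r1\n2\t\\+- *&_'))   # \d: digit
print(re.findall(r'\D', 'ab \r1\n2\t\\+- *&_'))   # \D: non-digit

print(re.findall(r'\w_sb', 'egon alex_sb123123wxx_sb,lxx_sb'))

print(re.findall(r'\Aalex', 'abcalex is salexb'))   # \A: start of the string
print(re.findall(r'\Aalex', 'alex is salexb'))
print(re.findall(r'^alex', 'alex is salexb'))       # ^: start of the string

print(re.findall(r'sb\Z', 'alexsb is sbalexbsb'))   # \Z: end of the string
print(re.findall(r'sb$', 'alexsb is sbalexbsb'))    # $: end of the string

print(re.findall(r'^ebn$', 'ebn1'))   # ^ebn$ matches only the exact string 'ebn', so 'ebn1' gives []

print(re.findall(r'a\nc', 'a\nc a\tc a1c'))

\t is the tab character; how many spaces it is displayed as differs between platforms.

\A <=> ^     # prefer ^ (the two differ only when re.MULTILINE is set)

\Z <=> $     # prefer $ ($ also matches just before a trailing newline)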
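The cases where ^ and \A, or $ and \Z, stop being interchangeable only show up with multi-line text. The following sketch uses an assumed two-line sample string to illustrate them; it is not part of the original notes.

```python
import re

# Assumed sample: two lines, with a trailing newline.
text = 'alex\nalex sb\n'

# Without flags, ^ and \A both anchor at the very start of the string.
caret_default = re.findall(r'^alex', text)

# With re.MULTILINE, ^ also matches right after each newline; \A does not.
caret_multiline = re.findall(r'^alex', text, re.MULTILINE)
a_multiline = re.findall(r'\Aalex', text, re.MULTILINE)

# $ tolerates one trailing newline; \Z requires the true end of the string.
dollar = re.findall(r'sb$', text)
z_anchor = re.findall(r'sb\Z', text)

print(caret_default, caret_multiline, a_multiline, dollar, z_anchor)
```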

# Repetition:

# .   ?   *   +  {m,n}  .*  .*?

1. .: matches any single character except a newline. To make it match newlines as well, pass re.DOTALL.

print(re.findall(r'a.c', 'abc a1c aAc aaaaaca\nc'))
print(re.findall(r'a.c', 'abc a1c aAc aaaaaca\nc', re.DOTALL))

2. ?: the item immediately to its left (here a single character) occurs 0 or 1 times. ? cannot be used on its own.

print(re.findall(r'ab?', 'a ab abb abbb abbbb abbbb'))

3. *: the item immediately to its left occurs 0 or more times.

print(re.findall(r'ab*', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))

4. +: the item immediately to its left occurs 1 or more times.

print(re.findall(r'ab+', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))

5. {m,n}: the item immediately to its left occurs between m and n times. ? is the same as {0,1}, * is the same as {0,}, and + is the same as {1,}.

print(re.findall(r'ab?', 'a ab abb abbb abbbb abbbb'))
print(re.findall(r'ab{0,1}', 'a ab abb abbb abbbb abbbb'))

print(re.findall(r'ab*', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))
print(re.findall(r'ab{0,}', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))

print(re.findall(r'ab+', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))
print(re.findall(r'ab{1,}', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))

print(re.findall(r'ab{1,3}', 'a ab abb abbb abbbb abbbb a1bbbbbbb'))

6. .*: matches any characters, of any length =====> greedy matching (takes as much as possible)

print(re.findall(r'a.*c', 'ac a123c aaaac a *123)()c asdfasfdsadf'))

7. .*?: non-greedy matching (takes as little as possible)

print(re.findall(r'a.*?c', 'a123c456c'))
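The quantifier rules above can be checked on a short input. This is a small sketch with an assumed sample string; it compares the shorthand quantifiers with their {m,n} forms and contrasts greedy and non-greedy matching.

```python
import re

sample = 'a ab abb abbb abbbb'  # assumed sample string

# Each shorthand quantifier has an equivalent {m,n} form.
optional = re.findall(r'ab?', sample)
optional_braces = re.findall(r'ab{0,1}', sample)
star = re.findall(r'ab*', sample)
plus = re.findall(r'ab+', sample)
one_to_three = re.findall(r'ab{1,3}', sample)

# Greedy .* runs to the last possible 'c'; non-greedy .*? stops at the first.
greedy = re.findall(r'a.*c', 'a123c456c')
lazy = re.findall(r'a.*?c', 'a123c456c')

# . skips newlines unless re.DOTALL is passed.
dot_default = re.findall(r'a.c', 'abc a\nc')
dot_all = re.findall(r'a.c', 'abc a\nc', re.DOTALL)

print(optional, star, plus, one_to_three)
print(greedy, lazy, dot_default, dot_all)
```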

(): grouping. When the pattern contains a group, findall returns only the content of the group.

print(re.findall(r'(alex)_sb', 'alex_sb asdfsafdafdaalex_sb'))

print(re.findall(
    r'href="(.*?)"',
    '<li><a id="blog_nav_sitehome" class="menu" href="http://www.cnblogs.com/">博客园</a></li>'))

[]: matches exactly one character, taken from the set defined inside the brackets.

Inside [], most characters stand for themselves. Ranges such as 0-9 and a-zA-Z are allowed.

print(re.findall(r'a[0-9][0-9]c', 'a1c a+c a2c a9c a11c a-c acc aAc'))

Because a-b denotes a range, a - that should match a literal hyphen must be placed first or last inside the [] (or be escaped as \-).

print(re.findall(r'a[-+*]c', 'a1c a+c a2c a9c a*c a11c a-c acc aAc'))

print(re.findall(r'a[a-zA-Z]c', 'a1c a+c a2c a9c a*c a11c a-c acc aAc'))

A ^ at the start of [] negates the set: it matches any one character that is not listed.

print(re.findall(r'a[^a-zA-Z]c', 'a c a1c a+c a2c a9c a*c a11c a-c acc aAc'))

print(re.findall(r'a[^0-9]c', 'a c a1c a+c a2c a9c a*c a11c a-c acc aAc'))

print(re.findall(r'([a-z]+)_sb', 'egon alex_sb123123wxxxxxxxxxxxxx_sb,lxx_sb'))
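The hyphen rule is easiest to see by moving the - around inside the brackets. The sketch below is illustrative only and uses an assumed sample string; it also shows that a hyphen in the middle is read as a range and can be rejected by the regex compiler.

```python
import re

sample = 'a+c a-c a*c a1c acc'  # assumed sample string

# A hyphen placed first (or last) inside [] is a literal hyphen.
hyphen_first = re.findall(r'a[-+*]c', sample)

# Escaping also works, wherever the hyphen sits.
hyphen_escaped = re.findall(r'a[+\-*]c', sample)

# In the middle, '+-*' is read as a range from '+' to '*', which is invalid
# because '*' sorts before '+'.
try:
    re.compile(r'a[+-*]c')
    bad_range_error = None
except re.error as exc:
    bad_range_error = str(exc)

# A leading ^ negates the set.
not_digit = re.findall(r'a[^0-9]c', sample)

print(hyphen_first, hyphen_escaped, bad_range_error, not_digit)
```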

|: alternation (or)

print(re.findall(r'compan(ies|y)', 'Too many companies have gone bankrupt, and the next one is my company'))

(?:   ): a non-capturing group. findall then returns the whole matched text rather than only the content of the parentheses.

print(re.findall(r'compan(?:ies|y)', 'Too many companies have gone bankrupt, and the next one is my company'))

print(re.findall(r'alex|sb', 'alex sb sadfsadfasdfegon alex sb egon'))
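What findall returns depends on how many capturing groups the pattern has. This short sketch reuses the sentence from the notes; the two-group pattern is an added illustration, not from the original.

```python
import re

text = 'Too many companies have gone bankrupt, and the next one is my company'

# One capturing group: findall returns only what the group captured.
captured = re.findall(r'compan(ies|y)', text)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r'compan(?:ies|y)', text)

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r'(compan)(ies|y)', text)

print(captured)
print(whole)
print(pairs)
```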

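Other methods of the re module:

print(re.findall(r'alex|sb', '123123 alex sb sadfsadfasdfegon alex sb egon'))

print(re.search(r'alex|sb', '123213 alex sb sadfsadfasdfegon alex sb egon').group())

print(re.search(r'^alex', '123213 alex sb sadfsadfasdfegon alex sb egon'))   # no match: prints None

print(re.search(r'^alex', 'alex sb sadfsadfasdfegon alex sb egon').group())

re.search returns a match object for the first match only, or None if nothing matches. Call .group() on the match object to get the matched text. Calling .group() on None raises an AttributeError.

print(re.match(r'alex', 'alex sb sadfsadfasdfegon alex sb egon').group())

print(re.match(r'alex', '123213 alex sb sadfsadfasdfegon alex sb egon'))   # no match at the start: prints None

re.match is equivalent to re.search with a ^ at the start of the pattern.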
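Since both functions return None on failure, it is safer to test the result before calling .group(). A minimal sketch follows; the helper name first_match is hypothetical and not part of the re module.

```python
import re


def first_match(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None


text = '123213 alex sb egon'  # assumed sample string

found = first_match(r'alex|sb', text)
anchored = first_match(r'^alex', text)   # 'alex' is not at the start

# re.match only tries at position 0, like re.search with a leading ^.
match_result = re.match(r'alex', text)
search_result = re.search(r'^alex', text)

print(found, anchored, match_result, search_result)
```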

info = 'a:b:c:d'
print(info.split(':'))
print(re.split(':', info))

info = r'get :a.txt\3333/rwx'
print(re.split(r'[ :\\/]', info))   # split on a space, a colon, a backslash, or a slash

Unlike str.split, re.split accepts a regular expression as the separator.

print('egon is beutifull egon'.replace('egon', 'EGON', 1))

# \1, \2, ... in the replacement refer to the groups of the pattern
print(re.sub(r'(.*?)(egon)(.*?)(egon)(.*?)', r'\1\2\3EGON\5', '123 egon is beutifull egon 123'))

print(re.sub(r'(lqz)(.*?)(SB)', r'\3\2\1', r'lqz is SB'))

print(re.sub(r'([a-zA-Z]+)([^a-zA-Z]+)([a-zA-Z]+)([^a-zA-Z]+)([a-zA-Z]+)', r'\5\2\3\4\1', r'lqzzzz123+ is SB'))

Unlike str.replace, re.sub accepts a regular expression as the search pattern.
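The results of the split and sub calls are worth spelling out, since adjacent separators produce empty strings and group references reorder text. The sketch below reuses the inputs from the notes; the count example and its sample string are added assumptions.

```python
import re

info = r'get :a.txt\3333/rwx'

# The space and the colon are adjacent, so an empty string appears between them.
parts = re.split(r'[ :\\/]', info)

# Group references in the replacement swap the first and last word.
swapped = re.sub(r'(lqz)(.*?)(SB)', r'\3\2\1', 'lqz is SB')

# count limits the number of replacements, like the third argument of str.replace.
first_only = re.sub(r'egon', 'EGON', 'egon is egon', count=1)

print(parts)
print(swapped)
print(first_only)
```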

# re.compile builds the pattern once so that it can be reused
pattern = re.compile(r'alex')
print(pattern.findall('alex is alex alex'))
print(pattern.findall('alexasdfsadfsadfasdfasdfasfd is alex alex'))
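A compiled pattern offers the same operations as the module-level functions, as methods. This is a brief sketch with assumed sample strings, not part of the original notes.

```python
import re

pattern = re.compile(r'alex')

# The module-level functions have method counterparts on the compiled pattern.
all_hits = pattern.findall('alex is alex alex')
first_pos = pattern.search('123 alex').start()   # index where the first match begins
at_start = pattern.match('123 alex')             # match() only tries position 0
replaced = pattern.sub('ALEX', 'alex is alex', count=1)

print(all_hits, first_pos, at_start, replaced)
```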

Original article: https://www.cnblogs.com/zhangyaqian/p/py20180409.html

Time: 2024-10-29 19:12:31
