Global variables
Python automatically adds several global variables to every .py file.
print(vars())  # show the current global variables

Output:
{'__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x01035A70>, '__cached__': None, '__name__': '__main__', '__spec__': None, '__builtins__': <module 'builtins' (built-in)>, '__doc__': None, '__file__': 'D:/untitled/python5/python5/index.py'}
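__builtins__  holds the built-in functions
__doc__       the docstring of the .py file
__package__   the package that contains the current .py file, with nested folders separated by dots; None for the file that is run directly
              for an imported file: the package of the folder that contains it, separated by dots
__file__      the path of the file itself
__cached__    the path of the cached compiled (.pyc) file; None for the file that is run directly
__name__      equals '__main__' only in the main .py file that python executes; in any other file __name__ is the module name
The main file should check if __name__ == '__main__' before calling its main function, so that the function does not run when the file is imported.

"""
I am the docstring of index.py
"""
print(__doc__)

Output:
I am the docstring of index.py

print(__file__)

Output:
D:/untitled/python5/python5/s1.py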
import os
import sys

x1 = os.path.dirname(__file__)  # directory that contains this file
x2 = 'bin'
xin = os.path.join(x1, x2)      # path of the bin folder inside that directory
sys.path.append(xin)            # add it to the module search path
for i in sys.path:
    print(i)

Output:
D:\untitled\python5\lib
D:\untitled
D:\python3.5\python35.zip
D:\python3.5\DLLs
D:\python3.5\lib
D:\python3.5
D:\python3.5\lib\site-packages
D:/untitled/python5/lib\bin
from lib.xx import commons  # import the commons module from the xx folder inside lib
print(commons.__package__)  # nested folders are separated by dots

Output:
lib.xx

from lib.xx import commons
print(commons.__cached__)  # path of the cached .pyc file; exists in Python 3 but not in Python 2

# In the main file that is being run, __name__ equals '__main__';
# in any other (imported) file, __name__ is the module's own name.
from lib.xx import commons
print(__name__)
print(commons.__name__)

Output:
__main__
lib.xx.commons
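The __name__ check described above can be sketched as follows. This is a minimal illustration, and the function name main is a hypothetical entry point rather than anything from the original files.

```python
def main():
    """Hypothetical entry point, used only for this illustration."""
    return "running as: " + __name__


# The call below happens only when this file is executed directly.
# When the file is imported, __name__ is the module name and main() is not called.
if __name__ == "__main__":
    print(main())
```

Importing this file from another module defines main but does not call it, which is the reason for the guard.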
Modules:
urllib.request
Sends an HTTP request and returns the response.
from urllib import request

# Open a URL, i.e. send an HTTP request
r = request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=78956745465')
result = r.read().decode('utf-8')  # read the response body and decode it as UTF-8
print(result)
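When a URL carries query parameters like the qqCode value above, it can be built from a dict instead of being written by hand. The sketch below only constructs the request and does not send it. The host example.com, the parameter names, and the User-Agent value are placeholders chosen for illustration.

```python
from urllib import parse, request

base_url = "http://example.com/api/check"  # placeholder, not a real service

# urlencode turns a dict into "key=value&key=value", escaping where needed
query = parse.urlencode({"code": "12345", "lang": "zh"})

# Request only describes the request; nothing is sent until urlopen(req) is called
req = request.Request(base_url + "?" + query, headers={"User-Agent": "demo-client"})

print(req.full_url)
print(req.get_method())  # GET, because no body data was supplied
```

Passing req to request.urlopen would send it in the same way as the plain URL string in the example above.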
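json and pickle
Two modules used for serialization:
- json converts between strings and Python data types
- pickle converts between Python-specific objects and bytes
The json module provides four functions: dumps, dump, loads, load
The pickle module provides four functions: dumps, dump, loads, load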
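import json

# json converts between strings and Python data types.
# Strings inside the JSON text must use double quotes.
s = '{"desc":"invilad-citykey","status":1002}'
l = '[11,22,33,44]'

result = json.loads(s)  # loads: parse a JSON string into a Python object
print(result, type(result))
ss = json.loads(l)
print(ss, type(ss))

xin = ['xin', 'xin1', 'xin2']
ss = json.dumps(xin)  # dumps: serialize a Python object into a JSON string
print(ss, type(ss))

dic = {'k1': 123, 'k2': 'v2'}
with open('db', 'w') as f:
    z = json.dump(dic, f)  # dump: serialize the dict and write it to the file
print(z)  # json.dump returns None

with open('db', 'r') as f:
    r = json.load(f)  # load: read the JSON text from the file and parse it into a dict
print(r, type(r))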
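The notes list the pickle functions but only demonstrate json. A minimal sketch of the pickle round trip might look like this; the record contents are made up for illustration. Unlike json, pickle can serialize Python-specific objects such as sets and dates, and it produces bytes rather than a string.

```python
import datetime
import pickle

# Sample data (invented for this sketch); a set and a date are not JSON-serializable
record = {"name": "xin", "tags": {"a", "b"}, "created": datetime.date(2016, 1, 1)}

data = pickle.dumps(record)    # dumps: Python object -> bytes
restored = pickle.loads(data)  # loads: bytes -> Python object

print(type(data))
print(restored == record)
```

pickle.dump and pickle.load work the same way with a file opened in binary mode ('wb' / 'rb'). Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.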
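Third-party modules:
There are two ways to install them:
1. Install with the package management tool
Add the path of the package manager pip3 to the environment variables.
pip3 path: C:\Python35\Scripts
Adding the environment variable: [right-click Computer] --> [Properties] --> [Advanced system settings] --> [Advanced] --> [Environment Variables] --> [in the second list, find the row whose variable name is Path and double-click it] --> [append the Python installation directory (the pip3 path above) to the variable value, separated by ;]
Then run pip3 install requests from a command prompt.
2. Install from source: download the code, then install it
1. Download
https://github.com/kennethreitz/requests/tarball/master
2. Extract the archive
3. Enter the extracted directory
4. Run python setup.py install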
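The requests module already wraps the common HTTP request methods, so you simply call the corresponding method it provides. All of the parameters these methods accept are: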
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': ('filename', fileobj)}``) for multipart encoding upload.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How long to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) whether the SSL cert will be verified. A CA_BUNDLE path can also be provided. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
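The third-party module requests is used to send HTTP requests (simulating a browser visiting a web page from Python).
requests.get("http://www.baidu.com") sends an HTTP request. On success it returns a Response object, whose text attribute holds the returned content as a string.

import requests

response = requests.get("http://www.weather.com.cn/adat/sk/101010500.html")
response.encoding = 'utf-8'
xin = response.text  # text is the content of the response
print(xin)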
XML is a format for exchanging data between different languages or programs. An XML file looks like this:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

Attributes and methods of the Element class:
# tag     the tag name of the node
# attrib  the attributes of the tag
# find    find a child node
# iter    return an iterator over the matching elements; used to find one kind of node and loop over it
# set     set an attribute
# get     get an attribute
# text    the text content of the node
Methods of the ElementTree class:
getroot() returns the root node of the XML file. On that node:
1. tag returns the tag name of the node
2. attrib returns the attributes of the node
3. text returns the content of the tag
1. Parsing XML
There are two ways to parse XML:
one parses a string into an XML object;
the other parses a file directly into an XML object.
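from xml.etree import ElementTree as ET

# ET.XML turns a string into an Element object, whose methods can then be used.
# It returns only the root Element, which has no write() method; to save
# changes, wrap it with ET.ElementTree(root) first.
str_xml = open('db.xml', 'r').read()  # open the file and read the XML content as a string
print(str_xml)
root = ET.XML(str_xml)  # parse the string into an Element; root refers to the root node of the XML
print(root)

from xml.etree import ElementTree as ET

root = ET.XML(open('db.xml', 'r', encoding='utf-8').read())
print(root.tag)   # tag name of the top-level node
print(dir(root))  # list the attributes and methods available on the node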
The other way:

tree = ET.parse('db.xml')  # parse the file directly; the result is an ElementTree
print(tree)
root = tree.getroot()      # get the root node of the XML file
print(root)
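The difference between the two parsing routes can be checked without a db.xml file on disk. The sketch below uses a small inline XML string and an in-memory io.StringIO object standing in for the file; both are assumptions made for illustration.

```python
import io
from xml.etree import ElementTree as ET

# Small inline document used instead of db.xml
xml_text = "<data><country name='Panama'><rank>69</rank></country></data>"

# Route 1: string -> Element (the root node itself)
root_from_string = ET.XML(xml_text)

# Route 2: file (here an in-memory file object) -> ElementTree, then getroot()
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(root_from_string).__name__, type(tree).__name__)
print(root_from_string.tag == root_from_file.tag)
```

Only the ElementTree object has write(), which is why the string route needs ET.ElementTree(root) before saving.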
2. Working with XML
XML is made of nodes nested inside nodes, and every node provides the following functionality for operating on it:
class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # Tag name of the current node
    tag = None
    """The element's name."""

    # Attributes of the current node
    attrib = None
    """Dictionary of the element's attributes."""

    # Text content of the current node
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        # Create a new node
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        # Append one child node to the current node
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        # Extend the current node with n child nodes
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        # Insert a node among the children of the current node, i.e. create
        # a child node for the current node and put it at the given position
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        # Remove a node from the children of the current node
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def find(self, path, namespaces=None):
        # Get the first matching child node
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        # Get the text content of the first matching child node
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        # Get all matching child nodes
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        # Get all the specified nodes as an iterator (usable in a for loop)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        # Clear the node
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        # Get an attribute value of the current node
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        # Set an attribute value on the current node
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        # Get the keys of all attributes of the current node
        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        # Get all attributes of the current node, each as a key/value pair
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        # Search the descendants of the current node for all nodes with the
        # given tag name and return an iterator (usable in a for loop)
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        # Iterate over the text content of the current node and all of its
        # descendants (usable in a for loop)
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail
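Because every node has the methods above, and parsing in the previous step gave us root (the root node of the XML file), these methods can be used to work with the XML file.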
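Traversing everything in the XML document

from xml.etree import ElementTree as ET

root = ET.XML(open('db.xml', 'r', encoding='utf-8').read())
print(root.tag)  # tag name of the top-level node
for node in root:  # loop over the second-level nodes of the document
    print(node.tag, node.attrib)  # tag name and attributes
    for i in node:  # loop over the third-level nodes
        print(i.tag, i.text)  # tag name and content of each third-level node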
Traversing specific nodes in the XML

from xml.etree import ElementTree as ET

root = ET.XML(open('db.xml', 'r', encoding='utf-8').read())
print(root.tag)  # tag name of the top-level node
for node in root:
    print(node.tag, node.attrib, node.find('year').text)  # tag name, attributes, and the content of the year child

for node in root:
    print(node.tag, node.attrib, node.find('rank').text)  # tag name, attributes, and the content of the rank child
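The lookup methods listed in the Element class (iter, findall, findtext) can be compared side by side on a small inline document. The XML string below is a shortened version of the sample data, inlined so that the sketch runs without db.xml.

```python
from xml.etree import ElementTree as ET

xml_text = """
<data>
    <country name="Liechtenstein"><rank>2</rank><year>2023</year></country>
    <country name="Singapore"><rank>5</rank><year>2026</year></country>
</data>
"""
root = ET.XML(xml_text)

# iter searches all descendants, at any depth
years = [node.text for node in root.iter("year")]

# findall searches direct children only; findtext returns the child's text
ranks = {c.get("name"): c.findtext("rank") for c in root.findall("country")}

# findtext returns the default when the child does not exist,
# whereas find(...).text would raise AttributeError on None
missing = root.find("country").findtext("gdppc", default="n/a")

print(years, ranks, missing)
```

Using findtext with a default avoids the AttributeError that node.find('year').text raises when a country has no such child.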
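3. Modifying node content
Nodes are modified in memory, which does not affect the content of the file. To keep a change, the in-memory content has to be written back to a file.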
# Import the ElementTree module from the xml.etree package under the alias ET
from xml.etree import ElementTree as ET

str_xml = open('db.xml', 'r').read()  # open the file and read the XML content
root = ET.XML(str_xml)                # parse the string into an Element object
print(root)                           # the root node
for node in root.iter('year'):        # loop over all year nodes
    new_year = int(node.text) + 1     # add one to the content of each year node
    node.text = str(new_year)
    node.set('name', 'kaixin')        # set attributes on the current node
    node.set('age', '20')
    del node.attrib['name']           # delete the name attribute of the current node
tree = ET.ElementTree(root)
tree.write('xxx.xml', encoding='utf-8')  # write and save the file
The other way: open the file and parse it directly

from xml.etree import ElementTree as ET

tree = ET.parse('db.xml')  # parse the XML file directly
root = tree.getroot()
print(root.tag)
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set('name', 'kaixin')
    node.set('age', '20')
    del node.attrib['name']
tree.write('zzz.xml', encoding='utf-8')
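To see that the changes live only in memory until they are written out, the tree can be serialized with ET.tostring instead of being saved. This is a minimal sketch on an inline document; the attribute name checked is invented for the example.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<data><year>2023</year><year>2026</year></data>")

for node in root.iter("year"):
    node.text = str(int(node.text) + 1)  # text is a string, so convert both ways
    node.set("checked", "yes")           # hypothetical attribute for illustration

# encoding="unicode" returns a str instead of bytes
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Nothing has touched the disk at this point; tree.write, as in the examples above, is the step that persists the result.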
Deleting nodes

from xml.etree import ElementTree as ET

tree = ET.parse('db.xml')  # open the XML file
root = tree.getroot()      # get the root node
for country in root.findall('country'):    # loop over all country nodes
    rank = int(country.find('rank').text)  # content of the rank node under each country
    if rank > 50:                          # if rank is greater than 50
        root.remove(country)               # delete that country node
tree.write('xxx.xml', encoding='utf-8')
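The same deletion can be tried on an inline document. The sketch below reuses the ranks from the sample data and checks which countries remain.

```python
from xml.etree import ElementTree as ET

root = ET.XML(
    "<data>"
    "<country name='Liechtenstein'><rank>2</rank></country>"
    "<country name='Singapore'><rank>5</rank></country>"
    "<country name='Panama'><rank>69</rank></country>"
    "</data>"
)

# findall returns a list, so removing children inside the loop is safe
for country in root.findall("country"):
    if int(country.findtext("rank")) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)
```

Looping over root.findall('country') rather than over root itself matters here: removing children while iterating the element directly can skip nodes.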
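Creating nodes
tree
1. Created by the ElementTree class: ElementTree(xxx)
2. getroot() returns the root node of the XML
3. write() writes the in-memory XML to a file
root
An object created by the Element class
# print(root.tag) # print(root.attrib)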
from xml.etree import ElementTree as ET

tree = ET.parse('db.xml')  # parse the XML file directly
# tree is an ElementTree object (created with ElementTree(xxx));
# getroot() returns the root node, write() writes the in-memory XML to a file
root = tree.getroot()      # root node of the XML file, an Element object
print(root.tag)

# Create nodes
son = root.makeelement('tt', {'kk': 'vv'})  # tag name and attributes
s = son.makeelement('tt', {'kk': '123456'})
son.append(s)
root.append(son)
tree.write('db.xml.xml')
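from xml.etree import ElementTree as ET

tree = ET.parse('db.xml')  # parse the XML file directly
root = tree.getroot()      # root node of the XML file, an Element object
son = ET.Element('PP', {'kk': 'vv'})  # create a node directly with the Element class
ele2 = ET.Element('pp', {'kk': '123455'})
son.append(ele2)      # append the second-level node under the first-level node
root.append(son)      # append the first-level node under the root node
tree.write('db.xml')  # tree.write writes the in-memory XML to the file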
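The makeelement docstring quoted earlier recommends the SubElement factory function instead. A minimal sketch of that third way is shown below; it builds a tree from scratch rather than loading db.xml, and the text value is made up.

```python
from xml.etree import ElementTree as ET

root = ET.Element("data")

# SubElement creates the node and appends it to the parent in one step
son = ET.SubElement(root, "tt", {"kk": "vv"})
grandson = ET.SubElement(son, "tt", kk="123456")  # attributes can also be keyword arguments
grandson.text = "hello"

result = ET.tostring(root, encoding="unicode")
print(result)
```

Compared with Element plus append, this saves one call per node and makes the parent of each node visible where the node is created.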
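configparser handles files in a specific format (INI-style configuration files with [section] headers and key/value pairs). Under the hood it uses open to operate on the file.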
import configparser

con = configparser.ConfigParser()
# read() opens the file and loads its content into memory
con.read('in', encoding='utf-8')

# sections() finds every [xxx] in memory
xin = con.sections()  # all sections
print(xin)

ret = con.items('guokaixin')    # all key/value pairs in the section
print(ret)
ret = con.options('guokaixin')  # all keys in the given section
print(ret)

ret = con.get('guokaixin', 'age')  # value of the given key in the given section
print(ret)
ret1 = con.getint('guokaixin', 'age')
print(ret1)
ret2 = con.getfloat('guokaixin', 'age')
print(ret2)
# getboolean raises ValueError unless the value is boolean-like (1/0, yes/no, true/false, on/off)
ret3 = con.getboolean('guokaixin', 'age')
print(ret3)

has_sec = con.has_section('guokaixin')  # check whether the section exists
print(has_sec)

con.add_section('xinxin')  # add a section
with open('in', 'w') as f:
    con.write(f)           # write the change back to the file

s = con.remove_section('xinxin')  # remove a section
with open('in', 'w') as f:
    con.write(f)
print(s)  # True if the section existed

Checking, removing, and setting options

import configparser

a = configparser.ConfigParser()
s = a.read('xxxooo', encoding='utf-8')
print(s)  # list of the files that were read successfully

# check
has_opt = a.has_option('section1', 'k1')
print(has_opt)

# remove
zz = a.remove_option('section1', 'k1')
with open('xxxooo', 'w') as f:
    a.write(f)
print(zz)  # True if the option existed

# set
a.set('section1', 'k10', "123")
with open('xxxooo', 'w') as f:
    a.write(f)
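The notes do not show what the configuration file looks like. The sketch below assumes a plausible layout for it (the vip key and all values are invented) and runs the same kinds of calls entirely in memory, using read_string and an io.StringIO buffer in place of the files 'in' and 'xxxooo'.

```python
import configparser
import io

# Assumed file content; the real file is not shown in the notes
ini_text = """
[guokaixin]
age = 20
vip = yes
"""

con = configparser.ConfigParser()
con.read_string(ini_text)

age = con.getint("guokaixin", "age")      # '20' -> 20
vip = con.getboolean("guokaixin", "vip")  # 'yes' -> True

con.add_section("xinxin")
con.set("xinxin", "k10", "123")        # values must be strings
con.remove_option("guokaixin", "vip")

# write() accepts any text file object, so a StringIO works for inspection
buffer = io.StringIO()
con.write(buffer)
print(buffer.getvalue())
```

All changes stay in the parser object until write() is called, which is why every modifying call in the examples above is followed by a write to the file.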
XML saved the default way has no indentation. To get indentation, the way the file is saved has to change:
from xml.etree import ElementTree as ET
from xml.dom import minidom

def MyWrite(root, file_path):
    rough_string = ET.tostring(root, 'utf-8')    # serialize the tree to bytes
    reparsed = minidom.parseString(rough_string)
    new_str = reparsed.toprettyxml(indent='\t')  # string with indentation added
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(new_str)
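On newer interpreters there is a shorter route. Assuming Python 3.9 or later, ElementTree itself provides ET.indent, which adds the whitespace in place. The sketch below uses an inline document and four spaces as the indent, both chosen for illustration.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<data><country name='Panama'><rank>69</rank></country></data>")

# ET.indent (Python 3.9+) inserts newlines and indentation into the tree in place
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

On older versions such as the Python 3.5 used in these notes, ET.indent does not exist and the minidom approach above remains the way to go.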
shutil handles archives by calling the zipfile and tarfile modules. In detail:
import zipfile

# Create an archive
# z = zipfile.ZipFile('laxi.zip', 'w')
# z.write('in')
# z.write('xo.xml')
# z.close()

# Extract an archive
z = zipfile.ZipFile('spiders.zip', 'r')
z.extractall()
z.close()
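Only zipfile is demonstrated above, so here is a sketch that exercises both modules in a temporary directory. The file names in.txt, demo.zip and demo.tar are placeholders; nothing outside the temporary directory is touched.

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical sample file to put into the archives
    source = os.path.join(workdir, "in.txt")
    with open(source, "w", encoding="utf-8") as f:
        f.write("hello")

    # zipfile: write the file, then list the archive content
    zip_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as z:
        z.write(source, arcname="in.txt")
    with zipfile.ZipFile(zip_path, "r") as z:
        zip_names = z.namelist()

    # tarfile: same steps with add() and getnames()
    tar_path = os.path.join(workdir, "demo.tar")
    with tarfile.open(tar_path, "w") as t:
        t.add(source, arcname="in.txt")
    with tarfile.open(tar_path, "r") as t:
        tar_names = t.getnames()

print(zip_names, tar_names)
```

Using the archives as context managers closes them automatically, which replaces the explicit z.close() calls in the example above.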