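I. Definitions
Module: a way to organize Python code logically (variables, functions, classes and logic that together implement one feature). A module is simply a Python file ending in .py: the file test.py gives the module name test.
Package: a way to organize modules logically. A package is simply a directory that contains an __init__.py file (this is required for regular packages; since Python 3.3 a directory without __init__.py can still be imported as a namespace package).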
II. Import methods
1. import module_guyun
# module_guyun.py: the module to be imported
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
########################
# Importing the module (a second file in the same directory)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import module_guyun
print(module_guyun.name)
module_guyun.say_hallo()
# Output:
# guyun
# hello guyun
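2. from module_guyun import logger as logger_guyun  # alias
When a name you want to import clashes with a name already defined in the current module, importing it under an alias avoids the conflict.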
# module_guyun.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
def logger():
    print("in the module_guyun")
def running():
    pass
##################
# main script
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
from module_guyun import logger as logger_guyun
def logger():
    print("in the main")
logger()          # the local logger
logger_guyun()    # module_guyun.logger under its alias
# Output:
# in the main
# in the module_guyun
3. Importing a package really means running its __init__.py
The __init__.py file inside the package package_test:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
print("from test package package_test")
Now import package_test from the file p_test.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import package_test
# Output:
# from test package package_test
4. How to import a module that is not in the same directory
-module_test/
    -main.py
-module_guyun.py
Now main.py needs to import module_guyun.py:
# module_guyun.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
def logger():
    print("in the module_guyun")
def running():
    pass
## main.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
#from module_guyun import logger as logger_guyun
import sys,os
# absolute path of main.py -> its directory (module_test) -> that directory's parent,
# which is where module_guyun.py lives
x = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
#print(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(x)   # add that directory to the module search path
import module_guyun
module_guyun.say_hallo()
module_guyun.logger()
####
# Output:
# hello guyun
# in the module_guyun
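The os.path.dirname(os.path.dirname(os.path.abspath(__file__))) chain climbs two levels: from main.py to module_test/, then to the directory that holds module_guyun.py. A minimal sketch of the same computation with pathlib, assuming a hypothetical helper named parent_of_script_dir and a made-up example path:

```python
import os
from pathlib import Path


def parent_of_script_dir(script_path):
    # Hypothetical helper: same result as
    # os.path.dirname(os.path.dirname(os.path.abspath(script_path))),
    # i.e. the directory that contains the script's own directory.
    return Path(os.path.abspath(script_path)).parents[1]


# A made-up relative path standing in for module_test/main.py
example = os.path.join("project", "module_test", "main.py")

via_os = os.path.dirname(os.path.dirname(os.path.abspath(example)))
via_pathlib = parent_of_script_dir(example)
print(via_os == str(via_pathlib))  # True
print(via_pathlib.name)            # project

# Inside main.py (which already imports sys) this would become:
# sys.path.append(str(parent_of_script_dir(__file__)))
```

Both forms only manipulate path strings; neither checks that the directories exist.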
5. How to import a package
-package_test/
    -test1.py
    -__init__.py
-p_test.py
# package_test/__init__.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
print("from test package package_test")
from . import test1   # relative import: load test1 from this package
# package_test/test1.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
def test():
    print("in the test1")
### p_test.py (the calling script)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import package_test   ### runs package_test/__init__.py
package_test.test1.test()
# Output:
# from test package package_test
# in the test1
#### Goal: from a script in the same directory as the package, call a module inside the package, with __init__.py doing the dispatching
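To try the layout above without creating files by hand, the sketch below writes package_test/__init__.py and package_test/test1.py into a temporary directory, puts that directory on sys.path and imports the package. The temporary directory is an assumption standing in for the directory that holds p_test.py.

```python
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
pkg = tmp / "package_test"
pkg.mkdir()

# __init__.py runs on "import package_test" and pulls in test1
(pkg / "__init__.py").write_text(
    'print("from test package package_test")\n'
    "from . import test1\n"
)
(pkg / "test1.py").write_text(
    "def test():\n"
    '    print("in the test1")\n'
)

sys.path.insert(0, str(tmp))  # plays the role of p_test.py's directory
import package_test           # prints: from test package package_test

package_test.test1.test()     # prints: in the test1
```

Without the from . import test1 line in __init__.py, package_test.test1 would not be available after a plain import package_test.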
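Summary
- import module_alex
- import module_alex,module2_alex  # import several modules
- from module_alex import *  ### not recommended: it can silently overwrite names in the current module
- from module_alex import m1,m2,m3  ## import several names (functions, variables, classes) from one module
- from module_alex import logger as logger_alex  ### alias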
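III. What import really does (path search and the search path)
Importing a module essentially means interpreting (running) the Python file once: the first import executes the file, and later imports reuse the module object cached in sys.modules.
import module_name -------> module_name.py -------> path of module_name.py -------> found by searching the directories in sys.path
Importing a package essentially means executing the __init__.py file inside that package.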
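Both points can be checked directly: the file is located by walking sys.path, and its body runs only on the first import. A sketch with a throwaway module named demo_mod (a hypothetical name) written into a temporary directory:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "demo_mod.py").write_text(
    'print("demo_mod body is running")\n'
    'name = "demo"\n'
)
sys.path.insert(0, str(tmp))  # searched first

# 1) Where would "import demo_mod" find the file? (walks sys.path, imports nothing)
spec = importlib.util.find_spec("demo_mod")
print(spec.origin)            # .../demo_mod.py inside the temp directory

# 2) The first import executes the file; the second reuses the cached module
import demo_mod               # prints: demo_mod body is running
import demo_mod as again      # prints nothing: taken from sys.modules
print(demo_mod is again, "demo_mod" in sys.modules)  # True True
```

To force the file to run again, importlib.reload(demo_mod) re-executes it in the existing module object.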
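IV. Import optimization
V. Categories of modules
a: standard library (built-in)
b: open-source (third-party) modules
c: custom modules
1. Standard library
a. time and datetime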
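In Python, time is usually represented in one of three ways: 1) a timestamp 2) a formatted time string 3) a tuple (struct_time) with nine elements. Because much of Python's time module is implemented by calling the C library, behavior can differ between platforms.
UTC (Coordinated Universal Time) is the world standard time, closely tracking Greenwich Mean Time. China uses UTC+8. DST (Daylight Saving Time) is summer time.
Timestamp: a timestamp usually means the offset in seconds from 1970-01-01 00:00:00 UTC. Running type(time.time()) shows that it is a float. Functions that return this kind of value include time() and clock() (clock() was removed in Python 3.8).
struct_time: a struct_time tuple has 9 elements (year, month, day, hour, minute, second, weekday, day of the year, DST flag). Functions that return a struct_time include gmtime(), localtime() and strptime(). The main functions of the time module are listed below.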
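For reference, the nine struct_time fields can be read by name or by position. A short sketch using gmtime() on a fixed timestamp (the same 1304575584 used in the examples below), so the result does not depend on the local time zone:

```python
import time

# gmtime() keeps the result independent of the local time zone
t = time.gmtime(1304575584)   # 2011-05-05 06:06:24 UTC, a Thursday

fields = ["tm_year", "tm_mon", "tm_mday", "tm_hour", "tm_min",
          "tm_sec", "tm_wday", "tm_yday", "tm_isdst"]
for index, field in enumerate(fields):
    # Each field is reachable both by attribute name and by position
    assert getattr(t, field) == t[index]
    print(index, field, t[index])

# tm_wday: Monday is 0, so Thursday is 3
# tm_yday: day of the year, 1..366 (May 5, 2011 is day 125)
# tm_isdst: 1 = DST in effect, 0 = not, -1 = unknown (gmtime() always gives 0)
```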
1) time.localtime([secs]): converts a timestamp into a struct_time in the local time zone. If secs is not provided, the current time is used.
>>> time.localtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=14, tm_sec=50, tm_wday=3, tm_yday=125, tm_isdst=0)
>>> time.localtime(1304575584.1361799)
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=6, tm_sec=24, tm_wday=3, tm_yday=125, tm_isdst=0)
2) time.gmtime([secs]): like localtime(), but converts the timestamp into a struct_time in UTC (time zone 0).
>>> time.gmtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=6, tm_min=19, tm_sec=48, tm_wday=3, tm_yday=125, tm_isdst=0)
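3) time.time(): returns the current time as a timestamp.
>>> time.time()
1304575584.1361799
4) time.mktime(t): converts a struct_time (interpreted as local time) into a timestamp.
>>> time.mktime(time.localtime())
1304576839.0
5) time.sleep(secs): suspends execution of the calling thread for the given number of seconds (a float is allowed).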
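6) time.clock(): be careful, its meaning differs between systems. On UNIX it returns the "process time" as a floating-point number of seconds. On Windows, the first call returns the actual elapsed running time, and every later call returns the time elapsed since the first call (it is based on the Win32 QueryPerformanceCounter(), which is more precise than millisecond resolution). time.clock() was deprecated in Python 3.3 and removed in Python 3.8; use time.perf_counter() or time.process_time() instead.
import time
# time.clock() exists only in Python 2 and in Python 3 before 3.8
if __name__ == '__main__':
    time.sleep(1)
    print("clock1:%s" % time.clock())
    time.sleep(1)
    print("clock2:%s" % time.clock())
    time.sleep(1)
    print("clock3:%s" % time.clock())
Output:
clock1:3.35238137808e-006
clock2:1.00004944763
clock3:2.00012040636
On Windows, the first clock() call prints the program's running time so far;
the second and third calls print the time elapsed since the first clock() call.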
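Since time.clock() is gone in Python 3.8+, the usual replacements are time.perf_counter() (a high-resolution wall-clock interval that includes sleep) and time.process_time() (CPU time of the current process, excluding sleep). A sketch repeating the sleep experiment with both:

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)  # sleeping uses almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print("wall: %.3f s, cpu: %.3f s" % (wall_elapsed, cpu_elapsed))
```

Only differences between two readings are meaningful for either function; their absolute values have no fixed reference point.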
7) time.asctime([t]): formats a time tuple or struct_time as a string of the form 'Sun Jun 20 23:21:05 1993'. If no argument is given, time.localtime() is used.
>>> time.asctime()
'Thu May  5 14:55:43 2011'
8) time.ctime([secs]): converts a timestamp (seconds as a float) into the same form as time.asctime(). If the argument is omitted or None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).
>>> time.ctime()
'Thu May  5 14:58:09 2011'
>>> time.ctime(time.time())
'Thu May  5 14:58:39 2011'
>>> time.ctime(1304579615)
'Thu May  5 15:13:35 2011'
9) time.strftime(format[, t]): converts a time tuple or struct_time (such as one returned by time.localtime() or time.gmtime()) into a formatted time string. If t is not given, time.localtime() is used. If any element of the tuple is out of range, ValueError is raised.
>>> time.strftime("%Y-%m-%d %X", time.localtime())
'2011-05-05 16:37:06'
10) time.strptime(string[, format]): parses a formatted time string into a struct_time. It is the inverse of strftime().
>>> time.strptime('2011-05-05 16:37:06', '%Y-%m-%d %X')
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=16, tm_min=37, tm_sec=6, tm_wday=3, tm_yday=125, tm_isdst=-1)
In this function, format defaults to "%a %b %d %H:%M:%S %Y".
To sum up the time module: as described above, Python has three ways to represent time: 1) timestamp 2) tuple or struct_time 3) formatted string.
Conversions between the representations:
#_*_coding:utf-8_*_
import time
# print(time.clock())  # processor time; deprecated since 3.3 and removed in 3.8, replaced by time.process_time(), which measures CPU time and excludes sleep (the author found it unreliable on macOS)
# print(time.altzone)  # offset of the local DST time zone from UTC, in seconds (negative east of UTC)
# print(time.asctime())  # returns a string like "Fri Aug 19 11:14:16 2016"
# print(time.localtime())  # returns the local time as a struct_time
# print(time.gmtime(time.time()-800000))  # returns UTC time as a struct_time; with no argument, the current time is used

# print(time.asctime(time.localtime()))  # returns a string like "Fri Aug 19 11:14:16 2016"
#print(time.ctime())  # returns "Fri Aug 19 12:38:29 2016", same format as above

# Date string -> timestamp
# string_2_struct = time.strptime("2016/05/22","%Y/%m/%d")  # date string -> struct_time
# print(string_2_struct)
# struct_2_stamp = time.mktime(string_2_struct)  # struct_time -> timestamp
# print(struct_2_stamp)

# Timestamp -> string
# print(time.gmtime(time.time()-86640))  # UTC timestamp -> struct_time
# print(time.strftime("%Y-%m-%d %H:%M:%S",time.gmtime()) )  # UTC struct_time -> formatted string

# Date and time arithmetic
import datetime
# print(datetime.datetime.now())  # returns 2016-08-19 12:47:03.941925
#print(datetime.date.fromtimestamp(time.time()) )  # timestamp straight to a date: 2016-08-19
# print(datetime.datetime.now() )  # current time
# print(datetime.datetime.now() + datetime.timedelta(3))  # now + 3 days
# print(datetime.datetime.now() + datetime.timedelta(-3))  # now - 3 days
# print(datetime.datetime.now() + datetime.timedelta(hours=3))  # now + 3 hours
# print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes
# c_time = datetime.datetime.now()
# print(c_time.replace(minute=3,hour=2))  # replace fields of a datetime
#################### Format directives ####################
# %a  Locale's abbreviated weekday name
# %A  Locale's full weekday name
# %b  Locale's abbreviated month name
# %B  Locale's full month name
# %c  Locale's appropriate date and time representation
# %d  Day of the month (01 - 31)
# %H  Hour, 24-hour clock (00 - 23)
# %I  Hour, 12-hour clock (01 - 12)
# %j  Day of the year (001 - 366)
# %m  Month (01 - 12)
# %M  Minute (00 - 59)
# %p  Locale's equivalent of AM or PM
# %S  Second (00 - 61)
# %U  Week number of the year (00 - 53), Sunday as the first day of the week; all days before the first Sunday are in week 0
# %w  Weekday (0 - 6, 0 is Sunday)
# %W  Same as %U, except that Monday is the first day of the week
# %x  Locale's appropriate date representation
# %X  Locale's appropriate time representation
# %y  Year without century (00 - 99)
# %Y  Year with century
# %Z  Time zone name (empty string if no time zone exists)
# %%  A literal '%' character
## Output (after uncommenting the print lines above; each output line corresponds, in order, to one print):
3.9473128470428115e-07
-32400
Tue Aug 23 15:21:55 2016
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=23, tm_hour=15, tm_min=21, tm_sec=55, tm_wday=1, tm_yday=236, tm_isdst=0)
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=14, tm_hour=1, tm_min=8, tm_sec=35, tm_wday=6, tm_yday=227, tm_isdst=0)
Tue Aug 23 15:21:55 2016
Tue Aug 23 15:21:55 2016
time.struct_time(tm_year=2016, tm_mon=5, tm_mday=22, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=143, tm_isdst=-1)
1463846400.0
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=22, tm_hour=7, tm_min=17, tm_sec=55, tm_wday=0, tm_yday=235, tm_isdst=0)
2016-08-23 07:21:55
2016-08-23 15:21:55.438771
2016-08-23
2016-08-23 15:21:55.438771
2016-08-26 15:21:55.438771
2016-08-20 15:21:55.438771
2016-08-23 18:21:55.438771
2016-08-23 15:51:55.438771
2016-08-23 02:03:55.438771
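The conversion paths above form a loop: string -> struct_time -> timestamp -> struct_time -> string. A sketch of the full round trip, done in UTC so the numbers do not depend on the local time zone: gmtime() replaces localtime(), and calendar.timegm() (the UTC counterpart of time.mktime()) turns a struct_time back into a timestamp.

```python
import calendar
import time

text = "2016-05-22 00:00:00"
fmt = "%Y-%m-%d %H:%M:%S"

st = time.strptime(text, fmt)            # string -> struct_time
stamp = calendar.timegm(st)              # struct_time (read as UTC) -> timestamp
st_back = time.gmtime(stamp)             # timestamp -> struct_time (UTC)
text_back = time.strftime(fmt, st_back)  # struct_time -> string

print(stamp)      # 1463875200
print(text_back)  # 2016-05-22 00:00:00
```

The author's mktime() result of 1463846400.0 for the same date is exactly 8 hours (28800 s) smaller, because mktime() read the date as UTC+8 local time.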
b. random module
#!/usr/bin/env python
#_*_encoding: utf-8_*_
import random
print (random.random())  # 0.6445010863311293
# random.random() returns a random float n with 0 <= n < 1.0
print (random.randint(1,7))  # 4
# random.randint(a, b) returns a random integer in the given range:
# a is the lower bound, b the upper bound, and the result n satisfies a <= n <= b
print (random.randrange(1,10))  # 5
# random.randrange([start], stop[, step]) returns a random element of the
# range from start to stop (exclusive) in steps of step. For example,
# random.randrange(10, 100, 2) picks one value from [10, 12, 14, 16, ... 96, 98];
# it has the same possible results as random.choice(range(10, 100, 2)).
print(random.choice('liukuni'))  # i
# random.choice(sequence) returns a random element of a sequence.
# "sequence" is not one specific Python type but a family of types:
# list, tuple and str are all sequences (see the "Data model" chapter of the Python docs).
# Some examples of choice:
print(random.choice("学习Python"))  # 学
print(random.choice(["JGood","is","a","handsome","boy"]))  # from a list
print(random.choice(("Tuple","List","Dict")))  # from a tuple, e.g. List
print(random.sample([1,2,3,4,5],3))  # [1, 2, 5]
# random.sample(sequence, k) returns k distinct elements picked at random
# from the sequence; the original sequence is not modified.
Practical uses
#!/usr/bin/env python
# encoding: utf-8
import random
# Random integer:
print(random.randint(0,99))  # 70

# Random even number from 0 to 100:
print(random.randrange(0, 101, 2))  # 4

# Random floats:
print(random.random())  # 0.2746445568079129
print(random.uniform(1, 10))  # 9.887001463194844

# Random character:
print(random.choice('abcdefg&#%^*f'))  # f

# Pick a given number of characters from a string:
print(random.sample('abcdefghij',3))  # ['f', 'h', 'd']

# Pick a random string:
print(random.choice(['apple', 'pear', 'peach', 'orange', 'lemon']))  # apple
# Shuffle:
items = [1,2,3,4,5,6,7]
print(items)  # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)  # shuffles the list in place and returns None
print(items)  # [1, 4, 7, 2, 5, 3, 6]
Generating a random verification code
import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)
    if current != i:
        temp = chr(random.randint(65,90))  # random uppercase letter A-Z
    else:
        temp = random.randint(0,9)         # random digit 0-9
    checkcode += str(temp)
print(checkcode)
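The loop above chooses between a letter and a digit at each position by comparing a random number with the loop index, so letters come out about three times as often as digits. A more direct sketch, assuming the same alphabet (A-Z plus 0-9), draws every character from one pool with choice(), giving a slightly different (uniform) distribution. The function name make_checkcode is hypothetical; passing the secrets module as rng switches to a cryptographically strong source, which is the safer choice when the code actually protects something.

```python
import random
import secrets
import string


def make_checkcode(length=4, rng=random):
    # Hypothetical helper. rng can be the random module or the secrets
    # module: both provide a choice(sequence) function.
    pool = string.ascii_uppercase + string.digits  # 'A'..'Z' plus '0'..'9'
    return "".join(rng.choice(pool) for _ in range(length))


code = make_checkcode()                      # e.g. 'K7QD' (random)
strong_code = make_checkcode(6, rng=secrets)  # 6 characters from secrets
print(code, strong_code)
```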
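c. os module
Provides an interface for calling operating-system functionality
os.getcwd()  Get the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")  Change the current working directory; like cd in a shell
os.curdir  The string for the current directory: ('.')
os.pardir  The string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')  Create a multi-level directory tree recursively
os.removedirs('dirname1')  Remove the directory if it is empty, then move up to its parent and remove that too if empty, and so on
os.mkdir('dirname')  Create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')  Remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')  Return all files and subdirectories in the directory, including hidden ones, as a list
os.remove(path)  Delete a file
os.rename("oldname","newname")  Rename a file or directory
os.stat('path/filename')  Get information about a file or directory
os.sep  The OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep  The line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep  The string that separates entries in a path list such as PATH: ";" on Windows, ":" on Linux
os.name  A string identifying the platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  Run a shell command; its output goes straight to the terminal
os.environ  The system environment variables
os.path.abspath(path)  Return the normalized absolute version of path
os.path.split(path)  Split path into a (directory, file name) tuple
os.path.dirname(path)  Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  Return the final file name of path; if path ends with a separator (/, or \ on Windows), an empty string is returned. This is the second element of os.path.split(path)
os.path.exists(path)  True if path exists, otherwise False
os.path.isabs(path)  True if path is an absolute path
os.path.isfile(path)  True if path is an existing file, otherwise False
os.path.isdir(path)  True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])  Join several path components; if a component is an absolute path, all components before it are discarded
os.path.getatime(path)  Return the last access time of the file or directory
os.path.getmtime(path)  Return the last modification time of the file or directory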
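Several of the calls listed above can be tried together in a throwaway directory. A sketch, with made-up directory and file names, that changes into a temporary directory, builds and inspects a two-level tree, then removes it again:

```python
import os
import tempfile

old_cwd = os.getcwd()
base = tempfile.mkdtemp()        # throwaway working area
os.chdir(base)                   # like "cd" in a shell
try:
    nested = os.path.join("dirname1", "dirname2")
    os.makedirs(nested)          # creates both levels at once
    file_path = os.path.join(nested, "note.txt")
    with open(file_path, "w") as f:
        f.write("hello")

    head, tail = os.path.split(file_path)
    print(head == os.path.dirname(file_path), tail)          # True note.txt
    print(os.path.isfile(file_path), os.path.isdir(nested))  # True True
    print(os.listdir(nested))                                # ['note.txt']

    os.remove(file_path)         # delete the file
    os.removedirs(nested)        # removes dirname2, then the now-empty dirname1
    remaining = os.listdir(os.curdir)
    print(remaining)                                         # []
finally:
    os.chdir(old_cwd)            # restore the original working directory
    os.rmdir(base)               # base is empty again, so rmdir succeeds
```

The relative path passed to os.removedirs() matters: it only climbs through the directories named in that path, so base itself is left for os.rmdir().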
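d. sys module
sys.argv  List of command-line arguments; the first element is the path of the program itself
sys.exit(n)  Exit the program; exit(0) signals a normal exit
sys.version  Version information of the Python interpreter
sys.maxint  Largest int value (Python 2 only; Python 3 ints have no fixed maximum, and sys.maxsize is the closest equivalent)
sys.path  The module search path, initialized from the script's directory, the PYTHONPATH environment variable and installation defaults
sys.platform  Name of the operating-system platform
sys.stdout.write('please:')  Write a prompt to standard output
val = sys.stdin.readline()[:-1]  Read one line from standard input, dropping the trailing newline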
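e. shutil
A high-level module for working with files, directories and archives
1. shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object into another. Copying starts at fsrc's current position, so part of a file can be copied by seeking first; length is the buffer size used for each read.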
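Because copyfileobj() reads from the source's current position to its end, seeking first is how "part of the content" gets copied. A sketch using in-memory io.BytesIO objects in place of real files (the byte strings are made up):

```python
import io
import shutil

src = io.BytesIO(b"header:payload data")
dst = io.BytesIO()

src.seek(len(b"header:"))        # skip the first 7 bytes
shutil.copyfileobj(src, dst, 4)  # copy the rest, reading 4 bytes at a time
print(dst.getvalue())            # b'payload data'
```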
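2. shutil.copyfile(src, dst)
Copies a file's contents (no permissions or other metadata)
3. shutil.copymode(src, dst)
Copies only the permission bits; the content, group and owner are unchanged
import os
import stat

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
        os.chmod(dst, mode)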
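4. shutil.copystat(src, dst)
Copies status information: mode bits, atime, mtime, flags (the destination file must already exist)
In other words, the destination's modification time and access time are set to those of the source.
5. shutil.copy(src, dst)
Copies the file and its permissions
import os
from shutil import copyfile, copymode

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        # copying into a directory keeps the source's file name
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)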
6. shutil.copy2(src, dst)
Copies the file and its status information
import os
from shutil import copyfile, copystat

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        # copying into a directory keeps the source's file name
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
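The practical difference between copy() and copy2() shows up in the modification time: copy2() carries it over through copystat(), copy() does not. A sketch in a temporary directory, giving the source a made-up old mtime with os.utime():

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "src.txt")
with open(src, "w") as f:
    f.write("data")

old = 1000000000                 # arbitrary past timestamp (September 2001)
os.utime(src, (old, old))        # set the source's atime and mtime

by_copy = shutil.copy(src, os.path.join(base, "by_copy.txt"))
by_copy2 = shutil.copy2(src, os.path.join(base, "by_copy2.txt"))

copy_keeps_mtime = os.path.getmtime(by_copy) == old    # False: fresh mtime
copy2_keeps_mtime = os.path.getmtime(by_copy2) == old  # True: copied by copystat()
print(copy_keeps_mtime, copy2_keeps_mtime)             # False True

shutil.rmtree(base)              # clean up
```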
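7. shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree; ignore_patterns() builds a function that makes copytree() skip names matching the given glob-style patterns
For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))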
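The ignore example above can be run end to end. A sketch that builds a small source tree with made-up file names in a temporary directory, copies it while skipping *.pyc and tmp* names, and lists what arrived:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
source = os.path.join(base, "source")
os.makedirs(os.path.join(source, "pkg"))
for name in ("main.py", "main.pyc", "tmp_cache.txt", os.path.join("pkg", "mod.py")):
    with open(os.path.join(source, name), "w") as f:
        f.write("x")

destination = os.path.join(base, "destination")  # must not exist yet
shutil.copytree(source, destination,
                ignore=shutil.ignore_patterns("*.pyc", "tmp*"))

# Collect every copied file as a path relative to destination
copied = sorted(os.path.relpath(os.path.join(root, f), destination)
                for root, _, files in os.walk(destination) for f in files)
print(copied)  # ['main.py', 'pkg/mod.py'] on Linux/macOS

shutil.rmtree(base)  # recursive delete, see item 8
```

The patterns are applied by name in every directory of the tree, not only at the top level.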
8. shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree
9. shutil.move(src, dst)
Recursively moves a file or directory (similar to the mv command)
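10. shutil.make_archive(base_name, format, ...)
Creates an archive file (for example zip or tar) and returns its path
base_name: the file name of the archive, which may include a path. A bare file name saves the archive in the current directory; otherwise it is saved at the given path.
For example: www => saved in the current directory
For example: /Users/wupeiqi/www => saved in /Users/wupeiqi/
format: the archive type: "zip", "tar", "bztar", "gztar" (Python 3.5+ also supports "xztar")
root_dir: the directory to archive (defaults to the current directory)
owner: user, defaults to the current user (tar formats only)
group: group, defaults to the current group (tar formats only)
logger: used for logging, usually a logging.Logger object
# Archive the files under /Users/wupeiqi/Downloads/test into the current working directory
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
# Archive the files under /Users/wupeiqi/Downloads/test into the /Users/wupeiqi/ directory
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')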
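A self-contained version of the same call, using a temporary directory instead of the author's paths (the names test, a.log, backup and restored are made up). It also unpacks the result with shutil.unpack_archive(), which infers the format from the file extension:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src_dir = os.path.join(base, "test")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "a.log"), "w") as f:
    f.write("log line\n")

# base_name has no extension: make_archive appends ".tar.gz" for 'gztar'
archive = shutil.make_archive(os.path.join(base, "backup"), "gztar", root_dir=src_dir)
print(os.path.basename(archive))  # backup.tar.gz

out_dir = os.path.join(base, "restored")
shutil.unpack_archive(archive, out_dir)  # format taken from ".tar.gz"
restored = os.listdir(out_dir)
print(restored)                          # ['a.log']
```

Because root_dir points at the test directory, the archive holds a.log itself rather than a test/ folder.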
shutil handles archives by calling the zipfile and tarfile modules. In detail:
① zipfile
import zipfile
# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()
# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()
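The same round trip can be written with with-blocks so the archive is closed even if an error occurs (ZipFile supports the with statement through __enter__/__exit__). Note that the default compression, ZIP_STORED, only stores files; ZIP_DEFLATED actually compresses them. A sketch in a temporary directory with made-up file names:

```python
import os
import tempfile
import zipfile

base = tempfile.mkdtemp()
log_path = os.path.join(base, "a.log")
with open(log_path, "w") as f:
    f.write("line\n" * 100)

zip_path = os.path.join(base, "laxi.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.write(log_path, arcname="a.log")               # store under a short name
    z.writestr("notes.txt", "written from a string")  # no file on disk needed

out_dir = os.path.join(base, "out")
with zipfile.ZipFile(zip_path) as z:  # mode "r" is the default
    names = z.namelist()
    print(names)                      # ['a.log', 'notes.txt']
    z.extractall(out_dir)

extracted = sorted(os.listdir(out_dir))
print(extracted)                      # ['a.log', 'notes.txt']
```

Without arcname, write() would store the file under its full path (minus the drive letter and leading separators).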
② tarfile
import tarfile
# Create the archive (mode 'w' writes a plain, uncompressed .tar)
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()
# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # a target directory can be passed
tar.close()
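A runnable variant of the tarfile example using with-blocks, with small stand-in files instead of the author's bbs2.zip and cmdb.zip. On Python versions that provide extraction filters (3.12+, plus some security backports, detectable via tarfile.data_filter), the sketch passes filter="data" to block unsafe member paths:

```python
import os
import tarfile
import tempfile

base = tempfile.mkdtemp()
for name in ("bbs2.zip", "cmdb.zip"):  # stand-ins for the files above
    with open(os.path.join(base, name), "wb") as f:
        f.write(b"dummy")

tar_path = os.path.join(base, "your.tar")
with tarfile.open(tar_path, "w") as tar:
    for name in ("bbs2.zip", "cmdb.zip"):
        # arcname drops the directory part, as in the example above
        tar.add(os.path.join(base, name), arcname=name)

out_dir = os.path.join(base, "out")
with tarfile.open(tar_path, "r") as tar:
    names = tar.getnames()
    print(names)  # ['bbs2.zip', 'cmdb.zip']
    if hasattr(tarfile, "data_filter"):
        tar.extractall(out_dir, filter="data")  # reject absolute/.. paths
    else:
        tar.extractall(out_dir)
```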
③、ZipFile
1 class ZipFile(object): 2 """ Class with methods to open, read, write, close, list zip files. 3 4 z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False) 5 6 file: Either the path to the file, or a file-like object. 7 If it is a path, the file will be opened and closed by ZipFile. 8 mode: The mode can be either read "r", write "w" or append "a". 9 compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib). 10 allowZip64: if True ZipFile will create files with ZIP64 extensions when 11 needed, otherwise it will raise an exception when this would 12 be necessary. 13 14 """ 15 16 fp = None # Set here since __del__ checks it 17 18 def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False): 19 """Open the ZIP file with mode read "r", write "w" or append "a".""" 20 if mode not in ("r", "w", "a"): 21 raise RuntimeError(‘ZipFile() requires mode "r", "w", or "a"‘) 22 23 if compression == ZIP_STORED: 24 pass 25 elif compression == ZIP_DEFLATED: 26 if not zlib: 27 raise RuntimeError, 28 "Compression requires the (missing) zlib module" 29 else: 30 raise RuntimeError, "That compression method is not supported" 31 32 self._allowZip64 = allowZip64 33 self._didModify = False 34 self.debug = 0 # Level of printing: 0 through 3 35 self.NameToInfo = {} # Find file info given name 36 self.filelist = [] # List of ZipInfo instances for archive 37 self.compression = compression # Method of compression 38 self.mode = key = mode.replace(‘b‘, ‘‘)[0] 39 self.pwd = None 40 self._comment = ‘‘ 41 42 # Check if we were passed a file-like object 43 if isinstance(file, basestring): 44 self._filePassed = 0 45 self.filename = file 46 modeDict = {‘r‘ : ‘rb‘, ‘w‘: ‘wb‘, ‘a‘ : ‘r+b‘} 47 try: 48 self.fp = open(file, modeDict[mode]) 49 except IOError: 50 if mode == ‘a‘: 51 mode = key = ‘w‘ 52 self.fp = open(file, modeDict[mode]) 53 else: 54 raise 55 else: 56 self._filePassed = 1 57 self.fp = file 58 self.filename = getattr(file, ‘name‘, None) 59 60 try: 61 if 
key == ‘r‘: 62 self._RealGetContents() 63 elif key == ‘w‘: 64 # set the modified flag so central directory gets written 65 # even if no files are added to the archive 66 self._didModify = True 67 elif key == ‘a‘: 68 try: 69 # See if file is a zip file 70 self._RealGetContents() 71 # seek to start of directory and overwrite 72 self.fp.seek(self.start_dir, 0) 73 except BadZipfile: 74 # file is not a zip file, just append 75 self.fp.seek(0, 2) 76 77 # set the modified flag so central directory gets written 78 # even if no files are added to the archive 79 self._didModify = True 80 else: 81 raise RuntimeError(‘Mode must be "r", "w" or "a"‘) 82 except: 83 fp = self.fp 84 self.fp = None 85 if not self._filePassed: 86 fp.close() 87 raise 88 89 def __enter__(self): 90 return self 91 92 def __exit__(self, type, value, traceback): 93 self.close() 94 95 def _RealGetContents(self): 96 """Read in the table of contents for the ZIP file.""" 97 fp = self.fp 98 try: 99 endrec = _EndRecData(fp) 100 except IOError: 101 raise BadZipfile("File is not a zip file") 102 if not endrec: 103 raise BadZipfile, "File is not a zip file" 104 if self.debug > 1: 105 print endrec 106 size_cd = endrec[_ECD_SIZE] # bytes in central directory 107 offset_cd = endrec[_ECD_OFFSET] # offset of central directory 108 self._comment = endrec[_ECD_COMMENT] # archive comment 109 110 # "concat" is zero, unless zip was concatenated to another file 111 concat = endrec[_ECD_LOCATION] - size_cd - offset_cd 112 if endrec[_ECD_SIGNATURE] == stringEndArchive64: 113 # If Zip64 extension structures are present, account for them 114 concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator) 115 116 if self.debug > 2: 117 inferred = concat + offset_cd 118 print "given, inferred, offset", offset_cd, inferred, concat 119 # self.start_dir: Position of start of central directory 120 self.start_dir = offset_cd + concat 121 fp.seek(self.start_dir, 0) 122 data = fp.read(size_cd) 123 fp = cStringIO.StringIO(data) 124 total = 0 125 
while total < size_cd: 126 centdir = fp.read(sizeCentralDir) 127 if len(centdir) != sizeCentralDir: 128 raise BadZipfile("Truncated central directory") 129 centdir = struct.unpack(structCentralDir, centdir) 130 if centdir[_CD_SIGNATURE] != stringCentralDir: 131 raise BadZipfile("Bad magic number for central directory") 132 if self.debug > 2: 133 print centdir 134 filename = fp.read(centdir[_CD_FILENAME_LENGTH]) 135 # Create ZipInfo instance to store file information 136 x = ZipInfo(filename) 137 x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH]) 138 x.comment = fp.read(centdir[_CD_COMMENT_LENGTH]) 139 x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET] 140 (x.create_version, x.create_system, x.extract_version, x.reserved, 141 x.flag_bits, x.compress_type, t, d, 142 x.CRC, x.compress_size, x.file_size) = centdir[1:12] 143 x.volume, x.internal_attr, x.external_attr = centdir[15:18] 144 # Convert date/time code to (year, month, day, hour, min, sec) 145 x._raw_time = t 146 x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F, 147 t>>11, (t>>5)&0x3F, (t&0x1F) * 2 ) 148 149 x._decodeExtra() 150 x.header_offset = x.header_offset + concat 151 x.filename = x._decodeFilename() 152 self.filelist.append(x) 153 self.NameToInfo[x.filename] = x 154 155 # update total bytes read from central directory 156 total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH] 157 + centdir[_CD_EXTRA_FIELD_LENGTH] 158 + centdir[_CD_COMMENT_LENGTH]) 159 160 if self.debug > 2: 161 print "total", total 162 163 164 def namelist(self): 165 """Return a list of file names in the archive.""" 166 l = [] 167 for data in self.filelist: 168 l.append(data.filename) 169 return l 170 171 def infolist(self): 172 """Return a list of class ZipInfo instances for files in the 173 archive.""" 174 return self.filelist 175 176 def printdir(self): 177 """Print a table of contents for the zip file.""" 178 print "%-46s %19s %12s" % ("File Name", "Modified ", "Size") 179 for zinfo in self.filelist: 180 date = "%d-%02d-%02d 
%02d:%02d:%02d" % zinfo.date_time[:6] 181 print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size) 182 183 def testzip(self): 184 """Read all the files and check the CRC.""" 185 chunk_size = 2 ** 20 186 for zinfo in self.filelist: 187 try: 188 # Read by chunks, to avoid an OverflowError or a 189 # MemoryError with very large embedded files. 190 with self.open(zinfo.filename, "r") as f: 191 while f.read(chunk_size): # Check CRC-32 192 pass 193 except BadZipfile: 194 return zinfo.filename 195 196 def getinfo(self, name): 197 """Return the instance of ZipInfo given ‘name‘.""" 198 info = self.NameToInfo.get(name) 199 if info is None: 200 raise KeyError( 201 ‘There is no item named %r in the archive‘ % name) 202 203 return info 204 205 def setpassword(self, pwd): 206 """Set default password for encrypted files.""" 207 self.pwd = pwd 208 209 @property 210 def comment(self): 211 """The comment text associated with the ZIP file.""" 212 return self._comment 213 214 @comment.setter 215 def comment(self, comment): 216 # check for valid comment length 217 if len(comment) > ZIP_MAX_COMMENT: 218 import warnings 219 warnings.warn(‘Archive comment is too long; truncating to %d bytes‘ 220 % ZIP_MAX_COMMENT, stacklevel=2) 221 comment = comment[:ZIP_MAX_COMMENT] 222 self._comment = comment 223 self._didModify = True 224 225 def read(self, name, pwd=None): 226 """Return file bytes (as a string) for name.""" 227 return self.open(name, "r", pwd).read() 228 229 def open(self, name, mode="r", pwd=None): 230 """Return file-like object for ‘name‘.""" 231 if mode not in ("r", "U", "rU"): 232 raise RuntimeError, ‘open() requires mode "r", "U", or "rU"‘ 233 if not self.fp: 234 raise RuntimeError, 235 "Attempt to read ZIP archive that was already closed" 236 237 # Only open a new file for instances where we were not 238 # given a file object in the constructor 239 if self._filePassed: 240 zef_file = self.fp 241 should_close = False 242 else: 243 zef_file = open(self.filename, ‘rb‘) 244 
should_close = True 245 246 try: 247 # Make sure we have an info object 248 if isinstance(name, ZipInfo): 249 # ‘name‘ is already an info object 250 zinfo = name 251 else: 252 # Get info object for name 253 zinfo = self.getinfo(name) 254 255 zef_file.seek(zinfo.header_offset, 0) 256 257 # Skip the file header: 258 fheader = zef_file.read(sizeFileHeader) 259 if len(fheader) != sizeFileHeader: 260 raise BadZipfile("Truncated file header") 261 fheader = struct.unpack(structFileHeader, fheader) 262 if fheader[_FH_SIGNATURE] != stringFileHeader: 263 raise BadZipfile("Bad magic number for file header") 264 265 fname = zef_file.read(fheader[_FH_FILENAME_LENGTH]) 266 if fheader[_FH_EXTRA_FIELD_LENGTH]: 267 zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH]) 268 269 if fname != zinfo.orig_filename: 270 raise BadZipfile, 271 ‘File name in directory "%s" and header "%s" differ.‘ % ( 272 zinfo.orig_filename, fname) 273 274 # check for encrypted flag & handle password 275 is_encrypted = zinfo.flag_bits & 0x1 276 zd = None 277 if is_encrypted: 278 if not pwd: 279 pwd = self.pwd 280 if not pwd: 281 raise RuntimeError, "File %s is encrypted, " 282 "password required for extraction" % name 283 284 zd = _ZipDecrypter(pwd) 285 # The first 12 bytes in the cypher stream is an encryption header 286 # used to strengthen the algorithm. The first 11 bytes are 287 # completely random, while the 12th contains the MSB of the CRC, 288 # or the MSB of the file time depending on the header type 289 # and is used to check the correctness of the password. 
290 bytes = zef_file.read(12) 291 h = map(zd, bytes[0:12]) 292 if zinfo.flag_bits & 0x8: 293 # compare against the file type from extended local headers 294 check_byte = (zinfo._raw_time >> 8) & 0xff 295 else: 296 # compare against the CRC otherwise 297 check_byte = (zinfo.CRC >> 24) & 0xff 298 if ord(h[11]) != check_byte: 299 raise RuntimeError("Bad password for file", name) 300 301 return ZipExtFile(zef_file, mode, zinfo, zd, 302 close_fileobj=should_close) 303 except: 304 if should_close: 305 zef_file.close() 306 raise 307 308 def extract(self, member, path=None, pwd=None): 309 """Extract a member from the archive to the current working directory, 310 using its full name. Its file information is extracted as accurately 311 as possible. `member‘ may be a filename or a ZipInfo object. You can 312 specify a different directory using `path‘. 313 """ 314 if not isinstance(member, ZipInfo): 315 member = self.getinfo(member) 316 317 if path is None: 318 path = os.getcwd() 319 320 return self._extract_member(member, path, pwd) 321 322 def extractall(self, path=None, members=None, pwd=None): 323 """Extract all members from the archive to the current working 324 directory. `path‘ specifies a different directory to extract to. 325 `members‘ is optional and must be a subset of the list returned 326 by namelist(). 327 """ 328 if members is None: 329 members = self.namelist() 330 331 for zipinfo in members: 332 self.extract(zipinfo, path, pwd) 333 334 def _extract_member(self, member, targetpath, pwd): 335 """Extract the ZipInfo object ‘member‘ to a physical 336 file on the path targetpath. 337 """ 338 # build the destination pathname, replacing 339 # forward slashes to platform specific separators. 340 arcname = member.filename.replace(‘/‘, os.path.sep) 341 342 if os.path.altsep: 343 arcname = arcname.replace(os.path.altsep, os.path.sep) 344 # interpret absolute pathname as relative, remove drive letter or 345 # UNC path, redundant separators, "." and ".." components. 
346 arcname = os.path.splitdrive(arcname)[1] 347 arcname = os.path.sep.join(x for x in arcname.split(os.path.sep) 348 if x not in (‘‘, os.path.curdir, os.path.pardir)) 349 if os.path.sep == ‘\\‘: 350 # filter illegal characters on Windows 351 illegal = ‘:<>|"?*‘ 352 if isinstance(arcname, unicode): 353 table = {ord(c): ord(‘_‘) for c in illegal} 354 else: 355 table = string.maketrans(illegal, ‘_‘ * len(illegal)) 356 arcname = arcname.translate(table) 357 # remove trailing dots 358 arcname = (x.rstrip(‘.‘) for x in arcname.split(os.path.sep)) 359 arcname = os.path.sep.join(x for x in arcname if x) 360 361 targetpath = os.path.join(targetpath, arcname) 362 targetpath = os.path.normpath(targetpath) 363 364 # Create all upper directories if necessary. 365 upperdirs = os.path.dirname(targetpath) 366 if upperdirs and not os.path.exists(upperdirs): 367 os.makedirs(upperdirs) 368 369 if member.filename[-1] == ‘/‘: 370 if not os.path.isdir(targetpath): 371 os.mkdir(targetpath) 372 return targetpath 373 374 with self.open(member, pwd=pwd) as source, 375 file(targetpath, "wb") as target: 376 shutil.copyfileobj(source, target) 377 378 return targetpath 379 380 def _writecheck(self, zinfo): 381 """Check for errors before writing a file to the archive.""" 382 if zinfo.filename in self.NameToInfo: 383 import warnings 384 warnings.warn(‘Duplicate name: %r‘ % zinfo.filename, stacklevel=3) 385 if self.mode not in ("w", "a"): 386 raise RuntimeError, ‘write() requires mode "w" or "a"‘ 387 if not self.fp: 388 raise RuntimeError, 389 "Attempt to write ZIP archive that was already closed" 390 if zinfo.compress_type == ZIP_DEFLATED and not zlib: 391 raise RuntimeError, 392 "Compression requires the (missing) zlib module" 393 if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED): 394 raise RuntimeError, 395 "That compression method is not supported" 396 if not self._allowZip64: 397 requires_zip64 = None 398 if len(self.filelist) >= ZIP_FILECOUNT_LIMIT: 399 requires_zip64 = "Files count" 
400 elif zinfo.file_size > ZIP64_LIMIT: 401 requires_zip64 = "Filesize" 402 elif zinfo.header_offset > ZIP64_LIMIT: 403 requires_zip64 = "Zipfile size" 404 if requires_zip64: 405 raise LargeZipFile(requires_zip64 + 406 " would require ZIP64 extensions") 407 408 def write(self, filename, arcname=None, compress_type=None): 409 """Put the bytes from filename into the archive under the name 410 arcname.""" 411 if not self.fp: 412 raise RuntimeError( 413 "Attempt to write to ZIP archive that was already closed") 414 415 st = os.stat(filename) 416 isdir = stat.S_ISDIR(st.st_mode) 417 mtime = time.localtime(st.st_mtime) 418 date_time = mtime[0:6] 419 # Create ZipInfo instance to store file information 420 if arcname is None: 421 arcname = filename 422 arcname = os.path.normpath(os.path.splitdrive(arcname)[1]) 423 while arcname[0] in (os.sep, os.altsep): 424 arcname = arcname[1:] 425 if isdir: 426 arcname += ‘/‘ 427 zinfo = ZipInfo(arcname, date_time) 428 zinfo.external_attr = (st[0] & 0xFFFF) << 16L # Unix attributes 429 if compress_type is None: 430 zinfo.compress_type = self.compression 431 else: 432 zinfo.compress_type = compress_type 433 434 zinfo.file_size = st.st_size 435 zinfo.flag_bits = 0x00 436 zinfo.header_offset = self.fp.tell() # Start of header bytes 437 438 self._writecheck(zinfo) 439 self._didModify = True 440 441 if isdir: 442 zinfo.file_size = 0 443 zinfo.compress_size = 0 444 zinfo.CRC = 0 445 zinfo.external_attr |= 0x10 # MS-DOS directory flag 446 self.filelist.append(zinfo) 447 self.NameToInfo[zinfo.filename] = zinfo 448 self.fp.write(zinfo.FileHeader(False)) 449 return 450 451 with open(filename, "rb") as fp: 452 # Must overwrite CRC and sizes with correct data later 453 zinfo.CRC = CRC = 0 454 zinfo.compress_size = compress_size = 0 455 # Compressed size can be larger than uncompressed size 456 zip64 = self._allowZip64 and 457 zinfo.file_size * 1.05 > ZIP64_LIMIT 458 self.fp.write(zinfo.FileHeader(zip64)) 459 if zinfo.compress_type == ZIP_DEFLATED: 
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                     zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive.  The contents is the string
        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                zinfo.external_attr |= 0x10           # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16     # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size
        zinfo.header_offset = self.fp.tell()    # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                 zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)    # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify: # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:         # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffffL
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                                '<HH' + 'Q'*len(extra),
                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                            stringCentralDir, create_version,
                            zinfo.create_system, extract_version, zinfo.reserved,
                            flag_bits, zinfo.compress_type, dostime, dosdate,
                            zinfo.CRC, compress_size, file_size,
                            len(filename), len(extra_data), len(zinfo.comment),
                            0, zinfo.internal_attr, zinfo.external_attr,
                            header_offset)
                    except DeprecationWarning:
                        print >>sys.stderr, (structCentralDir,
                            stringCentralDir, create_version,
                            zinfo.create_system, extract_version, zinfo.reserved,
                            zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                            zinfo.CRC, compress_size, file_size,
                            len(zinfo.filename), len(extra_data), len(zinfo.comment),
                            0, zinfo.internal_attr, zinfo.external_attr,
                            header_offset)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                            structEndArchive64, stringEndArchive64,
                            44, 45, 45, 0, 0, centDirCount, centDirCount,
                            centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                            structEndArchive64Locator,
                            stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                    0, 0, centDirCount, centDirCount,
                                    centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
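The listing above is the Python 2 standard-library code behind ZipFile.write(), writestr() and close(). As a hedged illustration, written for Python 3 (whose zipfile module keeps these public methods), the sketch below drives those methods and re-checks three details visible in the source: the CRC-32 is accumulated chunk by chunk, compressed data is a raw DEFLATE stream (wbits=-15), and close() packs timestamps into 16-bit DOS date/time fields with 2-second resolution. The file names, the timestamp and the helpers build_archive, chunked_crc32, raw_deflate and dos_date_time are hypothetical and exist only for this example.

```python
import os
import tempfile
import zipfile
import zlib

STAMP = (2017, 3, 5, 14, 30, 58)  # hypothetical timestamp; DOS time stores even seconds only


def build_archive(workdir):
    """Create demo.zip with write() and writestr(); all names are illustrative."""
    src = os.path.join(workdir, "demo.txt")
    with open(src, "wb") as f:
        f.write(b"hello guyun\n" * 1000)
    archive = os.path.join(workdir, "demo.zip")
    # Leaving the with-block calls close(), which writes the central
    # directory and the end-of-central-directory record.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="demo.txt")                                   # disk file -> member
        zf.writestr(zipfile.ZipInfo("notes/readme.txt", STAMP), "in memory")  # str -> member
    return src, archive


def chunked_crc32(data, chunk=8 * 1024):
    """Accumulate a CRC-32 chunk by chunk, like the read loop in write()."""
    crc = 0
    for i in range(0, len(data), chunk):
        crc = zlib.crc32(data[i:i + chunk], crc) & 0xFFFFFFFF
    return crc


def raw_deflate(data):
    """Raw DEFLATE (wbits=-15: no zlib header or trailer), as write()/writestr() use."""
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush()


def dos_date_time(dt):
    """Pack (Y, M, D, h, m, s) into DOS date/time words, as close() does."""
    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
    return dosdate, dostime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, archive = build_archive(tmp)
        with open(src, "rb") as f:
            payload = f.read()
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())                                        # ['demo.txt', 'notes/readme.txt']
            print(zf.getinfo("demo.txt").CRC == chunked_crc32(payload))  # True
            print(zf.getinfo("notes/readme.txt").date_time == STAMP)     # True
        print(zlib.decompress(raw_deflate(payload), -15) == payload)     # True
```

Passing a ZipInfo to writestr() is what lets the example pin the timestamp; with a plain name, writestr() stamps the current local time, just as the Python 2 code does with time.localtime(time.time())[:6].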
④ TarFile (the tarfile.TarFile class, Python 2 standard-library source)
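The TarFile source that follows is also Python 2. As a minimal sketch for orientation, assuming Python 3.7 or later, the example below exercises the public methods it defines: TarFile.open() selects a compression from a mode string such as "w:gz" or "r:*", add() walks a directory recursively and accepts a filter callback that may edit or drop each TarInfo, and close() pads an uncompressed archive to a multiple of RECORDSIZE (20 * 512 bytes). The directory layout, the file names and the helpers skip_pyc, make_tree and roundtrip are hypothetical.

```python
import os
import tarfile
import tempfile


def skip_pyc(tarinfo):
    """add() filter (hypothetical): return None to drop a member, else the edited TarInfo."""
    if tarinfo.name.endswith(".pyc"):
        return None
    tarinfo.uid = tarinfo.gid = 0          # normalise ownership stored in the archive
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


def make_tree(root):
    """Create a tiny hypothetical package directory to archive."""
    pkg = os.path.join(root, "package_test")
    os.makedirs(pkg)
    for name, data in [("__init__.py", b"print('hi')\n"), ("cache.pyc", b"\x00")]:
        with open(os.path.join(pkg, name), "wb") as f:
            f.write(data)
    return pkg


def roundtrip(root, mode="w:gz"):
    """Archive the tree with the given write mode, then read it back with 'r:*'."""
    pkg = make_tree(root)
    archive = os.path.join(root, "pkg.tar" + (".gz" if mode.endswith("gz") else ""))
    with tarfile.open(archive, mode) as tar:    # close() writes the zero end blocks
        tar.add(pkg, arcname="package_test", filter=skip_pyc)
    with tarfile.open(archive, "r:*") as tar:   # "r:*" detects the compression
        names = tar.getnames()
        body = tar.extractfile("package_test/__init__.py").read()
    return archive, names, body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive, names, body = roundtrip(os.path.join(tmp, "gz"), "w:gz")
        print(sorted(names))   # ['package_test', 'package_test/__init__.py']
        print(body)            # b"print('hi')\n"
        plain, _, _ = roundtrip(os.path.join(tmp, "plain"), "w")
        print(os.path.getsize(plain) % tarfile.RECORDSIZE)  # 0
```

The filter callback replaces the deprecated exclude argument that add() still accepts in the Python 2 source; returning None there corresponds to the "Excluded" branch of add().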
class TarFile(object):
    """The TarFile Class provides an interface to tar archives.
    """

    debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)

    dereference = False         # If true, add content of linked file to the
                                # tar file, else the link.

    ignore_zeros = False        # If true, skips empty or invalid blocks and
                                # continues processing.

    errorlevel = 1              # If 0, fatal errors only appear in debug
                                # messages (if debug >= 0). If > 0, errors
                                # are passed to the caller as exceptions.

    format = DEFAULT_FORMAT     # The format to use when creating an archive.

    encoding = ENCODING         # Encoding for 8-bit character strings.

    errors = None               # Error handler for unicode conversion.

    tarinfo = TarInfo           # The default TarInfo class to use.

    fileobject = ExFileObject   # The default ExFileObject class to use.

    def __init__(self, name=None, mode="r", fileobj=None, format=None,
            tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
            errors=None, pax_headers=None, debug=None, errorlevel=None):
        """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
           read from an existing archive, 'a' to append data to an existing
           file or 'w' to create a new file overwriting an existing one. `mode'
           defaults to 'r'.
           If `fileobj' is given, it is used for reading or writing data. If it
           can be determined, `mode' is overridden by `fileobj's mode.
           `fileobj' is not closed, when TarFile is closed.
        """
        modes = {"r": "rb", "a": "r+b", "w": "wb"}
        if mode not in modes:
            raise ValueError("mode must be 'r', 'a' or 'w'")
        self.mode = mode
        self._mode = modes[mode]

        if not fileobj:
            if self.mode == "a" and not os.path.exists(name):
                # Create nonexistent files in append mode.
                self.mode = "w"
                self._mode = "wb"
            fileobj = bltn_open(name, self._mode)
            self._extfileobj = False
        else:
            if name is None and hasattr(fileobj, "name"):
                name = fileobj.name
            if hasattr(fileobj, "mode"):
                self._mode = fileobj.mode
            self._extfileobj = True
        self.name = os.path.abspath(name) if name else None
        self.fileobj = fileobj

        # Init attributes.
        if format is not None:
            self.format = format
        if tarinfo is not None:
            self.tarinfo = tarinfo
        if dereference is not None:
            self.dereference = dereference
        if ignore_zeros is not None:
            self.ignore_zeros = ignore_zeros
        if encoding is not None:
            self.encoding = encoding

        if errors is not None:
            self.errors = errors
        elif mode == "r":
            self.errors = "utf-8"
        else:
            self.errors = "strict"

        if pax_headers is not None and self.format == PAX_FORMAT:
            self.pax_headers = pax_headers
        else:
            self.pax_headers = {}

        if debug is not None:
            self.debug = debug
        if errorlevel is not None:
            self.errorlevel = errorlevel

        # Init datastructures.
        self.closed = False
        self.members = []       # list of members as TarInfo objects
        self._loaded = False    # flag if all members have been read
        self.offset = self.fileobj.tell()
                                # current position in the archive file
        self.inodes = {}        # dictionary caching the inodes of
                                # archive members already added

        try:
            if self.mode == "r":
                self.firstmember = None
                self.firstmember = self.next()

            if self.mode == "a":
                # Move to the end of the archive,
                # before the first empty block.
                while True:
                    self.fileobj.seek(self.offset)
                    try:
                        tarinfo = self.tarinfo.fromtarfile(self)
                        self.members.append(tarinfo)
                    except EOFHeaderError:
                        self.fileobj.seek(self.offset)
                        break
                    except HeaderError, e:
                        raise ReadError(str(e))

            if self.mode in "aw":
                self._loaded = True

                if self.pax_headers:
                    buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
                    self.fileobj.write(buf)
                    self.offset += len(buf)
        except:
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
            raise

    def _getposix(self):
        return self.format == USTAR_FORMAT
    def _setposix(self, value):
        import warnings
        warnings.warn("use the format attribute instead", DeprecationWarning,
                      2)
        if value:
            self.format = USTAR_FORMAT
        else:
            self.format = GNU_FORMAT
    posix = property(_getposix, _setposix)

    #--------------------------------------------------------------------------
    # Below are the classmethods which act as alternate constructors to the
    # TarFile class. The open() method is the only one that is needed for
    # public use; it is the "super"-constructor and is able to select an
    # adequate "sub"-constructor for a particular compression using the mapping
    # from OPEN_METH.
    #
    # This concept allows one to subclass TarFile without losing the comfort of
    # the super-constructor. A sub-constructor is registered and made available
    # by adding it to the mapping in OPEN_METH.

    @classmethod
    def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
        """Open a tar archive for reading, writing or appending. Return
           an appropriate TarFile class.

           mode:
           'r' or 'r:*' open for reading with transparent compression
           'r:'         open for reading exclusively uncompressed
           'r:gz'       open for reading with gzip compression
           'r:bz2'      open for reading with bzip2 compression
           'a' or 'a:'  open for appending, creating the file if necessary
           'w' or 'w:'  open for writing without compression
           'w:gz'       open for writing with gzip compression
           'w:bz2'      open for writing with bzip2 compression

           'r|*'        open a stream of tar blocks with transparent compression
           'r|'         open an uncompressed stream of tar blocks for reading
           'r|gz'       open a gzip compressed stream of tar blocks
           'r|bz2'      open a bzip2 compressed stream of tar blocks
           'w|'         open an uncompressed stream for writing
           'w|gz'       open a gzip compressed stream for writing
           'w|bz2'      open a bzip2 compressed stream for writing
        """

        if not name and not fileobj:
            raise ValueError("nothing to open")

        if mode in ("r", "r:*"):
            # Find out which *open() is appropriate for opening the file.
            for comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
                if fileobj is not None:
                    saved_pos = fileobj.tell()
                try:
                    return func(name, "r", fileobj, **kwargs)
                except (ReadError, CompressionError), e:
                    if fileobj is not None:
                        fileobj.seek(saved_pos)
                    continue
            raise ReadError("file could not be opened successfully")

        elif ":" in mode:
            filemode, comptype = mode.split(":", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            # Select the *open() function according to
            # given compression.
            if comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
            else:
                raise CompressionError("unknown compression type %r" % comptype)
            return func(name, filemode, fileobj, **kwargs)

        elif "|" in mode:
            filemode, comptype = mode.split("|", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            if filemode not in ("r", "w"):
                raise ValueError("mode must be 'r' or 'w'")

            stream = _Stream(name, filemode, comptype, fileobj, bufsize)
            try:
                t = cls(name, filemode, stream, **kwargs)
            except:
                stream.close()
                raise
            t._extfileobj = False
            return t

        elif mode in ("a", "w"):
            return cls.taropen(name, mode, fileobj, **kwargs)

        raise ValueError("undiscernible mode")

    @classmethod
    def taropen(cls, name, mode="r", fileobj=None, **kwargs):
        """Open uncompressed tar archive name for reading or writing.
        """
        if mode not in ("r", "a", "w"):
            raise ValueError("mode must be 'r', 'a' or 'w'")
        return cls(name, mode, fileobj, **kwargs)

    @classmethod
    def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open gzip compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")

        try:
            import gzip
            gzip.GzipFile
        except (ImportError, AttributeError):
            raise CompressionError("gzip module is not available")

        try:
            fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
        except OSError:
            if fileobj is not None and mode == 'r':
                raise ReadError("not a gzip file")
            raise

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except IOError:
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a gzip file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    @classmethod
    def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open bzip2 compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'.")

        try:
            import bz2
        except ImportError:
            raise CompressionError("bz2 module is not available")

        if fileobj is not None:
            fileobj = _BZ2Proxy(fileobj, mode)
        else:
            fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except (IOError, EOFError):
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a bzip2 file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    # All *open() methods are registered here.
    OPEN_METH = {
        "tar": "taropen",   # uncompressed tar
        "gz":  "gzopen",    # gzip compressed tar
        "bz2": "bz2open"    # bzip2 compressed tar
    }

    #--------------------------------------------------------------------------
    # The public methods which TarFile provides:

    def close(self):
        """Close the TarFile. In write-mode, two finishing zero blocks are
           appended to the archive.
        """
        if self.closed:
            return

        if self.mode in "aw":
            self.fileobj.write(NUL * (BLOCKSIZE * 2))
            self.offset += (BLOCKSIZE * 2)
            # fill up the end with zero-blocks
            # (like option -b20 for tar does)
            blocks, remainder = divmod(self.offset, RECORDSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (RECORDSIZE - remainder))

        if not self._extfileobj:
            self.fileobj.close()
        self.closed = True

    def getmember(self, name):
        """Return a TarInfo object for member `name'. If `name' can not be
           found in the archive, KeyError is raised. If a member occurs more
           than once in the archive, its last occurrence is assumed to be the
           most up-to-date version.
        """
        tarinfo = self._getmember(name)
        if tarinfo is None:
            raise KeyError("filename %r not found" % name)
        return tarinfo

    def getmembers(self):
        """Return the members of the archive as a list of TarInfo objects. The
           list has the same order as the members in the archive.
        """
        self._check()
        if not self._loaded:    # if we want to obtain a list of
            self._load()        # all members, we first have to
                                # scan the whole archive.
        return self.members

    def getnames(self):
        """Return the members of the archive as a list of their names. It has
           the same order as the list returned by getmembers().
        """
        return [tarinfo.name for tarinfo in self.getmembers()]

    def gettarinfo(self, name=None, arcname=None, fileobj=None):
        """Create a TarInfo object for either the file `name' or the file
           object `fileobj' (using os.fstat on its file descriptor). You can
           modify some of the TarInfo's attributes before you add it using
           addfile(). If given, `arcname' specifies an alternative name for the
           file in the archive.
        """
        self._check("aw")

        # When fileobj is given, replace name by
        # fileobj's real name.
        if fileobj is not None:
            name = fileobj.name

        # Building the name of the member in the archive.
        # Backward slashes are converted to forward slashes,
        # Absolute paths are turned to relative paths.
        if arcname is None:
            arcname = name
        drv, arcname = os.path.splitdrive(arcname)
        arcname = arcname.replace(os.sep, "/")
        arcname = arcname.lstrip("/")

        # Now, fill the TarInfo object with
        # information specific for the file.
        tarinfo = self.tarinfo()
        tarinfo.tarfile = self

        # Use os.stat or os.lstat, depending on platform
        # and if symlinks shall be resolved.
        if fileobj is None:
            if hasattr(os, "lstat") and not self.dereference:
                statres = os.lstat(name)
            else:
                statres = os.stat(name)
        else:
            statres = os.fstat(fileobj.fileno())
        linkname = ""

        stmd = statres.st_mode
        if stat.S_ISREG(stmd):
            inode = (statres.st_ino, statres.st_dev)
            if not self.dereference and statres.st_nlink > 1 and \
                    inode in self.inodes and arcname != self.inodes[inode]:
                # Is it a hardlink to an already
                # archived file?
                type = LNKTYPE
                linkname = self.inodes[inode]
            else:
                # The inode is added only if its valid.
                # For win32 it is always 0.
                type = REGTYPE
                if inode[0]:
                    self.inodes[inode] = arcname
        elif stat.S_ISDIR(stmd):
            type = DIRTYPE
        elif stat.S_ISFIFO(stmd):
            type = FIFOTYPE
        elif stat.S_ISLNK(stmd):
            type = SYMTYPE
            linkname = os.readlink(name)
        elif stat.S_ISCHR(stmd):
            type = CHRTYPE
        elif stat.S_ISBLK(stmd):
            type = BLKTYPE
        else:
            return None

        # Fill the TarInfo object with all
        # information we can get.
        tarinfo.name = arcname
        tarinfo.mode = stmd
        tarinfo.uid = statres.st_uid
        tarinfo.gid = statres.st_gid
        if type == REGTYPE:
            tarinfo.size = statres.st_size
        else:
            tarinfo.size = 0L
        tarinfo.mtime = statres.st_mtime
        tarinfo.type = type
        tarinfo.linkname = linkname
        if pwd:
            try:
                tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
            except KeyError:
                pass
        if grp:
            try:
                tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
            except KeyError:
                pass

        if type in (CHRTYPE, BLKTYPE):
            if hasattr(os, "major") and hasattr(os, "minor"):
                tarinfo.devmajor = os.major(statres.st_rdev)
                tarinfo.devminor = os.minor(statres.st_rdev)
        return tarinfo

    def list(self, verbose=True):
        """Print a table of contents to sys.stdout. If `verbose' is False, only
           the names of the members are printed. If it is True, an `ls -l'-like
           output is produced.
        """
        self._check()

        for tarinfo in self:
            if verbose:
                print filemode(tarinfo.mode),
                print "%s/%s" % (tarinfo.uname or tarinfo.uid,
                                 tarinfo.gname or tarinfo.gid),
                if tarinfo.ischr() or tarinfo.isblk():
                    print "%10s" % ("%d,%d"
                                    % (tarinfo.devmajor, tarinfo.devminor)),
                else:
                    print "%10d" % tarinfo.size,
                print "%d-%02d-%02d %02d:%02d:%02d" \
                      % time.localtime(tarinfo.mtime)[:6],

            print tarinfo.name + ("/" if tarinfo.isdir() else ""),

            if verbose:
                if tarinfo.issym():
                    print "->", tarinfo.linkname,
                if tarinfo.islnk():
                    print "link to", tarinfo.linkname,
            print

    def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
        """Add the file `name' to the archive. `name' may be any type of file
           (directory, fifo, symbolic link, etc.). If given, `arcname'
           specifies an alternative name for the file in the archive.
           Directories are added recursively by default. This can be avoided by
           setting `recursive' to False. `exclude' is a function that should
           return True for each filename to be excluded. `filter' is a function
           that expects a TarInfo object argument and returns the changed
           TarInfo object, if it returns None the TarInfo object will be
           excluded from the archive.
        """
        self._check("aw")

        if arcname is None:
            arcname = name

        # Exclude pathnames.
        if exclude is not None:
            import warnings
            warnings.warn("use the filter argument instead",
                    DeprecationWarning, 2)
            if exclude(name):
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Skip if somebody tries to archive the archive...
        if self.name is not None and os.path.abspath(name) == self.name:
            self._dbg(2, "tarfile: Skipped %r" % name)
            return

        self._dbg(1, name)

        # Create a TarInfo object from the file.
        tarinfo = self.gettarinfo(name, arcname)

        if tarinfo is None:
            self._dbg(1, "tarfile: Unsupported type %r" % name)
            return

        # Change or exclude the TarInfo object.
        if filter is not None:
            tarinfo = filter(tarinfo)
            if tarinfo is None:
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Append the tar header and data to the archive.
        if tarinfo.isreg():
            with bltn_open(name, "rb") as f:
                self.addfile(tarinfo, f)

        elif tarinfo.isdir():
            self.addfile(tarinfo)
            if recursive:
                for f in os.listdir(name):
                    self.add(os.path.join(name, f), os.path.join(arcname, f),
                            recursive, exclude, filter)

        else:
            self.addfile(tarinfo)

    def addfile(self, tarinfo, fileobj=None):
        """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
           given, tarinfo.size bytes are read from it and added to the archive.
           You can create TarInfo objects using gettarinfo().
           On Windows platforms, `fileobj' should always be opened with mode
           'rb' to avoid irritation about the file size.
        """
        self._check("aw")

        tarinfo = copy.copy(tarinfo)

        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        self.fileobj.write(buf)
        self.offset += len(buf)

        # If there's data to follow, append it.
        if fileobj is not None:
            copyfileobj(fileobj, self.fileobj, tarinfo.size)
            blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (BLOCKSIZE - remainder))
                blocks += 1
            self.offset += blocks * BLOCKSIZE

        self.members.append(tarinfo)

    def extractall(self, path=".", members=None):
        """Extract all members from the archive to the current working
           directory and set owner, modification time and permissions on
           directories afterwards. `path' specifies a different directory
           to extract to. `members' is optional and must be a subset of the
           list returned by getmembers().
        """
        directories = []

        if members is None:
            members = self

        for tarinfo in members:
            if tarinfo.isdir():
                # Extract directories with a safe mode.
                directories.append(tarinfo)
                tarinfo = copy.copy(tarinfo)
                tarinfo.mode = 0700
            self.extract(tarinfo, path)

        # Reverse sort directories.
        directories.sort(key=operator.attrgetter('name'))
        directories.reverse()

        # Set correct owner, mtime and filemode on directories.
        for tarinfo in directories:
            dirpath = os.path.join(path, tarinfo.name)
            try:
                self.chown(tarinfo, dirpath)
                self.utime(tarinfo, dirpath)
                self.chmod(tarinfo, dirpath)
            except ExtractError, e:
                if self.errorlevel > 1:
                    raise
                else:
                    self._dbg(1, "tarfile: %s" % e)

    def extract(self, member, path=""):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a TarInfo object. You can
           specify a different directory using `path'.
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        # Prepare the link target for makelink().
        if tarinfo.islnk():
            tarinfo._link_target = os.path.join(path, tarinfo.linkname)

        try:
            self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
        except EnvironmentError, e:
            if self.errorlevel > 0:
                raise
            else:
                if e.filename is None:
                    self._dbg(1, "tarfile: %s" % e.strerror)
                else:
                    self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
        except ExtractError, e:
            if self.errorlevel > 1:
                raise
            else:
                self._dbg(1, "tarfile: %s" % e)

    def extractfile(self, member):
        """Extract a member from the archive as a file object. `member' may be
           a filename or a TarInfo object. If `member' is a regular file, a
           file-like object is returned. If `member' is a link, a file-like
           object is constructed from the link's target. If `member' is none of
           the above, None is returned.
           The file-like object is read-only and provides the following
           methods: read(), readline(), readlines(), seek() and tell()
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        if tarinfo.isreg():
            return self.fileobject(self, tarinfo)

        elif tarinfo.type not in SUPPORTED_TYPES:
            # If a member's type is unknown, it is treated as a
            # regular file.
            return self.fileobject(self, tarinfo)

        elif tarinfo.islnk() or tarinfo.issym():
            if isinstance(self.fileobj, _Stream):
                # A small but ugly workaround for the case that someone tries
                # to extract a (sym)link as a file-object from a non-seekable
                # stream of tar blocks.
                raise StreamError("cannot extract (sym)link as file object")
            else:
                # A (sym)link's file object is its target's file object.
                return self.extractfile(self._find_link_target(tarinfo))
        else:
            # If there's no data associated with the member (directory, chrdev,
            # blkdev, etc.), return None instead of a file object.
            return None

    def _extract_member(self, tarinfo, targetpath):
        """Extract the TarInfo object tarinfo to a physical
           file called targetpath.
        """
        # Fetch the TarInfo object for the given name
        # and build the destination pathname, replacing
        # forward slashes to platform specific separators.
        targetpath = targetpath.rstrip("/")
        targetpath = targetpath.replace("/", os.sep)

        # Create all upper directories.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            # Create directories that are not part of the archive with
            # default permissions.
            os.makedirs(upperdirs)

        if tarinfo.islnk() or tarinfo.issym():
            self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
        else:
            self._dbg(1, tarinfo.name)

        if tarinfo.isreg():
            self.makefile(tarinfo, targetpath)
        elif tarinfo.isdir():
            self.makedir(tarinfo, targetpath)
        elif tarinfo.isfifo():
            self.makefifo(tarinfo, targetpath)
        elif tarinfo.ischr() or tarinfo.isblk():
            self.makedev(tarinfo, targetpath)
        elif tarinfo.islnk() or tarinfo.issym():
            self.makelink(tarinfo, targetpath)
        elif tarinfo.type not in SUPPORTED_TYPES:
            self.makeunknown(tarinfo, targetpath)
        else:
            self.makefile(tarinfo, targetpath)

        self.chown(tarinfo, targetpath)
        if not tarinfo.issym():
            self.chmod(tarinfo, targetpath)
            self.utime(tarinfo, targetpath)

    #--------------------------------------------------------------------------
    # Below are the different file methods. They are called via
    # _extract_member() when extract() is called. They can be replaced in a
    # subclass to implement other functionality.

    def makedir(self, tarinfo, targetpath):
        """Make a directory called targetpath.
        """
        try:
            # Use a safe mode for the directory, the real mode is set
            # later in _extract_member().
            os.mkdir(targetpath, 0700)
        except EnvironmentError, e:
            if e.errno != errno.EEXIST:
                raise

    def makefile(self, tarinfo, targetpath):
        """Make a file called targetpath.
        """
        source = self.extractfile(tarinfo)
        try:
            with bltn_open(targetpath, "wb") as target:
                copyfileobj(source, target)
        finally:
            source.close()

    def makeunknown(self, tarinfo, targetpath):
        """Make a file from a TarInfo object with an unknown type
           at targetpath.
        """
        self.makefile(tarinfo, targetpath)
        self._dbg(1, "tarfile: Unknown file type %r, "
                     "extracted as regular file." % tarinfo.type)

    def makefifo(self, tarinfo, targetpath):
        """Make a fifo called targetpath.
        """
        if hasattr(os, "mkfifo"):
            os.mkfifo(targetpath)
        else:
            raise ExtractError("fifo not supported by system")

    def makedev(self, tarinfo, targetpath):
        """Make a character or block device called targetpath.
        """
        if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
            raise ExtractError("special devices not supported by system")

        mode = tarinfo.mode
        if tarinfo.isblk():
            mode |= stat.S_IFBLK
        else:
            mode |= stat.S_IFCHR

        os.mknod(targetpath, mode,
                 os.makedev(tarinfo.devmajor, tarinfo.devminor))

    def makelink(self, tarinfo, targetpath):
        """Make a (symbolic) link called targetpath. If it cannot be created
          (platform limitation), we try to make a copy of the referenced file
          instead of a link.
        """
        if hasattr(os, "symlink") and hasattr(os, "link"):
            # For systems that support symbolic and hard links.
            if tarinfo.issym():
                if os.path.lexists(targetpath):
                    os.unlink(targetpath)
                os.symlink(tarinfo.linkname, targetpath)
            else:
                # See extract().
                if os.path.exists(tarinfo._link_target):
                    if os.path.lexists(targetpath):
                        os.unlink(targetpath)
                    os.link(tarinfo._link_target, targetpath)
                else:
                    self._extract_member(self._find_link_target(tarinfo), targetpath)
        else:
            try:
                self._extract_member(self._find_link_target(tarinfo), targetpath)
            except KeyError:
                raise ExtractError("unable to resolve link inside archive")

    def chown(self, tarinfo, targetpath):
        """Set owner of targetpath according to tarinfo.
        """
        if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
            # We have to be root to do so.
            try:
                g = grp.getgrnam(tarinfo.gname)[2]
            except KeyError:
                g = tarinfo.gid
            try:
                u = pwd.getpwnam(tarinfo.uname)[2]
            except KeyError:
                u = tarinfo.uid
            try:
                if tarinfo.issym() and hasattr(os, "lchown"):
                    os.lchown(targetpath, u, g)
                else:
                    if sys.platform != "os2emx":
                        os.chown(targetpath, u, g)
            except EnvironmentError, e:
                raise ExtractError("could not change owner")

    def chmod(self, tarinfo, targetpath):
        """Set file permissions of targetpath according to tarinfo.
        """
        if hasattr(os, 'chmod'):
            try:
                os.chmod(targetpath, tarinfo.mode)
            except EnvironmentError, e:
                raise ExtractError("could not change mode")

    def utime(self, tarinfo, targetpath):
        """Set modification time of targetpath according to tarinfo.
838 """ 839 if not hasattr(os, ‘utime‘): 840 return 841 try: 842 os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime)) 843 except EnvironmentError, e: 844 raise ExtractError("could not change modification time") 845 846 #-------------------------------------------------------------------------- 847 def next(self): 848 """Return the next member of the archive as a TarInfo object, when 849 TarFile is opened for reading. Return None if there is no more 850 available. 851 """ 852 self._check("ra") 853 if self.firstmember is not None: 854 m = self.firstmember 855 self.firstmember = None 856 return m 857 858 # Read the next block. 859 self.fileobj.seek(self.offset) 860 tarinfo = None 861 while True: 862 try: 863 tarinfo = self.tarinfo.fromtarfile(self) 864 except EOFHeaderError, e: 865 if self.ignore_zeros: 866 self._dbg(2, "0x%X: %s" % (self.offset, e)) 867 self.offset += BLOCKSIZE 868 continue 869 except InvalidHeaderError, e: 870 if self.ignore_zeros: 871 self._dbg(2, "0x%X: %s" % (self.offset, e)) 872 self.offset += BLOCKSIZE 873 continue 874 elif self.offset == 0: 875 raise ReadError(str(e)) 876 except EmptyHeaderError: 877 if self.offset == 0: 878 raise ReadError("empty file") 879 except TruncatedHeaderError, e: 880 if self.offset == 0: 881 raise ReadError(str(e)) 882 except SubsequentHeaderError, e: 883 raise ReadError(str(e)) 884 break 885 886 if tarinfo is not None: 887 self.members.append(tarinfo) 888 else: 889 self._loaded = True 890 891 return tarinfo 892 893 #-------------------------------------------------------------------------- 894 # Little helper methods: 895 896 def _getmember(self, name, tarinfo=None, normalize=False): 897 """Find an archive member by name from bottom to top. 898 If tarinfo is given, it is used as the starting point. 899 """ 900 # Ensure that all members have been loaded. 901 members = self.getmembers() 902 903 # Limit the member search list up to tarinfo. 
904 if tarinfo is not None: 905 members = members[:members.index(tarinfo)] 906 907 if normalize: 908 name = os.path.normpath(name) 909 910 for member in reversed(members): 911 if normalize: 912 member_name = os.path.normpath(member.name) 913 else: 914 member_name = member.name 915 916 if name == member_name: 917 return member 918 919 def _load(self): 920 """Read through the entire archive file and look for readable 921 members. 922 """ 923 while True: 924 tarinfo = self.next() 925 if tarinfo is None: 926 break 927 self._loaded = True 928 929 def _check(self, mode=None): 930 """Check if TarFile is still open, and if the operation‘s mode 931 corresponds to TarFile‘s mode. 932 """ 933 if self.closed: 934 raise IOError("%s is closed" % self.__class__.__name__) 935 if mode is not None and self.mode not in mode: 936 raise IOError("bad operation for mode %r" % self.mode) 937 938 def _find_link_target(self, tarinfo): 939 """Find the target member of a symlink or hardlink member in the 940 archive. 941 """ 942 if tarinfo.issym(): 943 # Always search the entire archive. 944 linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname))) 945 limit = None 946 else: 947 # Search the archive before the link, because a hard link is 948 # just a reference to an already archived file. 949 linkname = tarinfo.linkname 950 limit = tarinfo 951 952 member = self._getmember(linkname, tarinfo=limit, normalize=True) 953 if member is None: 954 raise KeyError("linkname %r not found" % linkname) 955 return member 956 957 def __iter__(self): 958 """Provide an iterator object. 959 """ 960 if self._loaded: 961 return iter(self.members) 962 else: 963 return TarIter(self) 964 965 def _dbg(self, level, msg): 966 """Write debugging output to sys.stderr. 
967 """ 968 if level <= self.debug: 969 print >> sys.stderr, msg 970 971 def __enter__(self): 972 self._check() 973 return self 974 975 def __exit__(self, type, value, traceback): 976 if type is None: 977 self.close() 978 else: 979 # An exception occurred. We must not call close() because 980 # it would try to write end-of-archive blocks and padding. 981 if not self._extfileobj: 982 self.fileobj.close() 983 self.closed = True 984 # class TarFile
f、shelve
shelve is a simple key-value module that persists in-memory data to a file. It can persist any Python data type that pickle supports.
import shelve
d = shelve.open('shelve_test')  # open (or create) the shelf file
class Test(object):
    def __init__(self, n):
        self.n = n
t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2
d.close()
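The example above only writes. A minimal sketch of reading the data back, assuming Python 3.4+ (where `shelve.open` works as a context manager) and a temporary directory instead of the current one, could look like this. Note that the class must be defined again before loading, because unpickling needs it; depending on the dbm backend, the shelf may consist of one or several files on disk.

```python
import os
import shelve
import tempfile

class Test(object):
    def __init__(self, n):
        self.n = n

# Work in a temporary directory so the sketch does not litter the cwd.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "shelve_test")

# Write: a shelf behaves like a dict whose values are pickled to disk.
with shelve.open(path) as d:
    d["test"] = ["alex", "rain", "test"]
    d["t1"] = Test(123)

# Read back in a separate open; Test must be defined here too,
# because unpickling looks the class up by name.
with shelve.open(path) as d:
    names = d["test"]
    t1 = d["t1"]
    keys = sorted(d.keys())

print(names, t1.n, keys)
```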
g、xml processing module
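XML is a protocol for exchanging data between different languages or programs, similar to JSON, although JSON is simpler to use. Back in the dark ages before JSON was born, XML was the only choice, and to this day the interfaces of many systems at traditional companies, for example in the financial industry, are still mainly XML-based.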
The XML format looks like this; the data structure is expressed through nodes delimited by <> tags:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
XML is supported in every language. In Python you can work with XML using the following module:
import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)
# Walk the whole document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)
# Walk only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)
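To try this without an `xmltest.xml` file on disk, here is a small sketch that embeds a trimmed copy of the sample document as a string and parses it with `ET.fromstring`. It also uses `findall`, `findtext` and `get` to pull out attributes and child text; the `summary` dict is just an illustrative way to collect the results.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample document, embedded so no file is needed.
XML_TEXT = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)  # returns the root Element, <data>

# findall() matches direct children; findtext() returns a child's text.
summary = {}
for country in root.findall("country"):
    name = country.get("name")  # attribute lookup
    year = int(country.findtext("year"))
    neighbors = [n.get("name") for n in country.iter("neighbor")]
    summary[name] = (year, neighbors)

print(root.tag)
print(summary)
```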
Modifying and deleting XML content
import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")
tree.write("xmltest.xml")

# Delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
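The same modify-and-delete steps can be sketched fully in memory, which makes the effect easy to inspect. This variant assumes a reduced inline document and uses `ET.tostring` instead of `tree.write`; removal goes through `findall()`, which returns a list, so the tree is not mutated while it is being iterated.

```python
import xml.etree.ElementTree as ET

# Reduced inline version of the sample document (illustrative only).
XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Modify: bump every <year> by one and mark it as updated.
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)
    node.set("updated", "yes")

# Delete: findall() returns a list, so removing while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
years = [y.text for y in root.iter("year")]
serialized = ET.tostring(root, encoding="unicode")  # str, not bytes
print(remaining, years)
```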
Creating your own XML document
import xml.etree.ElementTree as ET
new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'
et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)
ET.dump(new_xml)  # print the generated XML
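A sketch of the same idea that skips the file and checks the result with a round trip: the document is built with `SubElement`, serialized with `ET.tostring(..., encoding="unicode")`, then parsed back. The element values here are illustrative, and this variant sets the text on `<age>` rather than `<sex>`.

```python
import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
age.text = "33"
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age2 = ET.SubElement(name2, "age")
age2.text = "19"

# tostring() serializes without touching the disk;
# encoding="unicode" returns str instead of bytes.
xml_str = ET.tostring(new_xml, encoding="unicode")
print(xml_str)

# Round trip: parse the string back and read the values.
parsed = ET.fromstring(xml_str)
ages = [a.text for a in parsed.iter("age")]
enrolled = [n.get("enrolled") for n in parsed.findall("name")]
```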
h、ConfigParser module
Used to generate and modify common configuration files. In Python 3.x the module was renamed configparser.
Here is a configuration file format that many programs use:
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
How would you generate a file like this with Python?
import configparser
config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)
Once written, the file can be read back:
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'
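Values always come back as strings, so typed accessors help. A minimal write-read-modify sketch, assuming a temporary directory for `example.ini`, shows `getint`, `getboolean` (which accepts `yes`/`no`), the fallback to `[DEFAULT]`, and the `set` / `remove_section` calls for editing.

```python
import configparser
import os
import tempfile

config = configparser.ConfigParser()
config["DEFAULT"] = {"ServerAliveInterval": "45",
                     "Compression": "yes",
                     "CompressionLevel": "9",
                     "ForwardX11": "yes"}
config["bitbucket.org"] = {"User": "hg"}
config["topsecret.server.com"] = {"Port": "50022", "ForwardX11": "no"}

path = os.path.join(tempfile.mkdtemp(), "example.ini")
with open(path, "w") as configfile:
    config.write(configfile)

# Read it back into a fresh parser.
loaded = configparser.ConfigParser()
loaded.read(path)

port = loaded.getint("topsecret.server.com", "Port")             # int, not str
fwd = loaded.getboolean("topsecret.server.com", "ForwardX11")    # "no" -> False
inherited = loaded.getboolean("bitbucket.org", "ForwardX11")     # from [DEFAULT]
level = loaded["bitbucket.org"].getint("CompressionLevel")       # from [DEFAULT]

# Modify one option, drop a whole section, and write the result back.
loaded.set("bitbucket.org", "User", "git")
loaded.remove_section("topsecret.server.com")
with open(path, "w") as configfile:
    loaded.write(configfile)

sections = loaded.sections()
```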
i、hashlib module
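Used for hashing operations (often loosely called "encryption"). In 3.x it replaces the old md5 and sha modules, and it mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.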
import hashlib
m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())     # hash in binary (bytes) form
print(m.hexdigest())  # hash in hexadecimal form
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# ######## md5 ########

hash = hashlib.md5()
hash.update(b'admin')  # update() takes bytes in Python 3
print(hash.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
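Two properties of the example above are worth checking directly. First, calling `update()` several times gives the same digest as hashing the concatenated bytes once, which is why large files can be hashed chunk by chunk. Second, `hashlib.new(name, data)` selects an algorithm by name. A small sketch:

```python
import hashlib

# Incremental updates == hashing the concatenation in one go.
m1 = hashlib.md5()
m1.update(b"Hello")
m1.update(b"It's me")
m2 = hashlib.md5(b"HelloIt's me")
same = m1.hexdigest() == m2.hexdigest()

# hashlib.new() picks the algorithm by name; hexdigest() length is
# twice the digest size in bytes (md5: 16 -> 32, sha1: 20 -> 40, ...).
digests = {name: hashlib.new(name, b"admin").hexdigest()
           for name in ("md5", "sha1", "sha256")}

print(same)
for name, value in digests.items():
    print(name, len(value), value)
```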
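Python also has an hmac module, which combines the key we supply with the message content and then hashes the result, producing a keyed hash (HMAC).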
import hmac
import hashlib
# Key and message must be bytes; Python 3.8+ requires digestmod
h = hmac.new(b'wueiqi', digestmod=hashlib.md5)
h.update(b'hellowo')
print(h.hexdigest())
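A typical use of HMAC is signing a message and later verifying the signature. The sketch below assumes SHA-256 rather than MD5 and uses illustrative helper names `sign` and `verify` (not part of the hmac module); `hmac.compare_digest` compares in constant time so the check does not leak timing information.

```python
import hashlib
import hmac

SECRET_KEY = b"wueiqi"  # illustrative key taken from the example above

def sign(message: bytes) -> str:
    """Return the hex HMAC-SHA256 of message under SECRET_KEY."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(sign(message), signature)

sig = sign(b"hellowo")
print(sig)
print(verify(b"hellowo", sig))   # True
print(verify(b"tampered", sig))  # False
```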
j、re module
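'.'     By default matches any single character except \n; with the DOTALL flag it matches any character, including a newline
'^'     Matches the start of the string; with flags=re.MULTILINE it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'     Matches the end of the string; with flags=re.MULTILINE it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() gives 'foo'
'*'     Matches the preceding character 0 or more times: re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times: re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 time
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times: re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     Matches the expression on the left or on the right of |: re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)' Group: re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'

'\A'    Matches only at the start of the string: re.search(r"\Aabc", "alexabc") finds no match
'\Z'    Matches only at the very end of the string (unlike $, it is not affected by MULTILINE and does not match before a trailing newline)
'\d'    Matches a digit 0-9
'\D'    Matches a non-digit
'\w'    Matches a word character [A-Za-z0-9_] (for str patterns, other Unicode letters and digits as well)
'\W'    Matches a non-word character
'\s'    Matches whitespace such as a space, \t, \n, \r: re.search(r"\s+", "ab\tc1\n3").group() gives '\t'

'(?P<name>...)' Named group: re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}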
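The table entries above can be checked directly. This sketch simply runs the examples from the table (with raw strings for the backslash escapes) and adds one DOTALL case for '.'; the expected results in the comments are the ones listed in the table.

```python
import re

star = re.findall("ab*", "cabb3abcbbac")             # ['abb', 'ab', 'a']
plus = re.findall("ab+", "ab+cd+abb+bba")            # ['ab', 'abb']
braces = re.findall("ab{1,3}", "abb abc abbcbbb")    # ['abb', 'ab', 'abb']
alt = re.search("abc|ABC", "ABCBabcCD").group()      # 'ABC'
group = re.search("(abc){2}a(123|456)c", "abcabca456c").group()
caret = re.search(r"^a", "\nabc\neee", flags=re.MULTILINE).group()  # 'a'
dollar = re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()
no_start = re.search(r"\Aabc", "alexabc")            # None
space = re.search(r"\s+", "ab\tc1\n3").group()       # '\t'
dot_default = re.match(".", "\n")                    # None: '.' skips \n
dot_all = re.match(".", "\n", flags=re.DOTALL).group()

# groupdict() takes no key; its optional argument is only the default
# value for groups that did not participate in the match.
named = re.search(r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
                  "371481199306143242").groupdict()
print(named)
```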
The most commonly used matching functions
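re.match    match from the start of the string
re.search   search for the first match anywhere in the string
re.findall  return all matches as the elements of a list
re.split    split the string, using the matches as separators
re.sub      find matches and replace them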
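A short sketch contrasting the five functions on one sample string (`text` is an illustrative value). Note that `re.split` produces a trailing empty string when the string ends with a separator, and `re.sub` replaces every match unless `count` is given.

```python
import re

text = "alex22jack23rain31"

m = re.match(r"[a-z]+", text)          # anchored at position 0
none = re.match(r"\d+", text)          # None: text does not start with a digit
s = re.search(r"\d+", text)            # first match anywhere
all_nums = re.findall(r"\d+", text)    # every match, as a list
parts = re.split(r"\d+", text)         # digit runs act as separators
replaced = re.sub(r"\d+", "|", text)   # replace every match
replaced_once = re.sub(r"\d+", "|", text, count=1)

print(m.group(), none, s.group())
print(all_nums, parts)
print(replaced, replaced_once)
```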
The backslash problem
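As in most programming languages, regular expressions use "\" as the escape character, which can cause backslash trouble. Suppose you need to match a literal "\" in some text: the regex written as a string in the programming language then needs four backslashes, "\\\\". The language's string escaping turns the first two and the last two into one backslash each, and the regex engine then reads the resulting two backslashes as a single escaped, literal backslash. Python's raw strings solve this neatly: the regex in this example can be written as r"\\". Likewise, "\\d" for matching a digit can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions are more readable.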
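The two spellings described above produce the same pattern string, which a quick check confirms. In this sketch, `path` is an illustrative string containing two literal backslashes.

```python
import re

path = "C:\\temp\\new"  # the str holds two literal backslashes

# Four backslashes in normal string syntax become the two-character
# regex \\ , which the regex engine reads as one literal backslash.
with_escapes = re.findall("\\\\", path)
# A raw string writes the same two-character regex directly.
with_raw = re.findall(r"\\", path)

print(with_escapes, with_raw)
print("\\\\" == r"\\", "\\d" == r"\d")  # same pattern strings
```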
A few matching flags worth knowing
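re.I (re.IGNORECASE): ignore case (the full name is in parentheses; same below)
re.M (re.MULTILINE): multiline mode, changes the behavior of '^' and '$' (see the '^' and '$' entries above)
re.S (re.DOTALL): dot-matches-all mode, changes the behavior of '.'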
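A small sketch of each flag, plus a combination. Flags are bit values, so they can be combined with `|`; the sample strings are illustrative.

```python
import re

ignore = re.findall("abc", "ABC abc AbC", flags=re.I)                 # case-insensitive
lines = re.findall(r"^\w+", "first line\nsecond line", flags=re.M)   # ^ at each line
no_dotall = re.search("a.b", "a\nb")                                  # '.' skips \n
dotall = re.search("a.b", "a\nb", flags=re.S).group()                # '.' matches \n

# Combined: ^ at each line start, a/A both match, '.' may match \n.
combined = re.findall(r"^a.", "ab\nA\n", flags=re.I | re.M | re.S)

print(ignore, lines, no_dotall, repr(dotall), combined)
```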