【Python Journey】Python Modules / 憋错料

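I. Defining modules:
  Module: organizes Python code logically (variables, functions, classes, and logic that together implement a feature). In essence it is a Python file ending in .py (for the file test.py, the module name is test).
  Package: organizes modules logically. In essence it is a directory; a regular package must contain an __init__.py file (since Python 3.3 a directory without one can still be imported as a namespace package).
II. Import methods:
  1. import module_guyun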

# module_guyun.py: the module to be imported
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
########################
# main script: imports the module
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import module_guyun
print(module_guyun.name)
module_guyun.say_hallo()
# Output:
# guyun
# hello guyun

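  2. from module_guyun import logger as logger_guyun  # alias

  When a name being imported clashes with a name defined in the current module, importing it under an alias resolves the conflict.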

# module_guyun.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
def logger():
    print("in the module_guyun")
def running():
    pass
##################
# main script
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
from module_guyun import logger as logger_guyun
def logger():
    print("in the main")
logger()
logger_guyun()
## Output:
# in the main
# in the module_guyun

  3. Importing a package essentially means executing its __init__.py

  The __init__.py file inside the package package_test:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
print("from test package package_test")

  Now import package_test from the file p_test.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import package_test
## Output:
# from test package package_test

  4. How to import when the files are not in the same directory

  -module_test
    -main.py
  -module_guyun.py

  Here main.py needs to call module_guyun.py, which lives one directory level above it:

# module_guyun.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
name = "guyun"
def say_hallo():
    print("hello guyun")
def logger():
    print("in the module_guyun")
def running():
    pass
## main.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
#from module_guyun import logger as logger_guyun
import sys,os
# x is the parent of the directory that contains main.py
x  = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
#print(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.append(x)  # add it to the module search path
import module_guyun
module_guyun.say_hallo()
module_guyun.logger()
####
# Output:
# hello guyun
# in the module_guyun
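The same idea can be tried without touching a real project. The following is a minimal sketch, assuming a throwaway directory and a hypothetical module name (demo_mod_guyun, not part of the original example): a directory becomes importable once it is placed on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary directory holding a module.
# "demo_mod_guyun" is a made-up name used only for this illustration.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_mod_guyun.py").write_text('name = "guyun"\n', encoding="utf-8")

    sys.path.insert(0, tmp)              # make the directory searchable
    try:
        importlib.invalidate_caches()    # the file was created after startup
        demo_mod = importlib.import_module("demo_mod_guyun")
        imported_name = demo_mod.name
    finally:
        sys.path.remove(tmp)             # undo the path change afterwards

print(imported_name)  # guyun
```

Removing the entry again in the finally block keeps the search path clean; whether to use insert(0, ...) or append(...) depends on whether the new directory should win over existing ones.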

  5. How to import a package

  -package_test
    -test1.py
    -__init__.py
  -p_test.py

# __init__.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
print("from test package package_test")
from . import test1
# test1.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
def test():
    print("in the test1")
### calling file (p_test.py)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author :GU
import package_test  ### executes __init__.py
package_test.test1.test()
# Output:
# from test package package_test
# in the test1
#### The goal: import a file from a package that sits at the same directory level, with __init__.py doing the dispatching
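To see the role of __init__.py in isolation, here is a small sketch that builds such a package in a temporary directory and imports it. The package name demo_package_test is hypothetical, and test() returns its message instead of printing it so that the result is easy to inspect.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "demo_package_test")   # hypothetical package name
    pkg.mkdir()
    # __init__.py pulls the submodule in, exactly like "from . import test1" above.
    (pkg / "__init__.py").write_text("from . import test1\n", encoding="utf-8")
    (pkg / "test1.py").write_text(
        "def test():\n    return 'in the test1'\n", encoding="utf-8"
    )

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("demo_package_test")
        result = package.test1.test()      # reachable only because __init__.py imported it
    finally:
        sys.path.remove(tmp)

print(result)  # in the test1
```

Without the "from . import test1" line in __init__.py, package.test1 would raise AttributeError unless the submodule had been imported somewhere else first.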

Summary

  import module_alex
  import module_alex, module2_alex  # import several modules
  from module_alex import *  # not recommended
  from module_alex import m1, m2, m3  # import several names from one module
  from module_alex import logger as logger_alex  # alias

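III. The nature of import (path search and the search path)

  Importing a module essentially means interpreting (executing) the Python file once.

  import module_name -------> module_name.py -----> location of module_name.py -----> sys.path
  Importing a package essentially means executing the __init__.py file inside that package.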
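A short sketch can make "executed once" concrete. It assumes a temporary module with the made-up name demo_once whose body prints a line; the captured output shows the body runs on the first import only, and the second import is served from the sys.modules cache.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The module body prints a marker every time it is executed.
    Path(tmp, "demo_once.py").write_text(
        'print("executing demo_once")\n', encoding="utf-8"
    )
    sys.path.insert(0, tmp)
    buffer = io.StringIO()
    try:
        importlib.invalidate_caches()
        with contextlib.redirect_stdout(buffer):
            first = importlib.import_module("demo_once")    # body runs here
            second = importlib.import_module("demo_once")   # cached, body skipped
    finally:
        sys.path.remove(tmp)

executions = buffer.getvalue().count("executing demo_once")
print(executions)        # 1
print(first is second)   # True
```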

IV. Import optimization

V. Categories of modules
  a: standard library (built-in)
  b: open-source (third-party) modules
  c: custom modules

  1. Standard library

  a. time and datetime

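  In Python, time is usually represented in one of three ways: 1) a timestamp, 2) a formatted time string, 3) a tuple (struct_time) with nine elements. Because Python's time module mostly calls into the C library, behaviour can differ between platforms.

  UTC (Coordinated Universal Time) is the world time standard; for everyday purposes it matches Greenwich Mean Time. China is at UTC+8. DST (Daylight Saving Time) is summer time.

  Timestamp: generally, a timestamp is the offset in seconds from 00:00:00 UTC on 1 January 1970. Running "type(time.time())" returns float. The main function returning a timestamp is time(); clock() also returned a float number of seconds, but it was deprecated in Python 3.3 and removed in Python 3.8.

  Tuple (struct_time): a struct_time has nine elements. The main functions returning one are gmtime(), localtime() and strptime(). The elements of the tuple are:

  Index  Attribute  Values
  0      tm_year    e.g. 2011
  1      tm_mon     1 - 12
  2      tm_mday    1 - 31
  3      tm_hour    0 - 23
  4      tm_min     0 - 59
  5      tm_sec     0 - 61
  6      tm_wday    0 - 6 (0 is Monday)
  7      tm_yday    1 - 366
  8      tm_isdst   0, 1 or -1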

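  1) time.localtime([secs]): converts a timestamp into a struct_time in the local time zone. If secs is not given, the current time is used.

>>> time.localtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=14, tm_sec=50, tm_wday=3, tm_yday=125, tm_isdst=0)
>>> time.localtime(1304575584.1361799)
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=6, tm_sec=24, tm_wday=3, tm_yday=125, tm_isdst=0)

  2) time.gmtime([secs]): like localtime(), but converts the timestamp into a struct_time in UTC (time zone 0).

>>> time.gmtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=6, tm_min=19, tm_sec=48, tm_wday=3, tm_yday=125, tm_isdst=0)

  3) time.time(): returns the timestamp of the current time.

>>> time.time()
1304575584.1361799

  4) time.mktime(t): converts a struct_time (interpreted as local time) into a timestamp.

>>> time.mktime(time.localtime())
1304576839.0

  5) time.sleep(secs): suspends the calling thread for the given number of seconds.

  6) time.clock(): note that its meaning differs between systems. On UNIX it returns the "process time" as a floating-point number of seconds. On Windows, the first call returns the actual running time of the process, and every later call returns the time elapsed since the first call (it is based on the Win32 function QueryPerformanceCounter(), which is more precise than milliseconds). time.clock() was deprecated in Python 3.3 and removed in Python 3.8; time.perf_counter() and time.process_time() replace it.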

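# time.clock() was removed in Python 3.8, so this example needs an older interpreter
import time
if __name__ == '__main__':
    time.sleep(1)
    print("clock1:%s" % time.clock())
    time.sleep(1)
    print("clock2:%s" % time.clock())
    time.sleep(1)
    print("clock3:%s" % time.clock())

  Output (on Windows):

clock1:3.35238137808e-006
clock2:1.00004944763
clock3:2.00012040636

  The first clock() prints the running time of the program.
  The second and third clock() calls print the interval since the first clock() call.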
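On current Python versions the same measurement can be sketched with the two functions that replaced time.clock(). This is a minimal illustration, not a benchmark: time.perf_counter() measures elapsed wall-clock time, while time.process_time() counts only CPU time of the process and therefore ignores sleep.

```python
import time

wall_start = time.perf_counter()   # high-resolution wall-clock timer
cpu_start = time.process_time()    # CPU time of this process, sleep excluded

time.sleep(0.2)

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print("wall: %.3f s" % wall_elapsed)   # roughly 0.2
print("cpu:  %.3f s" % cpu_elapsed)    # close to 0, sleeping uses no CPU
```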

  7) time.asctime([t]): formats a time tuple or struct_time as a string of the form 'Sun Jun 20 23:21:05 1993'. Without an argument, time.localtime() is used.

>>> time.asctime()
'Thu May 5 14:55:43 2011'

  8) time.ctime([secs]): converts a timestamp (a float number of seconds) into the time.asctime() form. If the argument is missing or None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).

>>> time.ctime()
'Thu May 5 14:58:09 2011'
>>> time.ctime(time.time())
'Thu May 5 14:58:39 2011'
>>> time.ctime(1304579615)
'Thu May 5 15:13:35 2011'

  9) time.strftime(format[, t]): converts a time tuple or struct_time (such as the ones returned by time.localtime() and time.gmtime()) into a formatted time string. If t is not given, time.localtime() is used. If any element of the tuple is out of range, ValueError is raised.

>>> time.strftime("%Y-%m-%d %X", time.localtime())
'2011-05-05 16:37:06'

  10) time.strptime(string[, format]): parses a formatted time string into a struct_time. It is the inverse of strftime().

>>> time.strptime('2011-05-05 16:37:06', '%Y-%m-%d %X')
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=16, tm_min=37, tm_sec=6, tm_wday=3, tm_yday=125, tm_isdst=-1)

  For this function, format defaults to "%a %b %d %H:%M:%S %Y".

  To summarize the time module: as described above, Python has three representations of time: 1) timestamp, 2) tuple or struct_time, 3) formatted string.

  Conversions between the representations
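The three representations can be chained into a round trip. The sketch below is only an illustration with hypothetical helper names (stamp_to_string, string_to_stamp); it works in UTC through time.gmtime() and calendar.timegm() so that the result does not depend on the local time zone, whereas time.localtime() and time.mktime() would do the same for local time.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"

def stamp_to_string(stamp, fmt=FMT):
    """timestamp -> struct_time (UTC) -> formatted string"""
    return time.strftime(fmt, time.gmtime(stamp))

def string_to_stamp(text, fmt=FMT):
    """formatted string -> struct_time -> timestamp (UTC)"""
    return calendar.timegm(time.strptime(text, fmt))

text = stamp_to_string(1304575584)
print(text)                    # 2011-05-05 06:06:24
print(string_to_stamp(text))   # 1304575584
```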

#_*_coding:utf-8_*_
import time
print(time.process_time())  # processor time, sleep excluded; replaces time.clock() (deprecated in 3.3, removed in 3.8); the author found it unreliable on macOS
print(time.altzone)  # offset of the local DST time zone, in seconds west of UTC
print(time.asctime())  # returns the time as "Fri Aug 19 11:14:16 2016"
print(time.localtime())  # returns local time as a struct_time
print(time.gmtime(time.time()-800000))  # returns a UTC struct_time; with no argument the current time is used

print(time.asctime(time.localtime()))  # returns the time as "Fri Aug 19 11:14:16 2016"
print(time.ctime())  # returns "Fri Aug 19 12:38:29 2016", same format as above

# date string -> timestamp
string_2_struct = time.strptime("2016/05/22","%Y/%m/%d")  # date string -> struct_time
print(string_2_struct)
struct_2_stamp = time.mktime(string_2_struct)  # struct_time -> timestamp
print(struct_2_stamp)

# timestamp -> string
print(time.gmtime(time.time()-86640))  # timestamp -> UTC struct_time
print(time.strftime("%Y-%m-%d %H:%M:%S",time.gmtime()))  # UTC struct_time -> formatted string

# date and time arithmetic
import datetime
print(datetime.datetime.now())  # e.g. 2016-08-19 12:47:03.941925
print(datetime.date.fromtimestamp(time.time()))  # timestamp -> date, e.g. 2016-08-19
print(datetime.datetime.now())  # current time
print(datetime.datetime.now() + datetime.timedelta(3))  # now + 3 days
print(datetime.datetime.now() + datetime.timedelta(-3))  # now - 3 days
print(datetime.datetime.now() + datetime.timedelta(hours=3))  # now + 3 hours
print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes
c_time  = datetime.datetime.now()
print(c_time.replace(minute=3,hour=2))  # replace individual fields
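
#################### Format reference ####################
%a    Locale's abbreviated weekday name
%A    Locale's full weekday name
%b    Locale's abbreviated month name
%B    Locale's full month name
%c    Locale's appropriate date and time representation
%d    Day of the month (01 - 31)
%H    Hour of the day (24-hour clock, 00 - 23)
%I    Hour (12-hour clock, 01 - 12)
%j    Day of the year (001 - 366)
%m    Month (01 - 12)
%M    Minute (00 - 59)
%p    Locale's equivalent of AM or PM
%S    Second (00 - 61)
%U    Week number of the year (00 - 53), with Sunday as the first day of the week; all days before the first Sunday fall into week 0
%w    Day of the week (0 - 6, 0 is Sunday)
%W    Same as %U, except that Monday is the first day of the week
%x    Locale's appropriate date representation
%X    Locale's appropriate time representation
%y    Year without century (00 - 99)
%Y    Year with century
%Z    Time zone name (empty string if no time zone exists)
%%    A literal '%' character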

  ## Output:

3.9473128470428115e-07
-32400
Tue Aug 23 15:21:55 2016
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=23, tm_hour=15, tm_min=21, tm_sec=55, tm_wday=1, tm_yday=236, tm_isdst=0)
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=14, tm_hour=1, tm_min=8, tm_sec=35, tm_wday=6, tm_yday=227, tm_isdst=0)
Tue Aug 23 15:21:55 2016
Tue Aug 23 15:21:55 2016
time.struct_time(tm_year=2016, tm_mon=5, tm_mday=22, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=143, tm_isdst=-1)
1463846400.0
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=22, tm_hour=7, tm_min=17, tm_sec=55, tm_wday=0, tm_yday=235, tm_isdst=0)
2016-08-23 07:21:55
2016-08-23 15:21:55.438771
2016-08-23
2016-08-23 15:21:55.438771
2016-08-26 15:21:55.438771
2016-08-20 15:21:55.438771
2016-08-23 18:21:55.438771
2016-08-23 15:51:55.438771
2016-08-23 02:03:55.438771

The output lines correspond one-to-one, in order, to the print statements of the time and datetime listing.
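Because datetime.now() changes on every run, the arithmetic is easier to check against a fixed moment. A small sketch, assuming the base value below (chosen to match the date of the recorded output) purely for illustration:

```python
from datetime import datetime, timedelta

base = datetime(2016, 8, 23, 15, 21, 55)   # fixed moment instead of now()

plus_three_days = base + timedelta(days=3)
minus_three_days = base - timedelta(days=3)
plus_three_hours = base + timedelta(hours=3)
replaced = base.replace(hour=2, minute=3)   # swap individual fields

print(plus_three_days)    # 2016-08-26 15:21:55
print(minus_three_days)   # 2016-08-20 15:21:55
print(plus_three_hours)   # 2016-08-23 18:21:55
print(replaced)           # 2016-08-23 02:03:55
print((plus_three_days - minus_three_days).days)   # 6
```

Subtracting two datetime objects yields a timedelta, which is why the last line can read the difference in days directly.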

  b. The random module

#!/usr/bin/env python
#_*_encoding: utf-8_*_
import random
print(random.random())  # 0.6445010863311293
# random.random() generates a random float n with 0 <= n < 1.0
print(random.randint(1,7))  # 4
# Signature: random.randint(a, b). Generates an integer within the given range.
# a is the lower bound and b the upper bound; the result n satisfies a <= n <= b
print(random.randrange(1,10))  # 5
# Signature: random.randrange([start], stop[, step]).
# Picks a random number from the range that increases by the given step, e.g. random.randrange(10, 100, 2)
# picks a random number from the sequence [10, 12, 14, 16, ... 96, 98].
# random.randrange(10, 100, 2) is equivalent in result to random.choice(range(10, 100, 2)).
print(random.choice('liukuni'))  # i
# random.choice picks a random element from a sequence.
# Signature: random.choice(sequence). The sequence argument is an ordered type.
# Note that sequence is not one specific type in Python but a family of types:
# list, tuple and str are all sequences. See the data model chapter of the Python manual.
# Some examples of choice:
print(random.choice("学习Python"))  # 学
print(random.choice(["JGood","is","a","handsome","boy"]))  # a random element of the list
print(random.choice(("Tuple","List","Dict")))  # List
print(random.sample([1,2,3,4,5],3))  # [1, 2, 5]
# Signature: random.sample(sequence, k). Returns k distinct elements picked at random from the sequence; the original sequence is not modified.

  Practical usage

#!/usr/bin/env python
# encoding: utf-8
import random
# random integer:
print(random.randint(0,99))  # 70

# random even number between 0 and 100:
print(random.randrange(0, 101, 2))  # 4

# random floats:
print(random.random())  # 0.2746445568079129
print(random.uniform(1, 10))  # 9.887001463194844

# random character:
print(random.choice('abcdefg&#%^*f'))  # f

# pick a given number of characters from a string:
print(random.sample('abcdefghij',3))  # ['f', 'h', 'd']

# pick a random string:
print(random.choice(['apple', 'pear', 'peach', 'orange', 'lemon']))  # apple
# shuffle
items = [1,2,3,4,5,6,7]
print(items)  # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)
print(items)  # [1, 4, 7, 2, 5, 3, 6]

  Generating a random verification code

import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)
    if current != i:
        # most positions get an uppercase letter (ASCII 65-90)
        temp = chr(random.randint(65,90))
    else:
        # otherwise the position gets a digit
        temp = random.randint(0,9)
    checkcode += str(temp)
print(checkcode)
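An alternative sketch of the same idea draws every character uniformly from one alphabet. The function name make_checkcode and its rng parameter are assumptions made for this illustration; for codes that matter for security, the standard secrets module would be a better source of randomness than random.

```python
import random
import string

# Uppercase letters plus digits, the same character classes as in the loop version.
ALPHABET = string.ascii_uppercase + string.digits

def make_checkcode(length=4, rng=random):
    """Return `length` characters drawn uniformly from A-Z and 0-9.

    `rng` may be the random module or a random.Random instance (useful for seeding).
    """
    return "".join(rng.choice(ALPHABET) for _ in range(length))

code = make_checkcode()
print(code)   # four random characters, different on every run
```

Passing a seeded random.Random instance, for example make_checkcode(6, random.Random(0)), makes the result reproducible in tests.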

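  c. The os module

  Provides an interface for calling into the operating system.

os.getcwd()    returns the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")    changes the working directory of the script; like cd in a shell
os.curdir    the string for the current directory: ('.')
os.pardir    the string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')    creates nested directories recursively
os.removedirs('dirname1')    removes the directory if it is empty, then moves up to the parent and removes it if it is empty too, and so on
os.mkdir('dirname')    creates a single directory; like mkdir dirname in a shell
os.rmdir('dirname')    removes a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')    returns a list of all files and subdirectories in the given directory, hidden files included
os.remove('path/filename')    deletes a file
os.rename("oldname","newname")    renames a file or directory
os.stat('path/filename')    returns information about a file or directory
os.sep    the path separator of the operating system: "\\" on Windows, "/" on Linux
os.linesep    the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    the string used to separate entries in a list of search paths (';' on Windows, ':' on Linux)
os.name    a string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")    runs a shell command; its output is shown directly
os.environ    the environment variables of the system
os.path.abspath(path)    returns the normalized absolute version of path
os.path.split(path)    splits path into a (directory, filename) tuple
os.path.dirname(path)    returns the directory of path, i.e. the first element of os.path.split(path)
os.path.basename(path)    returns the final file name of path, i.e. the second element of os.path.split(path); if path ends with / or \ it is an empty string
os.path.exists(path)    returns True if path exists, otherwise False
os.path.isabs(path)    returns True if path is an absolute path
os.path.isfile(path)    returns True if path is an existing file, otherwise False
os.path.isdir(path)    returns True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])    joins several path components; components before the last absolute path are discarded
os.path.getatime(path)    returns the last access time of the file or directory path points to
os.path.getmtime(path)    returns the last modification time of the file or directory path points to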
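A few of these calls combined in one run, as a minimal sketch inside a temporary directory so that nothing on the real file system is touched (the directory and file names are reused from the table for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "dirname1", "dirname2")
    os.makedirs(nested)                        # create both levels at once

    file_path = os.path.join(nested, "a.log")
    with open(file_path, "w") as handle:
        handle.write("demo\n")

    directory, filename = os.path.split(file_path)   # (directory part, file name)
    listing = os.listdir(nested)
    is_file = os.path.isfile(file_path)

    os.remove(file_path)                       # delete the file
    os.rmdir(nested)                           # dirname2 is empty now, so rmdir works
    remaining = os.listdir(os.path.join(tmp, "dirname1"))

print(filename)    # a.log
print(listing)     # ['a.log']
print(is_file)     # True
print(remaining)   # []
```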

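  d. The sys module

sys.argv           list of command-line arguments; the first element is the path of the program itself
sys.exit(n)        exits the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxint         largest int value (Python 2 only; Python 3 integers are unbounded and sys.maxsize is the closest counterpart)
sys.path           the module search path; initialized from the PYTHONPATH environment variable plus the script directory and installation defaults
sys.platform       name of the operating-system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]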

  e. shutil

  A high-level module for handling files, directories and archives.

  1. shutil.copyfileobj(fsrc, fdst[, length])
  Copies the contents of one file object into another; length sets the chunk size used while copying.

  2. shutil.copyfile(src, dst)
  Copies a file.

  3. shutil.copymode(src, dst)
  Copies the permission bits only. Contents, group and owner are left unchanged.

import os
import stat

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

  4. shutil.copystat(src, dst)
  Copies the status information: mode bits, atime, mtime, flags (the destination file must already exist).

  In other words, it updates the modification time and access time of the destination.

  5. shutil.copy(src, dst)
  Copies the file and its permissions.

import os
from shutil import copyfile, copymode

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

  6. shutil.copy2(src, dst)
  Copies the file and its status information.

import os
from shutil import copyfile, copystat

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)

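  7. shutil.ignore_patterns(*patterns)
     shutil.copytree(src, dst, symlinks=False, ignore=None)
  Recursively copies a directory tree.

  Example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

  8. shutil.rmtree(path[, ignore_errors[, onerror]])
  Recursively deletes a directory tree.

  9. shutil.move(src, dst)
  Recursively moves a file or directory.

  10. shutil.make_archive(base_name, format,...)

  Creates an archive (for example zip or tar) and returns its file path.

  base_name: file name of the archive, optionally with a path. With a bare file name the archive is saved in the current directory, otherwise at the given path, e.g.
    www => saved in the current directory
    /Users/wupeiqi/www => saved in /Users/wupeiqi/

  format: archive type: "zip", "tar", "bztar", "gztar"

  root_dir: path of the directory to archive (defaults to the current directory)

  owner: user, defaults to the current user

  group: group, defaults to the current group

  logger: used for logging, usually a logging.Logger object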

# archive the files under /Users/wupeiqi/Downloads/test into the current program directory
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
# archive the files under /Users/wupeiqi/Downloads/test into the directory /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
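The following sketch runs the same call against a temporary directory, so it works without the paths above, and then unpacks the result with shutil.unpack_archive. The names test, backup and restored are placeholders chosen for this illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Directory to be archived, with one small file in it.
    source_dir = os.path.join(tmp, "test")
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, "a.log"), "w") as handle:
        handle.write("demo\n")

    # base_name carries no extension; make_archive appends ".tar.gz" for gztar.
    base_name = os.path.join(tmp, "backup")
    archive_path = shutil.make_archive(base_name, "gztar", root_dir=source_dir)
    archive_name = os.path.basename(archive_path)

    # Unpack into a fresh directory to confirm the contents.
    restore_dir = os.path.join(tmp, "restored")
    shutil.unpack_archive(archive_path, restore_dir)
    restored_files = os.listdir(restore_dir)

print(archive_name)     # backup.tar.gz
print(restored_files)   # ['a.log']
```

The archive is written outside source_dir on purpose; placing it inside the directory being archived risks the archive trying to include itself.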

  shutil handles archives by calling the zipfile and tarfile modules (the ZipFile and TarFile classes). In detail:

  ① zipfile

import zipfile
# compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()
# extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

  ② tarfile

import tarfile
# compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()
# extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # a target directory can be passed
tar.close()
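Both ZipFile and TarFile objects also work as context managers, which closes the archive even if an error occurs in between. A minimal sketch for zipfile, using a temporary directory and the file names from the example above as placeholders:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "a.log")
    with open(log_path, "w", newline="\n") as handle:
        handle.write("demo\n")

    archive_path = os.path.join(tmp, "laxi.zip")

    # Compress: the archive is closed automatically when the block ends.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(log_path, arcname="a.log")   # arcname keeps the temp path out of the archive

    # Read back without extracting to disk.
    with zipfile.ZipFile(archive_path, "r") as z:
        names = z.namelist()
        content = z.read("a.log").decode("utf-8")

print(names)     # ['a.log']
print(content)   # demo
```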

  ③ Source of the ZipFile class

  1 class ZipFile(object):
  2     """ Class with methods to open, read, write, close, list zip files.
  3
  4     z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
  5
  6     file: Either the path to the file, or a file-like object.
  7           If it is a path, the file will be opened and closed by ZipFile.
  8     mode: The mode can be either read "r", write "w" or append "a".
  9     compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
 10     allowZip64: if True ZipFile will create files with ZIP64 extensions when
 11                 needed, otherwise it will raise an exception when this would
 12                 be necessary.
 13
 14     """
 15
 16     fp = None                   # Set here since __del__ checks it
 17
 18     def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
 19         """Open the ZIP file with mode read "r", write "w" or append "a"."""
 20         if mode not in ("r", "w", "a"):
 21             raise RuntimeError(‘ZipFile() requires mode "r", "w", or "a"‘)
 22
 23         if compression == ZIP_STORED:
 24             pass
 25         elif compression == ZIP_DEFLATED:
 26             if not zlib:
 27                 raise RuntimeError, 28                       "Compression requires the (missing) zlib module"
 29         else:
 30             raise RuntimeError, "That compression method is not supported"
 31
 32         self._allowZip64 = allowZip64
 33         self._didModify = False
 34         self.debug = 0  # Level of printing: 0 through 3
 35         self.NameToInfo = {}    # Find file info given name
 36         self.filelist = []      # List of ZipInfo instances for archive
 37         self.compression = compression  # Method of compression
 38         self.mode = key = mode.replace(‘b‘, ‘‘)[0]
 39         self.pwd = None
 40         self._comment = ‘‘
 41
 42         # Check if we were passed a file-like object
 43         if isinstance(file, basestring):
 44             self._filePassed = 0
 45             self.filename = file
 46             modeDict = {‘r‘ : ‘rb‘, ‘w‘: ‘wb‘, ‘a‘ : ‘r+b‘}
 47             try:
 48                 self.fp = open(file, modeDict[mode])
 49             except IOError:
 50                 if mode == ‘a‘:
 51                     mode = key = ‘w‘
 52                     self.fp = open(file, modeDict[mode])
 53                 else:
 54                     raise
 55         else:
 56             self._filePassed = 1
 57             self.fp = file
 58             self.filename = getattr(file, ‘name‘, None)
 59
 60         try:
 61             if key == ‘r‘:
 62                 self._RealGetContents()
 63             elif key == ‘w‘:
 64                 # set the modified flag so central directory gets written
 65                 # even if no files are added to the archive
 66                 self._didModify = True
 67             elif key == ‘a‘:
 68                 try:
 69                     # See if file is a zip file
 70                     self._RealGetContents()
 71                     # seek to start of directory and overwrite
 72                     self.fp.seek(self.start_dir, 0)
 73                 except BadZipfile:
 74                     # file is not a zip file, just append
 75                     self.fp.seek(0, 2)
 76
 77                     # set the modified flag so central directory gets written
 78                     # even if no files are added to the archive
 79                     self._didModify = True
 80             else:
 81                 raise RuntimeError(‘Mode must be "r", "w" or "a"‘)
 82         except:
 83             fp = self.fp
 84             self.fp = None
 85             if not self._filePassed:
 86                 fp.close()
 87             raise
 88
 89     def __enter__(self):
 90         return self
 91
 92     def __exit__(self, type, value, traceback):
 93         self.close()
 94
 95     def _RealGetContents(self):
 96         """Read in the table of contents for the ZIP file."""
 97         fp = self.fp
 98         try:
 99             endrec = _EndRecData(fp)
100         except IOError:
101             raise BadZipfile("File is not a zip file")
102         if not endrec:
103             raise BadZipfile, "File is not a zip file"
104         if self.debug > 1:
105             print endrec
106         size_cd = endrec[_ECD_SIZE]             # bytes in central directory
107         offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
108         self._comment = endrec[_ECD_COMMENT]    # archive comment
109
110         # "concat" is zero, unless zip was concatenated to another file
111         concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
112         if endrec[_ECD_SIGNATURE] == stringEndArchive64:
113             # If Zip64 extension structures are present, account for them
114             concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
115
116         if self.debug > 2:
117             inferred = concat + offset_cd
118             print "given, inferred, offset", offset_cd, inferred, concat
119         # self.start_dir:  Position of start of central directory
120         self.start_dir = offset_cd + concat
121         fp.seek(self.start_dir, 0)
122         data = fp.read(size_cd)
123         fp = cStringIO.StringIO(data)
124         total = 0
125         while total < size_cd:
126             centdir = fp.read(sizeCentralDir)
127             if len(centdir) != sizeCentralDir:
128                 raise BadZipfile("Truncated central directory")
129             centdir = struct.unpack(structCentralDir, centdir)
130             if centdir[_CD_SIGNATURE] != stringCentralDir:
131                 raise BadZipfile("Bad magic number for central directory")
132             if self.debug > 2:
133                 print centdir
134             filename = fp.read(centdir[_CD_FILENAME_LENGTH])
135             # Create ZipInfo instance to store file information
136             x = ZipInfo(filename)
137             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
138             x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
139             x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
140             (x.create_version, x.create_system, x.extract_version, x.reserved,
141                 x.flag_bits, x.compress_type, t, d,
142                 x.CRC, x.compress_size, x.file_size) = centdir[1:12]
143             x.volume, x.internal_attr, x.external_attr = centdir[15:18]
144             # Convert date/time code to (year, month, day, hour, min, sec)
145             x._raw_time = t
146             x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
147                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
148
149             x._decodeExtra()
150             x.header_offset = x.header_offset + concat
151             x.filename = x._decodeFilename()
152             self.filelist.append(x)
153             self.NameToInfo[x.filename] = x
154
155             # update total bytes read from central directory
156             total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
157                      + centdir[_CD_EXTRA_FIELD_LENGTH]
158                      + centdir[_CD_COMMENT_LENGTH])
159
160             if self.debug > 2:
161                 print "total", total
162
163
164     def namelist(self):
165         """Return a list of file names in the archive."""
166         l = []
167         for data in self.filelist:
168             l.append(data.filename)
169         return l
170
171     def infolist(self):
172         """Return a list of class ZipInfo instances for files in the
173         archive."""
174         return self.filelist
175
176     def printdir(self):
177         """Print a table of contents for the zip file."""
178         print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
179         for zinfo in self.filelist:
180             date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
181             print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)
182
183     def testzip(self):
184         """Read all the files and check the CRC."""
185         chunk_size = 2 ** 20
186         for zinfo in self.filelist:
187             try:
188                 # Read by chunks, to avoid an OverflowError or a
189                 # MemoryError with very large embedded files.
190                 with self.open(zinfo.filename, "r") as f:
191                     while f.read(chunk_size):     # Check CRC-32
192                         pass
193             except BadZipfile:
194                 return zinfo.filename
195
196     def getinfo(self, name):
197         """Return the instance of ZipInfo given ‘name‘."""
198         info = self.NameToInfo.get(name)
199         if info is None:
200             raise KeyError(
201                 ‘There is no item named %r in the archive‘ % name)
202
203         return info
204
205     def setpassword(self, pwd):
206         """Set default password for encrypted files."""
207         self.pwd = pwd
208
209     @property
210     def comment(self):
211         """The comment text associated with the ZIP file."""
212         return self._comment
213
214     @comment.setter
215     def comment(self, comment):
216         # check for valid comment length
217         if len(comment) > ZIP_MAX_COMMENT:
218             import warnings
219             warnings.warn(‘Archive comment is too long; truncating to %d bytes‘
220                           % ZIP_MAX_COMMENT, stacklevel=2)
221             comment = comment[:ZIP_MAX_COMMENT]
222         self._comment = comment
223         self._didModify = True
224
225     def read(self, name, pwd=None):
226         """Return file bytes (as a string) for name."""
227         return self.open(name, "r", pwd).read()
228
229     def open(self, name, mode="r", pwd=None):
230         """Return file-like object for ‘name‘."""
231         if mode not in ("r", "U", "rU"):
232             raise RuntimeError, ‘open() requires mode "r", "U", or "rU"‘
233         if not self.fp:
234             raise RuntimeError, 235                   "Attempt to read ZIP archive that was already closed"
236
237         # Only open a new file for instances where we were not
238         # given a file object in the constructor
239         if self._filePassed:
240             zef_file = self.fp
241             should_close = False
242         else:
243             zef_file = open(self.filename, ‘rb‘)
244             should_close = True
245
246         try:
247             # Make sure we have an info object
248             if isinstance(name, ZipInfo):
249                 # ‘name‘ is already an info object
250                 zinfo = name
251             else:
252                 # Get info object for name
253                 zinfo = self.getinfo(name)
254
255             zef_file.seek(zinfo.header_offset, 0)
256
257             # Skip the file header:
258             fheader = zef_file.read(sizeFileHeader)
259             if len(fheader) != sizeFileHeader:
260                 raise BadZipfile("Truncated file header")
261             fheader = struct.unpack(structFileHeader, fheader)
262             if fheader[_FH_SIGNATURE] != stringFileHeader:
263                 raise BadZipfile("Bad magic number for file header")
264
265             fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
266             if fheader[_FH_EXTRA_FIELD_LENGTH]:
267                 zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
268
269             if fname != zinfo.orig_filename:
270                 raise BadZipfile, 271                         ‘File name in directory "%s" and header "%s" differ.‘ % (
272                             zinfo.orig_filename, fname)
273
274             # check for encrypted flag & handle password
275             is_encrypted = zinfo.flag_bits & 0x1
276             zd = None
277             if is_encrypted:
278                 if not pwd:
279                     pwd = self.pwd
280                 if not pwd:
281                     raise RuntimeError, "File %s is encrypted, " 282                         "password required for extraction" % name
283
284                 zd = _ZipDecrypter(pwd)
285                 # The first 12 bytes in the cypher stream is an encryption header
286                 #  used to strengthen the algorithm. The first 11 bytes are
287                 #  completely random, while the 12th contains the MSB of the CRC,
288                 #  or the MSB of the file time depending on the header type
289                 #  and is used to check the correctness of the password.
290                 bytes = zef_file.read(12)
291                 h = map(zd, bytes[0:12])
292                 if zinfo.flag_bits & 0x8:
293                     # compare against the file type from extended local headers
294                     check_byte = (zinfo._raw_time >> 8) & 0xff
295                 else:
296                     # compare against the CRC otherwise
297                     check_byte = (zinfo.CRC >> 24) & 0xff
298                 if ord(h[11]) != check_byte:
299                     raise RuntimeError("Bad password for file", name)
300
301             return ZipExtFile(zef_file, mode, zinfo, zd,
302                     close_fileobj=should_close)
303         except:
304             if should_close:
305                 zef_file.close()
306             raise
307
308     def extract(self, member, path=None, pwd=None):
309         """Extract a member from the archive to the current working directory,
310            using its full name. Its file information is extracted as accurately
311            as possible. `member‘ may be a filename or a ZipInfo object. You can
312            specify a different directory using `path‘.
313         """
314         if not isinstance(member, ZipInfo):
315             member = self.getinfo(member)
316
317         if path is None:
318             path = os.getcwd()
319
320         return self._extract_member(member, path, pwd)
321
322     def extractall(self, path=None, members=None, pwd=None):
323         """Extract all members from the archive to the current working
324            directory. `path‘ specifies a different directory to extract to.
325            `members‘ is optional and must be a subset of the list returned
326            by namelist().
327         """
328         if members is None:
329             members = self.namelist()
330
331         for zipinfo in members:
332             self.extract(zipinfo, path, pwd)
333
334     def _extract_member(self, member, targetpath, pwd):
335         """Extract the ZipInfo object ‘member‘ to a physical
336            file on the path targetpath.
337         """
338         # build the destination pathname, replacing
339         # forward slashes to platform specific separators.
340         arcname = member.filename.replace(‘/‘, os.path.sep)
341
342         if os.path.altsep:
343             arcname = arcname.replace(os.path.altsep, os.path.sep)
344         # interpret absolute pathname as relative, remove drive letter or
345         # UNC path, redundant separators, "." and ".." components.
346         arcname = os.path.splitdrive(arcname)[1]
347         arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
348                     if x not in (‘‘, os.path.curdir, os.path.pardir))
349         if os.path.sep == ‘\\‘:
350             # filter illegal characters on Windows
351             illegal = ‘:<>|"?*‘
352             if isinstance(arcname, unicode):
353                 table = {ord(c): ord(‘_‘) for c in illegal}
354             else:
355                 table = string.maketrans(illegal, ‘_‘ * len(illegal))
356             arcname = arcname.translate(table)
357             # remove trailing dots
358             arcname = (x.rstrip(‘.‘) for x in arcname.split(os.path.sep))
359             arcname = os.path.sep.join(x for x in arcname if x)
360
361         targetpath = os.path.join(targetpath, arcname)
362         targetpath = os.path.normpath(targetpath)
363
364         # Create all upper directories if necessary.
365         upperdirs = os.path.dirname(targetpath)
366         if upperdirs and not os.path.exists(upperdirs):
367             os.makedirs(upperdirs)
368
369         if member.filename[-1] == '/':
370             if not os.path.isdir(targetpath):
371                 os.mkdir(targetpath)
372             return targetpath
373
374         with self.open(member, pwd=pwd) as source, \
375              file(targetpath, "wb") as target:
376             shutil.copyfileobj(source, target)
377
378         return targetpath
379
380     def _writecheck(self, zinfo):
381         """Check for errors before writing a file to the archive."""
382         if zinfo.filename in self.NameToInfo:
383             import warnings
384             warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
385         if self.mode not in ("w", "a"):
386             raise RuntimeError, 'write() requires mode "w" or "a"'
387         if not self.fp:
388             raise RuntimeError, \
389                   "Attempt to write ZIP archive that was already closed"
390         if zinfo.compress_type == ZIP_DEFLATED and not zlib:
391             raise RuntimeError, \
392                   "Compression requires the (missing) zlib module"
393         if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
394             raise RuntimeError, \
395                   "That compression method is not supported"
396         if not self._allowZip64:
397             requires_zip64 = None
398             if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
399                 requires_zip64 = "Files count"
400             elif zinfo.file_size > ZIP64_LIMIT:
401                 requires_zip64 = "Filesize"
402             elif zinfo.header_offset > ZIP64_LIMIT:
403                 requires_zip64 = "Zipfile size"
404             if requires_zip64:
405                 raise LargeZipFile(requires_zip64 +
406                                    " would require ZIP64 extensions")
407
408     def write(self, filename, arcname=None, compress_type=None):
409         """Put the bytes from filename into the archive under the name
410         arcname."""
411         if not self.fp:
412             raise RuntimeError(
413                   "Attempt to write to ZIP archive that was already closed")
414
415         st = os.stat(filename)
416         isdir = stat.S_ISDIR(st.st_mode)
417         mtime = time.localtime(st.st_mtime)
418         date_time = mtime[0:6]
419         # Create ZipInfo instance to store file information
420         if arcname is None:
421             arcname = filename
422         arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
423         while arcname[0] in (os.sep, os.altsep):
424             arcname = arcname[1:]
425         if isdir:
426             arcname += '/'
427         zinfo = ZipInfo(arcname, date_time)
428         zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
429         if compress_type is None:
430             zinfo.compress_type = self.compression
431         else:
432             zinfo.compress_type = compress_type
433
434         zinfo.file_size = st.st_size
435         zinfo.flag_bits = 0x00
436         zinfo.header_offset = self.fp.tell()    # Start of header bytes
437
438         self._writecheck(zinfo)
439         self._didModify = True
440
441         if isdir:
442             zinfo.file_size = 0
443             zinfo.compress_size = 0
444             zinfo.CRC = 0
445             zinfo.external_attr |= 0x10  # MS-DOS directory flag
446             self.filelist.append(zinfo)
447             self.NameToInfo[zinfo.filename] = zinfo
448             self.fp.write(zinfo.FileHeader(False))
449             return
450
451         with open(filename, "rb") as fp:
452             # Must overwrite CRC and sizes with correct data later
453             zinfo.CRC = CRC = 0
454             zinfo.compress_size = compress_size = 0
455             # Compressed size can be larger than uncompressed size
456             zip64 = self._allowZip64 and \
457                     zinfo.file_size * 1.05 > ZIP64_LIMIT
458             self.fp.write(zinfo.FileHeader(zip64))
459             if zinfo.compress_type == ZIP_DEFLATED:
460                 cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
461                      zlib.DEFLATED, -15)
462             else:
463                 cmpr = None
464             file_size = 0
465             while 1:
466                 buf = fp.read(1024 * 8)
467                 if not buf:
468                     break
469                 file_size = file_size + len(buf)
470                 CRC = crc32(buf, CRC) & 0xffffffff
471                 if cmpr:
472                     buf = cmpr.compress(buf)
473                     compress_size = compress_size + len(buf)
474                 self.fp.write(buf)
475         if cmpr:
476             buf = cmpr.flush()
477             compress_size = compress_size + len(buf)
478             self.fp.write(buf)
479             zinfo.compress_size = compress_size
480         else:
481             zinfo.compress_size = file_size
482         zinfo.CRC = CRC
483         zinfo.file_size = file_size
484         if not zip64 and self._allowZip64:
485             if file_size > ZIP64_LIMIT:
486                 raise RuntimeError('File size has increased during compressing')
487             if compress_size > ZIP64_LIMIT:
488                 raise RuntimeError('Compressed size larger than uncompressed size')
489         # Seek backwards and write file header (which will now include
490         # correct CRC and file sizes)
491         position = self.fp.tell()       # Preserve current position in file
492         self.fp.seek(zinfo.header_offset, 0)
493         self.fp.write(zinfo.FileHeader(zip64))
494         self.fp.seek(position, 0)
495         self.filelist.append(zinfo)
496         self.NameToInfo[zinfo.filename] = zinfo
497
498     def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
499         """Write a file into the archive.  The contents is the string
500         'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
501         the name of the file in the archive."""
502         if not isinstance(zinfo_or_arcname, ZipInfo):
503             zinfo = ZipInfo(filename=zinfo_or_arcname,
504                             date_time=time.localtime(time.time())[:6])
505
506             zinfo.compress_type = self.compression
507             if zinfo.filename[-1] == '/':
508                 zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
509                 zinfo.external_attr |= 0x10           # MS-DOS directory flag
510             else:
511                 zinfo.external_attr = 0o600 << 16     # ?rw-------
512         else:
513             zinfo = zinfo_or_arcname
514
515         if not self.fp:
516             raise RuntimeError(
517                   "Attempt to write to ZIP archive that was already closed")
518
519         if compress_type is not None:
520             zinfo.compress_type = compress_type
521
522         zinfo.file_size = len(bytes)            # Uncompressed size
523         zinfo.header_offset = self.fp.tell()    # Start of header bytes
524         self._writecheck(zinfo)
525         self._didModify = True
526         zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
527         if zinfo.compress_type == ZIP_DEFLATED:
528             co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
529                  zlib.DEFLATED, -15)
530             bytes = co.compress(bytes) + co.flush()
531             zinfo.compress_size = len(bytes)    # Compressed size
532         else:
533             zinfo.compress_size = zinfo.file_size
534         zip64 = zinfo.file_size > ZIP64_LIMIT or \
535                 zinfo.compress_size > ZIP64_LIMIT
536         if zip64 and not self._allowZip64:
537             raise LargeZipFile("Filesize would require ZIP64 extensions")
538         self.fp.write(zinfo.FileHeader(zip64))
539         self.fp.write(bytes)
540         if zinfo.flag_bits & 0x08:
541             # Write CRC and file sizes after the file data
542             fmt = '<LQQ' if zip64 else '<LLL'
543             self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
544                   zinfo.file_size))
545         self.fp.flush()
546         self.filelist.append(zinfo)
547         self.NameToInfo[zinfo.filename] = zinfo
548
549     def __del__(self):
550         """Call the "close()" method in case the user forgot."""
551         self.close()
552
553     def close(self):
554         """Close the file, and for mode "w" and "a" write the ending
555         records."""
556         if self.fp is None:
557             return
558
559         try:
560             if self.mode in ("w", "a") and self._didModify: # write ending records
561                 pos1 = self.fp.tell()
562                 for zinfo in self.filelist:         # write central directory
563                     dt = zinfo.date_time
564                     dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
565                     dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
566                     extra = []
567                     if zinfo.file_size > ZIP64_LIMIT \
568                             or zinfo.compress_size > ZIP64_LIMIT:
569                         extra.append(zinfo.file_size)
570                         extra.append(zinfo.compress_size)
571                         file_size = 0xffffffff
572                         compress_size = 0xffffffff
573                     else:
574                         file_size = zinfo.file_size
575                         compress_size = zinfo.compress_size
576
577                     if zinfo.header_offset > ZIP64_LIMIT:
578                         extra.append(zinfo.header_offset)
579                         header_offset = 0xffffffffL
580                     else:
581                         header_offset = zinfo.header_offset
582
583                     extra_data = zinfo.extra
584                     if extra:
585                         # Append a ZIP64 field to the extra's
586                         extra_data = struct.pack(
587                                 '<HH' + 'Q'*len(extra),
588                                 1, 8*len(extra), *extra) + extra_data
589
590                         extract_version = max(45, zinfo.extract_version)
591                         create_version = max(45, zinfo.create_version)
592                     else:
593                         extract_version = zinfo.extract_version
594                         create_version = zinfo.create_version
595
596                     try:
597                         filename, flag_bits = zinfo._encodeFilenameFlags()
598                         centdir = struct.pack(structCentralDir,
599                         stringCentralDir, create_version,
600                         zinfo.create_system, extract_version, zinfo.reserved,
601                         flag_bits, zinfo.compress_type, dostime, dosdate,
602                         zinfo.CRC, compress_size, file_size,
603                         len(filename), len(extra_data), len(zinfo.comment),
604                         0, zinfo.internal_attr, zinfo.external_attr,
605                         header_offset)
606                     except DeprecationWarning:
607                         print >>sys.stderr, (structCentralDir,
608                         stringCentralDir, create_version,
609                         zinfo.create_system, extract_version, zinfo.reserved,
610                         zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
611                         zinfo.CRC, compress_size, file_size,
612                         len(zinfo.filename), len(extra_data), len(zinfo.comment),
613                         0, zinfo.internal_attr, zinfo.external_attr,
614                         header_offset)
615                         raise
616                     self.fp.write(centdir)
617                     self.fp.write(filename)
618                     self.fp.write(extra_data)
619                     self.fp.write(zinfo.comment)
620
621                 pos2 = self.fp.tell()
622                 # Write end-of-zip-archive record
623                 centDirCount = len(self.filelist)
624                 centDirSize = pos2 - pos1
625                 centDirOffset = pos1
626                 requires_zip64 = None
627                 if centDirCount > ZIP_FILECOUNT_LIMIT:
628                     requires_zip64 = "Files count"
629                 elif centDirOffset > ZIP64_LIMIT:
630                     requires_zip64 = "Central directory offset"
631                 elif centDirSize > ZIP64_LIMIT:
632                     requires_zip64 = "Central directory size"
633                 if requires_zip64:
634                     # Need to write the ZIP64 end-of-archive records
635                     if not self._allowZip64:
636                         raise LargeZipFile(requires_zip64 +
637                                            " would require ZIP64 extensions")
638                     zip64endrec = struct.pack(
639                             structEndArchive64, stringEndArchive64,
640                             44, 45, 45, 0, 0, centDirCount, centDirCount,
641                             centDirSize, centDirOffset)
642                     self.fp.write(zip64endrec)
643
644                     zip64locrec = struct.pack(
645                             structEndArchive64Locator,
646                             stringEndArchive64Locator, 0, pos2, 1)
647                     self.fp.write(zip64locrec)
648                     centDirCount = min(centDirCount, 0xFFFF)
649                     centDirSize = min(centDirSize, 0xFFFFFFFF)
650                     centDirOffset = min(centDirOffset, 0xFFFFFFFF)
651
652                 endrec = struct.pack(structEndArchive, stringEndArchive,
653                                     0, 0, centDirCount, centDirCount,
654                                     centDirSize, centDirOffset, len(self._comment))
655                 self.fp.write(endrec)
656                 self.fp.write(self._comment)
657                 self.fp.flush()
658         finally:
659             fp = self.fp
660             self.fp = None
661             if not self._filePassed:
662                 fp.close()
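
The listing above covers `write()`, `writestr()`, `extractall()` and `close()`. As a rough illustration of how those methods fit together from the caller's side, here is a minimal sketch written for Python 3 (the listing itself is Python 2 source). The function name `roundtrip_zip` and the file names are made up for this example and are not part of the `zipfile` module.

```python
import os
import tempfile
import zipfile


def roundtrip_zip(workdir):
    """Build a small archive with write() and writestr(), then extract it."""
    # A file on disk, to be added with write().
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("hello guyun")

    archive = os.path.join(workdir, "demo.zip")
    # Leaving the with-block calls close(), which writes the central
    # directory and the end-of-archive record.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="hello.txt")            # copy a file from disk
        zf.writestr("notes/readme.txt", "in memory")  # store data held in memory

    out_dir = os.path.join(workdir, "out")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(out_dir)  # extract every member below out_dir
    return names, out_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names, out_dir = roundtrip_zip(tmp)
        print(names)
        print(sorted(os.listdir(out_dir)))
```

Passing `arcname` to `write()` keeps the temporary directory path out of the archive; without it, the member name is derived from the path on disk.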

　　④、TarFile
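
Before reading the TarFile source, it may help to see the class from the caller's side. The sketch below is written for Python 3 (the source listing that follows is Python 2) and uses `tarfile.open()`, `getnames()` and `getmember()`, which appear in the listing, plus `add()`, `addfile()` and `extractfile()` from the same module. The function name `roundtrip_tar` and the file names are assumptions made only for this example.

```python
import io
import os
import tarfile
import tempfile


def roundtrip_tar(workdir):
    """Write a gzip-compressed tar archive, then read it back."""
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("hello guyun")

    archive = os.path.join(workdir, "demo.tar.gz")
    # Mode "w:gz" selects gzip compression; leaving the with-block calls
    # close(), which appends the finishing zero blocks.
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(src, arcname="hello.txt")  # add a file from disk

        # Add data held in memory: describe it with a TarInfo first.
        data = b"in memory"
        info = tarfile.TarInfo(name="notes/readme.txt")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

    # Mode "r" detects the compression transparently.
    with tarfile.open(archive, "r") as tf:
        names = tf.getnames()
        member = tf.getmember("hello.txt")
        content = tf.extractfile(member).read()
    return names, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names, content = roundtrip_tar(tmp)
        print(names)
        print(content)
```
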

  1 class TarFile(object):
  2     """The TarFile Class provides an interface to tar archives.
  3     """
  4
  5     debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)
  6
  7     dereference = False         # If true, add content of linked file to the
  8                                 # tar file, else the link.
  9
 10     ignore_zeros = False        # If true, skips empty or invalid blocks and
 11                                 # continues processing.
 12
 13     errorlevel = 1              # If 0, fatal errors only appear in debug
 14                                 # messages (if debug >= 0). If > 0, errors
 15                                 # are passed to the caller as exceptions.
 16
 17     format = DEFAULT_FORMAT     # The format to use when creating an archive.
 18
 19     encoding = ENCODING         # Encoding for 8-bit character strings.
 20
 21     errors = None               # Error handler for unicode conversion.
 22
 23     tarinfo = TarInfo           # The default TarInfo class to use.
 24
 25     fileobject = ExFileObject   # The default ExFileObject class to use.
 26
 27     def __init__(self, name=None, mode="r", fileobj=None, format=None,
 28             tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
 29             errors=None, pax_headers=None, debug=None, errorlevel=None):
 30         """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
 31            read from an existing archive, 'a' to append data to an existing
 32            file or 'w' to create a new file overwriting an existing one. `mode'
 33            defaults to 'r'.
 34            If `fileobj' is given, it is used for reading or writing data. If it
 35            can be determined, `mode' is overridden by `fileobj's mode.
 36            `fileobj' is not closed, when TarFile is closed.
 37         """
 38         modes = {"r": "rb", "a": "r+b", "w": "wb"}
 39         if mode not in modes:
 40             raise ValueError("mode must be 'r', 'a' or 'w'")
 41         self.mode = mode
 42         self._mode = modes[mode]
 43
 44         if not fileobj:
 45             if self.mode == "a" and not os.path.exists(name):
 46                 # Create nonexistent files in append mode.
 47                 self.mode = "w"
 48                 self._mode = "wb"
 49             fileobj = bltn_open(name, self._mode)
 50             self._extfileobj = False
 51         else:
 52             if name is None and hasattr(fileobj, "name"):
 53                 name = fileobj.name
 54             if hasattr(fileobj, "mode"):
 55                 self._mode = fileobj.mode
 56             self._extfileobj = True
 57         self.name = os.path.abspath(name) if name else None
 58         self.fileobj = fileobj
 59
 60         # Init attributes.
 61         if format is not None:
 62             self.format = format
 63         if tarinfo is not None:
 64             self.tarinfo = tarinfo
 65         if dereference is not None:
 66             self.dereference = dereference
 67         if ignore_zeros is not None:
 68             self.ignore_zeros = ignore_zeros
 69         if encoding is not None:
 70             self.encoding = encoding
 71
 72         if errors is not None:
 73             self.errors = errors
 74         elif mode == "r":
 75             self.errors = "utf-8"
 76         else:
 77             self.errors = "strict"
 78
 79         if pax_headers is not None and self.format == PAX_FORMAT:
 80             self.pax_headers = pax_headers
 81         else:
 82             self.pax_headers = {}
 83
 84         if debug is not None:
 85             self.debug = debug
 86         if errorlevel is not None:
 87             self.errorlevel = errorlevel
 88
 89         # Init datastructures.
 90         self.closed = False
 91         self.members = []       # list of members as TarInfo objects
 92         self._loaded = False    # flag if all members have been read
 93         self.offset = self.fileobj.tell()
 94                                 # current position in the archive file
 95         self.inodes = {}        # dictionary caching the inodes of
 96                                 # archive members already added
 97
 98         try:
 99             if self.mode == "r":
100                 self.firstmember = None
101                 self.firstmember = self.next()
102
103             if self.mode == "a":
104                 # Move to the end of the archive,
105                 # before the first empty block.
106                 while True:
107                     self.fileobj.seek(self.offset)
108                     try:
109                         tarinfo = self.tarinfo.fromtarfile(self)
110                         self.members.append(tarinfo)
111                     except EOFHeaderError:
112                         self.fileobj.seek(self.offset)
113                         break
114                     except HeaderError, e:
115                         raise ReadError(str(e))
116
117             if self.mode in "aw":
118                 self._loaded = True
119
120                 if self.pax_headers:
121                     buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
122                     self.fileobj.write(buf)
123                     self.offset += len(buf)
124         except:
125             if not self._extfileobj:
126                 self.fileobj.close()
127             self.closed = True
128             raise
129
130     def _getposix(self):
131         return self.format == USTAR_FORMAT
132     def _setposix(self, value):
133         import warnings
134         warnings.warn("use the format attribute instead", DeprecationWarning,
135                       2)
136         if value:
137             self.format = USTAR_FORMAT
138         else:
139             self.format = GNU_FORMAT
140     posix = property(_getposix, _setposix)
141
142     #--------------------------------------------------------------------------
143     # Below are the classmethods which act as alternate constructors to the
144     # TarFile class. The open() method is the only one that is needed for
145     # public use; it is the "super"-constructor and is able to select an
146     # adequate "sub"-constructor for a particular compression using the mapping
147     # from OPEN_METH.
148     #
149     # This concept allows one to subclass TarFile without losing the comfort of
150     # the super-constructor. A sub-constructor is registered and made available
151     # by adding it to the mapping in OPEN_METH.
152
153     @classmethod
154     def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
155         """Open a tar archive for reading, writing or appending. Return
156            an appropriate TarFile class.
157
158            mode:
159            'r' or 'r:*' open for reading with transparent compression
160            'r:'         open for reading exclusively uncompressed
161            'r:gz'       open for reading with gzip compression
162            'r:bz2'      open for reading with bzip2 compression
163            'a' or 'a:'  open for appending, creating the file if necessary
164            'w' or 'w:'  open for writing without compression
165            'w:gz'       open for writing with gzip compression
166            'w:bz2'      open for writing with bzip2 compression
167
168            'r|*'        open a stream of tar blocks with transparent compression
169            'r|'         open an uncompressed stream of tar blocks for reading
170            'r|gz'       open a gzip compressed stream of tar blocks
171            'r|bz2'      open a bzip2 compressed stream of tar blocks
172            'w|'         open an uncompressed stream for writing
173            'w|gz'       open a gzip compressed stream for writing
174            'w|bz2'      open a bzip2 compressed stream for writing
175         """
176
177         if not name and not fileobj:
178             raise ValueError("nothing to open")
179
180         if mode in ("r", "r:*"):
181             # Find out which *open() is appropriate for opening the file.
182             for comptype in cls.OPEN_METH:
183                 func = getattr(cls, cls.OPEN_METH[comptype])
184                 if fileobj is not None:
185                     saved_pos = fileobj.tell()
186                 try:
187                     return func(name, "r", fileobj, **kwargs)
188                 except (ReadError, CompressionError), e:
189                     if fileobj is not None:
190                         fileobj.seek(saved_pos)
191                     continue
192             raise ReadError("file could not be opened successfully")
193
194         elif ":" in mode:
195             filemode, comptype = mode.split(":", 1)
196             filemode = filemode or "r"
197             comptype = comptype or "tar"
198
199             # Select the *open() function according to
200             # given compression.
201             if comptype in cls.OPEN_METH:
202                 func = getattr(cls, cls.OPEN_METH[comptype])
203             else:
204                 raise CompressionError("unknown compression type %r" % comptype)
205             return func(name, filemode, fileobj, **kwargs)
206
207         elif "|" in mode:
208             filemode, comptype = mode.split("|", 1)
209             filemode = filemode or "r"
210             comptype = comptype or "tar"
211
212             if filemode not in ("r", "w"):
213                 raise ValueError("mode must be 'r' or 'w'")
214
215             stream = _Stream(name, filemode, comptype, fileobj, bufsize)
216             try:
217                 t = cls(name, filemode, stream, **kwargs)
218             except:
219                 stream.close()
220                 raise
221             t._extfileobj = False
222             return t
223
224         elif mode in ("a", "w"):
225             return cls.taropen(name, mode, fileobj, **kwargs)
226
227         raise ValueError("undiscernible mode")
228
229     @classmethod
230     def taropen(cls, name, mode="r", fileobj=None, **kwargs):
231         """Open uncompressed tar archive name for reading or writing.
232         """
233         if mode not in ("r", "a", "w"):
234             raise ValueError("mode must be 'r', 'a' or 'w'")
235         return cls(name, mode, fileobj, **kwargs)
236
237     @classmethod
238     def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
239         """Open gzip compressed tar archive name for reading or writing.
240            Appending is not allowed.
241         """
242         if mode not in ("r", "w"):
243             raise ValueError("mode must be 'r' or 'w'")
244
245         try:
246             import gzip
247             gzip.GzipFile
248         except (ImportError, AttributeError):
249             raise CompressionError("gzip module is not available")
250
251         try:
252             fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
253         except OSError:
254             if fileobj is not None and mode == 'r':
255                 raise ReadError("not a gzip file")
256             raise
257
258         try:
259             t = cls.taropen(name, mode, fileobj, **kwargs)
260         except IOError:
261             fileobj.close()
262             if mode == 'r':
263                 raise ReadError("not a gzip file")
264             raise
265         except:
266             fileobj.close()
267             raise
268         t._extfileobj = False
269         return t
270
271     @classmethod
272     def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
273         """Open bzip2 compressed tar archive name for reading or writing.
274            Appending is not allowed.
275         """
276         if mode not in ("r", "w"):
277             raise ValueError("mode must be 'r' or 'w'.")
278
279         try:
280             import bz2
281         except ImportError:
282             raise CompressionError("bz2 module is not available")
283
284         if fileobj is not None:
285             fileobj = _BZ2Proxy(fileobj, mode)
286         else:
287             fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)
288
289         try:
290             t = cls.taropen(name, mode, fileobj, **kwargs)
291         except (IOError, EOFError):
292             fileobj.close()
293             if mode == 'r':
294                 raise ReadError("not a bzip2 file")
295             raise
296         except:
297             fileobj.close()
298             raise
299         t._extfileobj = False
300         return t
301
302     # All *open() methods are registered here.
303     OPEN_METH = {
304         "tar": "taropen",   # uncompressed tar
305         "gz":  "gzopen",    # gzip compressed tar
306         "bz2": "bz2open"    # bzip2 compressed tar
307     }
308
309     #--------------------------------------------------------------------------
310     # The public methods which TarFile provides:
311
312     def close(self):
313         """Close the TarFile. In write-mode, two finishing zero blocks are
314            appended to the archive.
315         """
316         if self.closed:
317             return
318
319         if self.mode in "aw":
320             self.fileobj.write(NUL * (BLOCKSIZE * 2))
321             self.offset += (BLOCKSIZE * 2)
322             # fill up the end with zero-blocks
323             # (like option -b20 for tar does)
324             blocks, remainder = divmod(self.offset, RECORDSIZE)
325             if remainder > 0:
326                 self.fileobj.write(NUL * (RECORDSIZE - remainder))
327
328         if not self._extfileobj:
329             self.fileobj.close()
330         self.closed = True
331
332     def getmember(self, name):
333         """Return a TarInfo object for member `name'. If `name' can not be
334            found in the archive, KeyError is raised. If a member occurs more
335            than once in the archive, its last occurrence is assumed to be the
336            most up-to-date version.
337         """
338         tarinfo = self._getmember(name)
339         if tarinfo is None:
340             raise KeyError("filename %r not found" % name)
341         return tarinfo
342
343     def getmembers(self):
344         """Return the members of the archive as a list of TarInfo objects. The
345            list has the same order as the members in the archive.
346         """
347         self._check()
348         if not self._loaded:    # if we want to obtain a list of
349             self._load()        # all members, we first have to
350                                 # scan the whole archive.
351         return self.members
352
353     def getnames(self):
354         """Return the members of the archive as a list of their names. It has
355            the same order as the list returned by getmembers().
356         """
357         return [tarinfo.name for tarinfo in self.getmembers()]
358
359     def gettarinfo(self, name=None, arcname=None, fileobj=None):
360         """Create a TarInfo object for either the file `name' or the file
361            object `fileobj' (using os.fstat on its file descriptor). You can
362            modify some of the TarInfo's attributes before you add it using
363            addfile(). If given, `arcname' specifies an alternative name for the
364            file in the archive.
365         """
366         self._check("aw")
367
368         # When fileobj is given, replace name by
369         # fileobj's real name.
370         if fileobj is not None:
371             name = fileobj.name
372
373         # Building the name of the member in the archive.
374         # Backward slashes are converted to forward slashes,
375         # Absolute paths are turned to relative paths.
376         if arcname is None:
377             arcname = name
378         drv, arcname = os.path.splitdrive(arcname)
379         arcname = arcname.replace(os.sep, "/")
380         arcname = arcname.lstrip("/")
381
382         # Now, fill the TarInfo object with
383         # information specific for the file.
384         tarinfo = self.tarinfo()
385         tarinfo.tarfile = self
386
387         # Use os.stat or os.lstat, depending on platform
388         # and if symlinks shall be resolved.
389         if fileobj is None:
390             if hasattr(os, "lstat") and not self.dereference:
391                 statres = os.lstat(name)
392             else:
393                 statres = os.stat(name)
394         else:
395             statres = os.fstat(fileobj.fileno())
396         linkname = ""
397
398         stmd = statres.st_mode
399         if stat.S_ISREG(stmd):
400             inode = (statres.st_ino, statres.st_dev)
401             if not self.dereference and statres.st_nlink > 1 and \
402                     inode in self.inodes and arcname != self.inodes[inode]:
403                 # Is it a hardlink to an already
404                 # archived file?
405                 type = LNKTYPE
406                 linkname = self.inodes[inode]
407             else:
408                 # The inode is added only if its valid.
409                 # For win32 it is always 0.
410                 type = REGTYPE
411                 if inode[0]:
412                     self.inodes[inode] = arcname
413         elif stat.S_ISDIR(stmd):
414             type = DIRTYPE
415         elif stat.S_ISFIFO(stmd):
416             type = FIFOTYPE
417         elif stat.S_ISLNK(stmd):
418             type = SYMTYPE
419             linkname = os.readlink(name)
420         elif stat.S_ISCHR(stmd):
421             type = CHRTYPE
422         elif stat.S_ISBLK(stmd):
423             type = BLKTYPE
424         else:
425             return None
426
427         # Fill the TarInfo object with all
428         # information we can get.
429         tarinfo.name = arcname
430         tarinfo.mode = stmd
431         tarinfo.uid = statres.st_uid
432         tarinfo.gid = statres.st_gid
433         if type == REGTYPE:
434             tarinfo.size = statres.st_size
435         else:
436             tarinfo.size = 0L
437         tarinfo.mtime = statres.st_mtime
438         tarinfo.type = type
439         tarinfo.linkname = linkname
440         if pwd:
441             try:
442                 tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
443             except KeyError:
444                 pass
445         if grp:
446             try:
447                 tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
448             except KeyError:
449                 pass
450
451         if type in (CHRTYPE, BLKTYPE):
452             if hasattr(os, "major") and hasattr(os, "minor"):
453                 tarinfo.devmajor = os.major(statres.st_rdev)
454                 tarinfo.devminor = os.minor(statres.st_rdev)
455         return tarinfo
456
457     def list(self, verbose=True):
458         """Print a table of contents to sys.stdout. If `verbose' is False, only
459            the names of the members are printed. If it is True, an `ls -l'-like
460            output is produced.
461         """
462         self._check()
463
464         for tarinfo in self:
465             if verbose:
466                 print filemode(tarinfo.mode),
467                 print "%s/%s" % (tarinfo.uname or tarinfo.uid,
468                                  tarinfo.gname or tarinfo.gid),
469                 if tarinfo.ischr() or tarinfo.isblk():
470                     print "%10s" % ("%d,%d" \
471                                     % (tarinfo.devmajor, tarinfo.devminor)),
472                 else:
473                     print "%10d" % tarinfo.size,
474                 print "%d-%02d-%02d %02d:%02d:%02d" \
475                       % time.localtime(tarinfo.mtime)[:6],
476
477             print tarinfo.name + ("/" if tarinfo.isdir() else ""),
478
479             if verbose:
480                 if tarinfo.issym():
481                     print "->", tarinfo.linkname,
482                 if tarinfo.islnk():
483                     print "link to", tarinfo.linkname,
484             print
485
486     def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
487         """Add the file `name' to the archive. `name' may be any type of file
488            (directory, fifo, symbolic link, etc.). If given, `arcname'
489            specifies an alternative name for the file in the archive.
490            Directories are added recursively by default. This can be avoided by
491            setting `recursive' to False. `exclude' is a function that should
492            return True for each filename to be excluded. `filter' is a function
493            that expects a TarInfo object argument and returns the changed
494            TarInfo object, if it returns None the TarInfo object will be
495            excluded from the archive.
496         """
497         self._check("aw")
498
499         if arcname is None:
500             arcname = name
501
502         # Exclude pathnames.
503         if exclude is not None:
504             import warnings
505             warnings.warn("use the filter argument instead",
506                     DeprecationWarning, 2)
507             if exclude(name):
508                 self._dbg(2, "tarfile: Excluded %r" % name)
509                 return
510
511         # Skip if somebody tries to archive the archive...
512         if self.name is not None and os.path.abspath(name) == self.name:
513             self._dbg(2, "tarfile: Skipped %r" % name)
514             return
515
516         self._dbg(1, name)
517
518         # Create a TarInfo object from the file.
519         tarinfo = self.gettarinfo(name, arcname)
520
521         if tarinfo is None:
522             self._dbg(1, "tarfile: Unsupported type %r" % name)
523             return
524
525         # Change or exclude the TarInfo object.
526         if filter is not None:
527             tarinfo = filter(tarinfo)
528             if tarinfo is None:
529                 self._dbg(2, "tarfile: Excluded %r" % name)
530                 return
531
532         # Append the tar header and data to the archive.
533         if tarinfo.isreg():
534             with bltn_open(name, "rb") as f:
535                 self.addfile(tarinfo, f)
536
537         elif tarinfo.isdir():
538             self.addfile(tarinfo)
539             if recursive:
540                 for f in os.listdir(name):
541                     self.add(os.path.join(name, f), os.path.join(arcname, f),
542                             recursive, exclude, filter)
543
544         else:
545             self.addfile(tarinfo)
546
547     def addfile(self, tarinfo, fileobj=None):
548         """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
549            given, tarinfo.size bytes are read from it and added to the archive.
550            You can create TarInfo objects using gettarinfo().
551            On Windows platforms, `fileobj' should always be opened with mode
552            'rb' to avoid irritation about the file size.
553         """
554         self._check("aw")
555
556         tarinfo = copy.copy(tarinfo)
557
558         buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
559         self.fileobj.write(buf)
560         self.offset += len(buf)
561
562         # If there's data to follow, append it.
563         if fileobj is not None:
564             copyfileobj(fileobj, self.fileobj, tarinfo.size)
565             blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
566             if remainder > 0:
567                 self.fileobj.write(NUL * (BLOCKSIZE - remainder))
568                 blocks += 1
569             self.offset += blocks * BLOCKSIZE
570
571         self.members.append(tarinfo)
572
573     def extractall(self, path=".", members=None):
574         """Extract all members from the archive to the current working
575            directory and set owner, modification time and permissions on
576            directories afterwards. `path' specifies a different directory
577            to extract to. `members' is optional and must be a subset of the
578            list returned by getmembers().
579         """
580         directories = []
581
582         if members is None:
583             members = self
584
585         for tarinfo in members:
586             if tarinfo.isdir():
587                 # Extract directories with a safe mode.
588                 directories.append(tarinfo)
589                 tarinfo = copy.copy(tarinfo)
590                 tarinfo.mode = 0700
591             self.extract(tarinfo, path)
592
593         # Reverse sort directories.
594         directories.sort(key=operator.attrgetter('name'))
595         directories.reverse()
596
597         # Set correct owner, mtime and filemode on directories.
598         for tarinfo in directories:
599             dirpath = os.path.join(path, tarinfo.name)
600             try:
601                 self.chown(tarinfo, dirpath)
602                 self.utime(tarinfo, dirpath)
603                 self.chmod(tarinfo, dirpath)
604             except ExtractError, e:
605                 if self.errorlevel > 1:
606                     raise
607                 else:
608                     self._dbg(1, "tarfile: %s" % e)
609
610     def extract(self, member, path=""):
611         """Extract a member from the archive to the current working directory,
612            using its full name. Its file information is extracted as accurately
613            as possible. `member' may be a filename or a TarInfo object. You can
614            specify a different directory using `path'.
615         """
616         self._check("r")
617
618         if isinstance(member, basestring):
619             tarinfo = self.getmember(member)
620         else:
621             tarinfo = member
622
623         # Prepare the link target for makelink().
624         if tarinfo.islnk():
625             tarinfo._link_target = os.path.join(path, tarinfo.linkname)
626
627         try:
628             self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
629         except EnvironmentError, e:
630             if self.errorlevel > 0:
631                 raise
632             else:
633                 if e.filename is None:
634                     self._dbg(1, "tarfile: %s" % e.strerror)
635                 else:
636                     self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
637         except ExtractError, e:
638             if self.errorlevel > 1:
639                 raise
640             else:
641                 self._dbg(1, "tarfile: %s" % e)
642
643     def extractfile(self, member):
644         """Extract a member from the archive as a file object. `member' may be
645            a filename or a TarInfo object. If `member' is a regular file, a
646            file-like object is returned. If `member' is a link, a file-like
647            object is constructed from the link's target. If `member' is none of
648            the above, None is returned.
649            The file-like object is read-only and provides the following
650            methods: read(), readline(), readlines(), seek() and tell()
651         """
652         self._check("r")
653
654         if isinstance(member, basestring):
655             tarinfo = self.getmember(member)
656         else:
657             tarinfo = member
658
659         if tarinfo.isreg():
660             return self.fileobject(self, tarinfo)
661
662         elif tarinfo.type not in SUPPORTED_TYPES:
663             # If a member's type is unknown, it is treated as a
664             # regular file.
665             return self.fileobject(self, tarinfo)
666
667         elif tarinfo.islnk() or tarinfo.issym():
668             if isinstance(self.fileobj, _Stream):
669                 # A small but ugly workaround for the case that someone tries
670                 # to extract a (sym)link as a file-object from a non-seekable
671                 # stream of tar blocks.
672                 raise StreamError("cannot extract (sym)link as file object")
673             else:
674                 # A (sym)link's file object is its target's file object.
675                 return self.extractfile(self._find_link_target(tarinfo))
676         else:
677             # If there's no data associated with the member (directory, chrdev,
678             # blkdev, etc.), return None instead of a file object.
679             return None
680
681     def _extract_member(self, tarinfo, targetpath):
682         """Extract the TarInfo object tarinfo to a physical
683            file called targetpath.
684         """
685         # Fetch the TarInfo object for the given name
686         # and build the destination pathname, replacing
687         # forward slashes to platform specific separators.
688         targetpath = targetpath.rstrip("/")
689         targetpath = targetpath.replace("/", os.sep)
690
691         # Create all upper directories.
692         upperdirs = os.path.dirname(targetpath)
693         if upperdirs and not os.path.exists(upperdirs):
694             # Create directories that are not part of the archive with
695             # default permissions.
696             os.makedirs(upperdirs)
697
698         if tarinfo.islnk() or tarinfo.issym():
699             self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
700         else:
701             self._dbg(1, tarinfo.name)
702
703         if tarinfo.isreg():
704             self.makefile(tarinfo, targetpath)
705         elif tarinfo.isdir():
706             self.makedir(tarinfo, targetpath)
707         elif tarinfo.isfifo():
708             self.makefifo(tarinfo, targetpath)
709         elif tarinfo.ischr() or tarinfo.isblk():
710             self.makedev(tarinfo, targetpath)
711         elif tarinfo.islnk() or tarinfo.issym():
712             self.makelink(tarinfo, targetpath)
713         elif tarinfo.type not in SUPPORTED_TYPES:
714             self.makeunknown(tarinfo, targetpath)
715         else:
716             self.makefile(tarinfo, targetpath)
717
718         self.chown(tarinfo, targetpath)
719         if not tarinfo.issym():
720             self.chmod(tarinfo, targetpath)
721             self.utime(tarinfo, targetpath)
722
723     #--------------------------------------------------------------------------
724     # Below are the different file methods. They are called via
725     # _extract_member() when extract() is called. They can be replaced in a
726     # subclass to implement other functionality.
727
728     def makedir(self, tarinfo, targetpath):
729         """Make a directory called targetpath.
730         """
731         try:
732             # Use a safe mode for the directory, the real mode is set
733             # later in _extract_member().
734             os.mkdir(targetpath, 0700)
735         except EnvironmentError, e:
736             if e.errno != errno.EEXIST:
737                 raise
738
739     def makefile(self, tarinfo, targetpath):
740         """Make a file called targetpath.
741         """
742         source = self.extractfile(tarinfo)
743         try:
744             with bltn_open(targetpath, "wb") as target:
745                 copyfileobj(source, target)
746         finally:
747             source.close()
748
749     def makeunknown(self, tarinfo, targetpath):
750         """Make a file from a TarInfo object with an unknown type
751            at targetpath.
752         """
753         self.makefile(tarinfo, targetpath)
754         self._dbg(1, "tarfile: Unknown file type %r, " \
755                      "extracted as regular file." % tarinfo.type)
756
757     def makefifo(self, tarinfo, targetpath):
758         """Make a fifo called targetpath.
759         """
760         if hasattr(os, "mkfifo"):
761             os.mkfifo(targetpath)
762         else:
763             raise ExtractError("fifo not supported by system")
764
765     def makedev(self, tarinfo, targetpath):
766         """Make a character or block device called targetpath.
767         """
768         if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
769             raise ExtractError("special devices not supported by system")
770
771         mode = tarinfo.mode
772         if tarinfo.isblk():
773             mode |= stat.S_IFBLK
774         else:
775             mode |= stat.S_IFCHR
776
777         os.mknod(targetpath, mode,
778                  os.makedev(tarinfo.devmajor, tarinfo.devminor))
779
780     def makelink(self, tarinfo, targetpath):
781         """Make a (symbolic) link called targetpath. If it cannot be created
782           (platform limitation), we try to make a copy of the referenced file
783           instead of a link.
784         """
785         if hasattr(os, "symlink") and hasattr(os, "link"):
786             # For systems that support symbolic and hard links.
787             if tarinfo.issym():
788                 if os.path.lexists(targetpath):
789                     os.unlink(targetpath)
790                 os.symlink(tarinfo.linkname, targetpath)
791             else:
792                 # See extract().
793                 if os.path.exists(tarinfo._link_target):
794                     if os.path.lexists(targetpath):
795                         os.unlink(targetpath)
796                     os.link(tarinfo._link_target, targetpath)
797                 else:
798                     self._extract_member(self._find_link_target(tarinfo), targetpath)
799         else:
800             try:
801                 self._extract_member(self._find_link_target(tarinfo), targetpath)
802             except KeyError:
803                 raise ExtractError("unable to resolve link inside archive")
804
805     def chown(self, tarinfo, targetpath):
806         """Set owner of targetpath according to tarinfo.
807         """
808         if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
809             # We have to be root to do so.
810             try:
811                 g = grp.getgrnam(tarinfo.gname)[2]
812             except KeyError:
813                 g = tarinfo.gid
814             try:
815                 u = pwd.getpwnam(tarinfo.uname)[2]
816             except KeyError:
817                 u = tarinfo.uid
818             try:
819                 if tarinfo.issym() and hasattr(os, "lchown"):
820                     os.lchown(targetpath, u, g)
821                 else:
822                     if sys.platform != "os2emx":
823                         os.chown(targetpath, u, g)
824             except EnvironmentError, e:
825                 raise ExtractError("could not change owner")
826
827     def chmod(self, tarinfo, targetpath):
828         """Set file permissions of targetpath according to tarinfo.
829         """
830         if hasattr(os, 'chmod'):
831             try:
832                 os.chmod(targetpath, tarinfo.mode)
833             except EnvironmentError, e:
834                 raise ExtractError("could not change mode")
835
836     def utime(self, tarinfo, targetpath):
837         """Set modification time of targetpath according to tarinfo.
838         """
839         if not hasattr(os, 'utime'):
840             return
841         try:
842             os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
843         except EnvironmentError, e:
844             raise ExtractError("could not change modification time")
845
846     #--------------------------------------------------------------------------
847     def next(self):
848         """Return the next member of the archive as a TarInfo object, when
849            TarFile is opened for reading. Return None if there is no more
850            available.
851         """
852         self._check("ra")
853         if self.firstmember is not None:
854             m = self.firstmember
855             self.firstmember = None
856             return m
857
858         # Read the next block.
859         self.fileobj.seek(self.offset)
860         tarinfo = None
861         while True:
862             try:
863                 tarinfo = self.tarinfo.fromtarfile(self)
864             except EOFHeaderError, e:
865                 if self.ignore_zeros:
866                     self._dbg(2, "0x%X: %s" % (self.offset, e))
867                     self.offset += BLOCKSIZE
868                     continue
869             except InvalidHeaderError, e:
870                 if self.ignore_zeros:
871                     self._dbg(2, "0x%X: %s" % (self.offset, e))
872                     self.offset += BLOCKSIZE
873                     continue
874                 elif self.offset == 0:
875                     raise ReadError(str(e))
876             except EmptyHeaderError:
877                 if self.offset == 0:
878                     raise ReadError("empty file")
879             except TruncatedHeaderError, e:
880                 if self.offset == 0:
881                     raise ReadError(str(e))
882             except SubsequentHeaderError, e:
883                 raise ReadError(str(e))
884             break
885
886         if tarinfo is not None:
887             self.members.append(tarinfo)
888         else:
889             self._loaded = True
890
891         return tarinfo
892
893     #--------------------------------------------------------------------------
894     # Little helper methods:
895
896     def _getmember(self, name, tarinfo=None, normalize=False):
897         """Find an archive member by name from bottom to top.
898            If tarinfo is given, it is used as the starting point.
899         """
900         # Ensure that all members have been loaded.
901         members = self.getmembers()
902
903         # Limit the member search list up to tarinfo.
904         if tarinfo is not None:
905             members = members[:members.index(tarinfo)]
906
907         if normalize:
908             name = os.path.normpath(name)
909
910         for member in reversed(members):
911             if normalize:
912                 member_name = os.path.normpath(member.name)
913             else:
914                 member_name = member.name
915
916             if name == member_name:
917                 return member
918
919     def _load(self):
920         """Read through the entire archive file and look for readable
921            members.
922         """
923         while True:
924             tarinfo = self.next()
925             if tarinfo is None:
926                 break
927         self._loaded = True
928
929     def _check(self, mode=None):
930         """Check if TarFile is still open, and if the operation's mode
931            corresponds to TarFile's mode.
932         """
933         if self.closed:
934             raise IOError("%s is closed" % self.__class__.__name__)
935         if mode is not None and self.mode not in mode:
936             raise IOError("bad operation for mode %r" % self.mode)
937
938     def _find_link_target(self, tarinfo):
939         """Find the target member of a symlink or hardlink member in the
940            archive.
941         """
942         if tarinfo.issym():
943             # Always search the entire archive.
944             linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
945             limit = None
946         else:
947             # Search the archive before the link, because a hard link is
948             # just a reference to an already archived file.
949             linkname = tarinfo.linkname
950             limit = tarinfo
951
952         member = self._getmember(linkname, tarinfo=limit, normalize=True)
953         if member is None:
954             raise KeyError("linkname %r not found" % linkname)
955         return member
956
957     def __iter__(self):
958         """Provide an iterator object.
959         """
960         if self._loaded:
961             return iter(self.members)
962         else:
963             return TarIter(self)
964
965     def _dbg(self, level, msg):
966         """Write debugging output to sys.stderr.
967         """
968         if level <= self.debug:
969             print >> sys.stderr, msg
970
971     def __enter__(self):
972         self._check()
973         return self
974
975     def __exit__(self, type, value, traceback):
976         if type is None:
977             self.close()
978         else:
979             # An exception occurred. We must not call close() because
980             # it would try to write end-of-archive blocks and padding.
981             if not self._extfileobj:
982                 self.fileobj.close()
983             self.closed = True
984 # class TarFile
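
　　The listing above is the Python 2 source of the TarFile class. The same public methods (add, getnames, extractfile) exist in the Python 3 tarfile module, so a round trip can be sketched as follows. This is only a minimal illustration: the file name hello.txt, the archive name demo.tar and the arcname data/hello.txt are made up for the example.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory so the sketch leaves nothing behind in the project.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "hello.txt")      # hypothetical sample file
archive = os.path.join(workdir, "demo.tar")   # hypothetical archive name

with open(src, "wb") as f:
    f.write(b"hello tar\n")

# add(): store the file under a chosen name inside the archive.
with tarfile.open(archive, "w") as tar:
    tar.add(src, arcname="data/hello.txt")

# getnames() / extractfile(): inspect the archive without unpacking it to disk.
with tarfile.open(archive, "r") as tar:
    names = tar.getnames()
    content = tar.extractfile("data/hello.txt").read()

print(names)
print(content)
```

　　Reading a member through extractfile() avoids writing to disk at all, which is convenient when only one file from the archive is needed.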

　　f. shelve

　　The shelve module is a simple key-value store that persists in-memory data to a file. It can persist any Python object that pickle supports.

import shelve
d = shelve.open('shelve_test')  # open (or create) the shelf file
class Test(object):
    def __init__(self, n):
        self.n = n
t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2
d.close()
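
　　The example above only writes. Reading the data back works like a dictionary lookup. A minimal sketch, assuming Python 3.4+ (where a shelf can be used as a context manager); the path shelve_demo and the keys are made up for illustration:

```python
import os
import shelve
import tempfile

def roundtrip(path):
    """Write two values to a shelf, reopen it, and read them back."""
    with shelve.open(path) as db:
        db["names"] = ["alex", "rain", "test"]
        db["info"] = {"age": 22}
    # Reopening proves the data came from the file, not from memory.
    with shelve.open(path) as db:
        return db["names"], db["info"], sorted(db.keys())

workdir = tempfile.mkdtemp()
names, info, keys = roundtrip(os.path.join(workdir, "shelve_demo"))
print(names, info, keys)
```

　　Instances of custom classes can be read back the same way, provided the class definition is importable when the shelf is opened, because shelve relies on pickle.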

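　　g. XML processing module

　　XML is a format for exchanging data between different languages or programs, much like JSON, although JSON is simpler to use. Before JSON existed, XML was the only real choice, and many interfaces in traditional industries such as finance still rely mainly on XML.

　　The XML format looks like this; the data structure is expressed through <> nodes: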

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

　　XML is supported in virtually every language. In Python, the following module can be used to work with it:

import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)
# walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)
# walk only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)
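
　　When the XML is already in memory, ET.fromstring() can replace ET.parse(). The sketch below uses a shortened copy of the sample document and shows find(), findall() and attribute access; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document, kept inline so the sketch is self-contained.
XML_TEXT = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Element text is always a string, so convert it before doing arithmetic or comparisons.
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}

# iter() walks the whole tree, so nested <neighbor> elements are found too.
neighbors = [n.attrib["name"] for n in root.iter("neighbor")]

print(ranks)
print(neighbors)
```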

　　Modifying and deleting XML content

import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")
tree.write("xmltest.xml")

# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
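
　　The same modify-and-delete steps can be tried without touching any file by building the tree from a string and serialising it with ET.tostring(). This is a sketch on a made-up two-country document, not the author's xmltest.xml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='A'><rank>2</rank><year>2008</year></country>"
    "<country name='B'><rank>69</rank><year>2011</year></country>"
    "</data>"
)

# Modify: bump every year by one and flag it as updated.
for year in root.iter("year"):
    year.text = str(int(year.text) + 1)
    year.set("updated", "yes")

# Delete: findall() returns a list, so removing from root while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
result = ET.tostring(root, encoding="unicode")
print(remaining)
print(result)
```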

　　Creating an XML document yourself

import xml.etree.ElementTree as ET
new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'
et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)
ET.dump(new_xml)  # print the generated XML

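　　h. ConfigParser module

　　Used to generate and modify common configuration files. The module was renamed to configparser in Python 3.x.

　　Here is a configuration file format that many programs use: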

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
[bitbucket.org]
User = hg
[topsecret.server.com]
Port = 50022
ForwardX11 = no

　　How would you generate such a file with Python?

import configparser
config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

　　Once written, the file can be read back:

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'
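
　　Every value read above comes back as a string, and sections silently inherit keys from DEFAULT. The sketch below shows the typed getters (getint, getboolean) and the fallback argument on a shortened copy of the file, loaded with read_string() so that no file is needed. The Timeout key is hypothetical and deliberately missing.

```python
import configparser

# A shortened copy of the example file, kept inline for the sketch.
INI_TEXT = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

section = config["topsecret.server.com"]
port = section.getint("Port")                      # '50022' converted to an int
forward = section.getboolean("ForwardX11")         # 'no' converted to False
compression = section.getboolean("Compression")    # inherited from [DEFAULT]
timeout = section.getint("Timeout", fallback=30)   # hypothetical key, not in the file

print(port, forward, compression, timeout)
```

　　Option names are case-insensitive by default, which is why the key listing above prints them in lower case.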

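　　i. hashlib module

　　Used for hashing (message digest) operations. It replaces the old md5 and sha modules, which no longer exist in 3.x, and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.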

import hashlib
m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())          # hash as raw bytes
print(len(m.hexdigest()))  # length of the hexadecimal hash string
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# ######## md5 ########

h = hashlib.md5()
h.update(b'admin')  # update() requires bytes in Python 3
print(h.hexdigest())

# ######## sha1 ########

h = hashlib.sha1()
h.update(b'admin')
print(h.hexdigest())

# ######## sha256 ########

h = hashlib.sha256()
h.update(b'admin')
print(h.hexdigest())


# ######## sha384 ########

h = hashlib.sha384()
h.update(b'admin')
print(h.hexdigest())

# ######## sha512 ########

h = hashlib.sha512()
h.update(b'admin')
print(h.hexdigest())
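
　　Two points from the code above are easy to miss: repeated update() calls hash the concatenation of everything fed in so far, and text has to be encoded to bytes first. A small sketch (the helper name sha256_hex is made up for illustration):

```python
import hashlib

def sha256_hex(chunks):
    """Hash an iterable of byte chunks incrementally and return the hex digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# Feeding the data in pieces gives the same digest as hashing it in one call.
incremental = sha256_hex([b"Hello", b"It's me"])
one_shot = hashlib.sha256(b"HelloIt's me").hexdigest()

# str must be encoded before hashing.
text_digest = hashlib.md5("admin".encode("utf-8")).hexdigest()

print(incremental == one_shot)
print(text_digest)
```

　　The incremental form is what makes it practical to hash large files: read a block, call update(), repeat.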

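　　Python also has an hmac module, which combines the key and the message we supply in a further processing step before hashing, producing a keyed digest.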

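import hmac
# key and message must be bytes; digestmod must be given explicitly since Python 3.8
# (MD5 was the implicit default in older versions)
h = hmac.new(b'wueiqi', digestmod='md5')
h.update(b'hellowo')
print(h.hexdigest())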
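
　　A typical use of hmac is signing a message and checking the signature later. The sketch below assumes SHA-256 as the digest (a choice made for this example, not something the original specifies) and uses hmac.compare_digest() for the comparison; sign and verify are hypothetical helper names.

```python
import hashlib
import hmac

SECRET_KEY = b"wueiqi"  # the key from the example above; use a random secret in practice

def sign(message: bytes) -> str:
    """Return the hex HMAC-SHA256 tag for message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    """Check a tag using a comparison that does not leak timing information."""
    return hmac.compare_digest(sign(message), signature)

tag = sign(b"hellowo")
print(verify(b"hellowo", tag))   # same message, same key
print(verify(b"hellowO", tag))   # tampered message
```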

　　j. re module

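'.'     By default matches any single character except \n; with the DOTALL flag it matches any character, including a newline
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     Matches the end of the string; with MULTILINE it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*'     Matches the preceding character 0 or more times; re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 times
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     Matches the pattern on the left or on the right of |; re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' Grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'


'\A'    Matches only at the start of the string; re.search(r"\Aabc","alexabc") finds nothing
'\Z'    Matches only at the very end of the string (unlike $, it does not match before a trailing newline)
'\d'    Matches a digit, 0-9
'\D'    Matches a non-digit
'\w'    Matches a word character, [A-Za-z0-9_]
'\W'    Matches a non-word character, anything outside [A-Za-z0-9_]
'\s'    Matches whitespace such as a space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() gives '\t'

'(?P<name>...)' Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}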

　　The most commonly used matching functions

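re.match    matches only at the beginning of the string
re.search   finds the first match anywhere in the string
re.findall  returns all matches as the elements of a list
re.split    splits the string, using the matched text as the separator
re.sub      matches text and replaces it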
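
　　The five functions listed above can be compared side by side on one made-up sample string:

```python
import re

text = "alex22 rain33, jack44"  # sample data invented for this sketch

# match() only succeeds at position 0; search() scans the whole string.
at_start = re.match(r"[a-z]+", text)
no_match = re.match(r"\d+", text)        # the string does not start with a digit
first_number = re.search(r"\d+", text)

numbers = re.findall(r"\d+", text)       # every non-overlapping match
parts = re.split(r"[ ,]+", text)         # runs of spaces/commas act as separators
masked = re.sub(r"\d+", "#", text)       # replace every match

print(at_start.group(), no_match, first_number.group())
print(numbers, parts, masked)
```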

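　　The backslash problem
　　As in most programming languages, regular expressions use "\" as the escape character, which can lead to a pile-up of backslashes. To match a literal "\" in text, a regular expression written as an ordinary string literal needs four backslashes, "\\\\": the string literal first turns each pair into a single backslash, and the regex engine then reads the resulting two backslashes as one literal backslash. Python's raw strings solve this neatly: the same regular expression can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about a missing backslash, and the expressions are easier to read.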
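
　　The equivalence between the four-backslash form and the raw-string form can be checked directly. A short sketch, with a made-up Windows-style path as input:

```python
import re

path = "C:\\temp"  # the actual text is C:\temp, containing a single backslash

# Both patterns are the same two-character regex: an escaped backslash.
same_pattern = "\\\\" == r"\\"
same_digit = "\\d" == r"\d"

plain = re.search("\\\\", path)   # ordinary string literal: four backslashes
raw = re.search(r"\\", path)      # raw string literal: two backslashes

print(same_pattern, same_digit)
print(plain.span(), raw.span())
```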

　　A few matching flags worth knowing

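re.I (re.IGNORECASE): ignore case (the full name is given in parentheses, likewise below)
re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$' (see the syntax table above)
re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'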
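
　　The effect of each flag is easiest to see on a string that spans several lines. A minimal sketch on a made-up three-line input:

```python
import re

text = "Foo\nbar\nFOO"  # three lines, invented for this sketch

# re.I: letter case is ignored.
case_insensitive = re.findall(r"foo", text, flags=re.I)

# re.M: '^' matches at the start of every line, not just the start of the string.
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: '.' also matches the newline between "Foo" and "bar".
without_dotall = re.search(r"Foo.bar", text)
with_dotall = re.search(r"Foo.bar", text, flags=re.S)

print(case_insensitive, line_starts)
print(without_dotall, with_dotall.group() == "Foo\nbar")
```

　　Flags can be combined with |, for example flags=re.I | re.M.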

Time: 2024-10-07 23:16:14
