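1. Reading a remote resource: the response body is bytes (encoded as UTF-8, GBK, or similar) and must be decoded with decode() to get a str (Unicode).
For example: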
```python
# coding=gbk
import urllib.request
import re

url = 'http://www.163.com'
file = 'd:/test.html'  # not used below
# read() returns the raw response body as bytes
data = urllib.request.urlopen(url).read()
r1 = re.compile('<.*?>')  # a str pattern
c_t = r1.findall(data)  # fails: str pattern applied to bytes
print(c_t)
```
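After the page has been downloaded, the script fails at the r1.findall(data) call with:
can't use a string pattern on a bytes-like object
After some searching, the cause turned out to be a change in Python 3.0: read() now returns a bytes-like object, but a str pattern requires a str (character-based) argument. The fix is to decode the data, using the page's encoding (GBK here):
data = data.decode('GBK')
Do this before passing the data to the regular expression, and everything works as expected.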
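The fix above can be sketched without a network connection. This is a minimal illustration, not the original script: the `raw` bytes literal is a made-up stand-in for what `urllib.request.urlopen(url).read()` returns, and `find_tags` is a hypothetical helper name introduced here.

```python
import re

# Hypothetical stand-in for urllib.request.urlopen(url).read():
# a small GBK-encoded page held in memory as bytes.
raw = '<html><body><p>你好</p></body></html>'.encode('gbk')

tag_pattern = re.compile('<.*?>')  # a str pattern


def find_tags(data, encoding='gbk'):
    """Decode bytes to str first, then apply the str pattern."""
    text = data.decode(encoding)
    return tag_pattern.findall(text)


# Applying the str pattern directly to bytes raises TypeError.
try:
    tag_pattern.findall(raw)
    bytes_failed = False
except TypeError:
    bytes_failed = True

tags = find_tags(raw)
print(bytes_failed)  # True
print(tags)  # ['<html>', '<body>', '<p>', '</p>', '</body>', '</html>']
```

If decoding is not wanted, another option is a bytes pattern such as `re.compile(b'<.*?>')`, which matches against bytes and returns bytes.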
2. Reading a local text file: open(fname) in text mode returns a str (Unicode); to get bytes, encode it with encode('utf-8').
For example:
```python
fname = 'e:/data/html.txt'
# Text mode ('r') decodes the file contents and returns str
with open(fname, 'r') as f:
    html = f.read()
# print(html)
print(type(html))  # prints <class 'str'>
u = html.encode('utf-8')
print(type(u))  # prints <class 'bytes'>
```
In Python 3, the str type is Unicode.
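The relationship between the two types can be checked with a small round trip. The sketch below is an illustration under assumptions: it writes a temporary file instead of using e:/data/html.txt, the sample string is made up, and it passes an explicit `encoding='utf-8'` to `open()` rather than relying on the platform default.

```python
import os
import tempfile

text = '中文 text'  # made-up sample content

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'html.txt')
    # Text mode with an explicit encoding: str in, str out.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        as_str = f.read()
    # Binary mode returns the undecoded bytes stored on disk.
    with open(path, 'rb') as f:
        as_bytes = f.read()

print(type(as_str))    # <class 'str'>
print(type(as_bytes))  # <class 'bytes'>
print(len(as_str), len(as_bytes))  # 7 11
print(as_str.encode('utf-8') == as_bytes)  # True
```

The length difference shows that str counts characters while bytes counts encoded bytes: each of the two Chinese characters takes three bytes in UTF-8.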