The following code produces garbled Chinese characters in its output.
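from bs4 import BeautifulSoup
import urllib2

# Fetch the raw page bytes (Python 2, urllib2)
request = urllib2.Request('http://www.163.com')
response = urllib2.urlopen(request)
html_doc = response.read()

# No encoding hint is given, so BeautifulSoup has to guess the encoding
soup = BeautifulSoup(html_doc)
print soup.find_all('a')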
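The output is garbled because Chinese pages are commonly encoded as GB2312 or GBK. Passing from_encoding="gb18030" to the BeautifulSoup constructor solves the problem: GB18030 is a superset of both encodings, so it decodes either one correctly.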
Note: in BeautifulSoup 3, the parameter is named fromEncoding rather than from_encoding.
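from bs4 import BeautifulSoup
import urllib2

# Fetch the raw page bytes (Python 2, urllib2)
request = urllib2.Request('http://www.163.com')
response = urllib2.urlopen(request)
html_doc = response.read()

# Tell BeautifulSoup to decode the bytes as GB18030 (covers GB2312 and GBK)
soup = BeautifulSoup(html_doc, from_encoding="gb18030")
print soup.find_all('a')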
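To see why a single gb18030 hint covers both encodings, the behaviour can be checked with the standard library alone, without any network access. The sketch below is only an illustration: the sample string (the four characters for "Chinese web page") and the helper name decode_page are assumptions made for this example and are not part of BeautifulSoup.

```python
# Minimal sketch: bytes encoded as GBK or GB2312 decode cleanly as GB18030,
# while decoding the same bytes with the wrong codec yields garbled text.
# SAMPLE and decode_page are hypothetical names used only for illustration.

SAMPLE = u"\u4e2d\u6587\u7f51\u9875"  # four common Chinese characters


def decode_page(raw_bytes, encoding="gb18030"):
    """Decode raw page bytes with the given codec (GB18030 by default)."""
    return raw_bytes.decode(encoding)


# Simulate pages served in the two legacy encodings.
gbk_bytes = SAMPLE.encode("gbk")
gb2312_bytes = SAMPLE.encode("gb2312")

# Both round-trip through GB18030 because it is a superset of each.
from_gbk = decode_page(gbk_bytes)
from_gb2312 = decode_page(gb2312_bytes)

# Decoding with an unrelated codec does not recover the original text.
garbled = gbk_bytes.decode("utf-8", "replace")

print(from_gbk == SAMPLE)
print(from_gb2312 == SAMPLE)
print(garbled == SAMPLE)
```

The first two comparisons hold and the third does not, which mirrors the situation above: when no usable encoding is specified, the bytes may be decoded with the wrong codec, whereas naming gb18030 handles pages declared as either GB2312 or GBK.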
Time: 2024-11-07 22:40:17