BeautifulSoup: warning when no parser is specified

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())
print(bsObj.h1)

Running this code produces the following warning:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 4 of the file D:/Python/venv/test8.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

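In other words:
No parser was named in the call, so Beautiful Soup picked the best HTML parser installed on this machine (here, lxml). That usually works, but the same script may pick a different parser on another machine or in another virtual environment, and different parsers can build different trees from the same markup.

The warning also identifies the call that triggered it (line 4 of test8.py, the BeautifulSoup(html.read()) line) and suggests passing features="lxml" to the BeautifulSoup constructor.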
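To check whether a particular call triggers this warning, one option is to record warnings while parsing. The sketch below is only an illustration: the helper name `parser_warnings` and the inline `SAMPLE_HTML` string are made up for this example (no network access is needed), and it matches the warning by its message text rather than by a specific warning class.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical sample markup, used instead of fetching a real page.
SAMPLE_HTML = "<html><body><h1>Sample heading</h1></body></html>"


def parser_warnings(features=None):
    """Parse SAMPLE_HTML and return the "no parser specified" warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        if features is None:
            BeautifulSoup(SAMPLE_HTML)  # parser left for Beautiful Soup to guess
        else:
            BeautifulSoup(SAMPLE_HTML, features)  # parser named explicitly
    return [
        str(w.message)
        for w in caught
        if "No parser was explicitly specified" in str(w.message)
    ]


implicit = parser_warnings()
explicit = parser_warnings("html.parser")
print(len(implicit))  # count depends on how the script is run
print(explicit)       # [] because the parser was named
```

Naming the parser is what keeps the second list empty; "html.parser" is used here only because it ships with Python.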

Fix: name the parser explicitly; 'lxml' is the usual choice.

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
# Name the parser as the second argument so the choice no longer depends on the environment.
bsObj = BeautifulSoup(html.read(), "lxml")
print(bsObj.h1)
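If lxml might not be installed everywhere the script runs, a fallback to the built-in parser is one way to keep the parser choice explicit. This is a minimal sketch, not part of the original post: the `make_soup` helper, its parameter names, and the inline `SAMPLE_HTML` are assumptions made for illustration. It relies on Beautiful Soup raising `FeatureNotFound` when the requested parser is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup so the example runs without network access.
SAMPLE_HTML = "<html><body><h1>An Interesting Title</h1></body></html>"


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Parse markup with an explicitly named parser, falling back if it is missing."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # The preferred parser is not installed; use the standard-library one.
        return BeautifulSoup(markup, fallback)


soup = make_soup(SAMPLE_HTML)
print(soup.h1)  # <h1>An Interesting Title</h1>
```

Either way a parser is named in every call, so the warning does not appear. Note that the two parsers can still differ on malformed markup, so pinning a single parser (and installing it) remains the more predictable option.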

Original source: http://blog.51cto.com/12884584/2348995

Date: 2024-11-24 10:40:53

