Python Web Crawling and Information Extraction: Getting Started with the Beautiful Soup Library

I. Installing the Beautiful Soup Library

On Windows: open cmd with "Run as administrator"

Then run: pip install beautifulsoup4

Quick installation test:

from bs4 import BeautifulSoup

soup = BeautifulSoup('data', 'html.parser')

print(soup.prettify())
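A slightly fuller check, as a minimal sketch assuming bs4 was installed into the same interpreter you are running: print the installed version, then parse a tiny fragment and read its text back. The fragment '<p>data</p>' is an illustrative input, not part of the original test.

```python
import bs4
from bs4 import BeautifulSoup

# bs4 exposes its version string as a module attribute
print("beautifulsoup4 version:", bs4.__version__)

# Parse a tiny fragment with Python's built-in "html.parser" backend
soup = BeautifulSoup("<p>data</p>", "html.parser")
print(soup.prettify())

# If parsing worked, the <p> tag is reachable and holds the text "data"
print(soup.p.string)
```

If the import line fails, pip most likely installed the package for a different Python interpreter than the one you are running.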

II. Basic Elements of the Beautiful Soup Library

1. The BeautifulSoup class

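from bs4 import BeautifulSoup

# Parse a local HTML file; the with-block closes the file handle afterwards
with open("D:/demo.html", encoding="utf-8") as f:
    soup2 = BeautifulSoup(f, "html.parser")

# Parse an HTML string directly
soup = BeautifulSoup("<html>data</html>", "html.parser")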
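To see that a string and a file handle give the same tree, here is a small sketch. Since D://demo.html may not exist on your machine, a temporary file created with tempfile stands in for it; the HTML content is an illustrative assumption.

```python
import os
import tempfile
from bs4 import BeautifulSoup

html = "<html><head><title>demo</title></head><body><p>data</p></body></html>"

# 1) Parse the markup from a string
soup = BeautifulSoup(html, "html.parser")

# 2) Write the same markup to a temporary file (stand-in for D://demo.html)
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as tmp:
    tmp.write(html)
    path = tmp.name

try:
    # Parse from an open file handle; BeautifulSoup reads its contents
    with open(path, encoding="utf-8") as f:
        soup2 = BeautifulSoup(f, "html.parser")
finally:
    os.remove(path)  # clean up the temporary file

print(soup.title.string)
print(soup2.title.string)
print(str(soup) == str(soup2))  # both sources produce the same document
```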

2. Basic elements of the BeautifulSoup class ...

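Basic element	Description
Tag	A tag, the most basic unit of information organization; its start and end are marked by <> and </> respectively
Name	The tag's name; the name of <p>...</p> is 'p'. Format: <tag>.name
Attributes	The tag's attributes, organized as a dictionary. Format: <tag>.attrs
NavigableString	The non-attribute string inside a tag, i.e. the text between <> and </>. Format: <tag>.string
Comment	The comment part of a string inside a tag; a special Comment type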
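A minimal sketch touching each element in the table. The HTML snippet is a made-up example chosen so that one tag holds ordinary text and another holds only a comment; note that .string returns the comment text without the <!-- --> markers, so the type check is how you tell a Comment apart from a NavigableString.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

html = '<p class="title">Hello</p><b><!--This is a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                                  # Tag: the first <p> element
print(type(tag) is Tag)                       # True
print(tag.name)                               # Name: p
print(tag.attrs)                              # Attributes: {'class': ['title']}
print(tag.string)                             # NavigableString: Hello
print(type(tag.string) is NavigableString)    # True

comment = soup.b.string          # .string also returns the comment's text...
print(comment)                   # This is a comment
print(type(comment) is Comment)  # ...but its type is Comment, not plain NavigableString
```

Because class is a multi-valued attribute in HTML, bs4 stores it as a list, which is why attrs shows ['title'] rather than 'title'.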

Recall demo.html:

>>> import requests
>>> r=requests.get("http://python123.io/ws/demo.html")
>>> demo=r.text
>>> demo
'<html><head><title>This is a python demo page</title></head>\r\n<body>\r\n<p class="title"><b>The demo python introduces several python courses.</b></p>\r\n<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:\r\n<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>\r\n</body></html>'

Tag: any tag that exists in HTML syntax can be accessed with soup.<tag>.

When the HTML document contains several tags with the same name, soup.<tag> returns only the first one.
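A quick sketch of the "first one only" rule. The example.com links are placeholders, and find_all is shown only for contrast: it is a standard bs4 method that returns every match, which this section does not otherwise cover.

```python
from bs4 import BeautifulSoup

html = ('<a href="http://example.com/1" id="link1">One</a>'
        '<a href="http://example.com/2" id="link2">Two</a>')
soup = BeautifulSoup(html, "html.parser")

# soup.<tag> gives only the first matching tag
print(soup.a["id"])                            # link1

# find_all returns every matching tag as a list
print([a["id"] for a in soup.find_all("a")])   # ['link1', 'link2']

# A tag name that does not occur in the document yields None
print(soup.table)                              # None
```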

>>> from bs4 import BeautifulSoup
>>> soup=BeautifulSoup(demo,"html.parser")
>>> soup.title
<title>This is a python demo page</title>
>>> soup.p
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.a
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
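The Tag returned by soup.a carries the name, attributes and string described in the table. The sketch below uses an abridged offline copy of demo.html so it runs without a network request; that markup is an assumption modeled on the page shown above, not a verbatim copy.

```python
from bs4 import BeautifulSoup

# Abridged offline stand-in for demo.html (assumed structure, shortened text)
demo = ('<html><head><title>This is a python demo page</title></head>'
        '<body><p class="course">Courses: '
        '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and '
        '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>'
        '</body></html>')
soup = BeautifulSoup(demo, "html.parser")

tag = soup.a                   # first <a> in the document
print(tag.name)                # a
print(tag.attrs["href"])       # http://www.icourse163.org/course/BIT-268001
print(tag.get("class"))        # ['py1']
print(tag.string)              # Basic Python
print(tag.parent.name)         # p: the tag that encloses the link
print(soup.title.string)       # This is a python demo page
```

Using tag.get("class") instead of tag["class"] returns None rather than raising KeyError when an attribute is missing.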

Original article: https://www.cnblogs.com/lichaoxiang/p/8204324.html

Time: 2024-11-05 16:32:25
