beautifulsoup 的children和descandants

之前看爬虫的时候,看到这里就断了,一直不太理解这2个的区别。

今天重新看,也借助了这位哥们的方法,把结果打印出来,我大概知道了这2者的区别。

http://www.cnblogs.com/chensimin1990/p/6725803.html

--------------------------------

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html,‘html.parser‘)

# name_list = bs_obj.find_all("span", {"class":"green"})

#for name in name_list:
#    print(name.get_text())
# file = open(‘test.txt‘,‘w‘)
# content = ‘‘
for child in bs_obj.find("table",{"id":"giftList"}).descendants:
    print(child)

代码是这样的

------------------------------------------------

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th>

Item Title

<th>
Description
</th>

Description

<th>
Cost
</th>

Cost

<th>
Image
</th>

Image

<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td>

Vegetable Basket

<td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td>

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!

<span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers!

<td>
$15.00
</td>

$15.00

<td>
<img src="../img/gifts/img1.jpg"/>
</td>

<img src="../img/gifts/img1.jpg"/>

<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td>

Russian Nesting Dolls

<td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents!

<td>
$10,000.52
</td>

$10,000.52

<td>
<img src="../img/gifts/img2.jpg"/>
</td>

<img src="../img/gifts/img2.jpg"/>

<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td>

Fish Painting

<td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td>

If something seems fishy about this painting, it‘s because it‘s a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys!

<td>
$10,005.00
</td>

$10,005.00

<td>
<img src="../img/gifts/img3.jpg"/>
</td>

<img src="../img/gifts/img3.jpg"/>

<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td>

Dead Parrot

<td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td>

This is an ex-parrot!
<span class="excitingNote">Or maybe he‘s only resting?</span>
Or maybe he‘s only resting?

<td>
$0.50
</td>

$0.50

<td>
<img src="../img/gifts/img4.jpg"/>
</td>

<img src="../img/gifts/img4.jpg"/>

<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td>

Mystery Box

<td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td>

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing!

<td>
$1.50
</td>

$1.50

<td>
<img src="../img/gifts/img6.jpg"/>
</td>

<img src="../img/gifts/img6.jpg"/>

这是结果

用children的函数(?不知道为什么叫函数,感觉没有括号,明明是字段啊...)

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html,‘html.parser‘)

# name_list = bs_obj.find_all("span", {"class":"green"})

#for name in name_list:
#    print(name.get_text())
# file = open(‘test.txt‘,‘w‘)
# content = ‘‘
for child in bs_obj.find("table",{"id":"giftList"}).children:
    print(child)

结果是这样的:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>

<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>

<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>

<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>

<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>

<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

------------------

到说明的时候:

1、chidlren并不是只返回子代的第一层,而是到没有子代的那一层,也就是说会穿透所有的,这个我以前以为是descendants干的事。

2、那descendants还留着干嘛呢?

是这么一个作用,他对每一个子代都会遍历一边他所有的后代。

如果我们打个比方:

a

-a1

--a11

--a12

--a13

---a131

----a1311

如果用children,其实就是原样返回,如果用descendants的话,他会在a13的时候返回一次a1311,a131的时候又返回一次a1311。

另外

for child in bs_obj.find("table",{"id":"giftList"}).children 和

for child in bs_obj.find("table",{"id":"giftList"})是等价的,想想也知道,这个更符合一般人的直觉。

时间: 2024-08-27 06:45:58

beautifulsoup 的children和descandants的相关文章

BeautifulSoup的高级应用 之 contents children descendants string strings stripped_strings

继上一节,BeautifulSoup的高级应用 之 find findAll,这一节,主要讲解BeautifulSoup有关的其他几个重要应用函数. 本篇中,所使用的html为: html_doc = """ <html> <head><title>The Dormouse's story</title></head> <p class="title"><b>The Dor

python爬虫从入门到放弃(六)之 BeautifulSoup库的使用

上一篇文章的正则,其实对很多人来说用起来是不方便的,加上需要记很多规则,所以用起来不是特别熟练,而这节我们提到的beautifulsoup就是一个非常强大的工具,爬虫利器. beautifulSoup “美味的汤,绿色的浓汤” 一个灵活又方便的网页解析库,处理高效,支持多种解析器.利用它就不用编写正则表达式也能方便的实现网页信息的抓取 快速使用 通过下面的一个例子,对bs4有个简单的了解,以及看一下它的强大之处: from bs4 import BeautifulSoup html = '''

BeautifulSoup的选择器

用BeautifulSoup查找指定标签(元素)的时候,有几种方法: soup=BeautifulSoup(html) 1.soup.find_all(tagName),返回一个指定Tag元素的列表 2.soup.select(selector),返回一个指定Tag元素的列表,是非常好用的方法,它支持大部分css选择器(可在链接页面内查找"CSS选择器"相关章节),如类选择器,id选择器,子代选择器(但不支持直接子代选择器) 例如可以这样写,soup.select('.listCone

BeautifulSoup 的用法

转自:http://cuiqingcai.com/1319.html Beautiful Soup支持Python标准库中的HTML解析器,还支持一些第三方的解析器,如果我们不安装它,则 Python 会使用 Python默认的解析器,lxml 解析器更加强大,速度更快,推荐安装. <thead”> 解析器 使用方法 优势 劣势 Python标准库 BeautifulSoup(markup, “html.parser”) Python的内置标准库 执行速度适中 文档容错能力强 Python 2

BeautifulSoup基础

MarkdownPad Document BeautifulSoup findAll函数 nameList = bsObj.findAll("span", {"class":"green"}) for name in namelist:     print(name.get_text()) #找到所有属性class="green"的span标签,通常在你准备打印.存储和操作数据时,应该最后才使 用 .get_text() .一

Python 爬虫-BeautifulSoup

2017-07-26 10:10:11 Beautiful Soup可以解析html 和 xml 格式的文件. Beautiful Soup库是解析.遍历.维护"标签树"的功能库.使用BeautifulSoup库非常简单,只需要两行代码,就可以完成BeautifulSoup类的创建,这里命名为soup,接下来就可以对soup进行相关处理了.一个BeautifulSoup类对应html或者xml的全部内容. BeautifulSoup库将任意html文件转换成utf-8格式 一.解析器

[爬虫] BeautifulSoup库

Beautiful Soup库基础知识 Beautiful Soup库是解析xml和html的功能库.html.xml大都是一对一对的标签构成,所以Beautiful Soup库是解析.遍历.维护"标签树"的功能库,只要提供的是标签类型Beautiful Soup库都可以进行很好的解析. Beauti Soup库的导入 from bs4 import BeautifulSoup import bs4 html文档 == 标签树 == BeautifulSoup类   可以认为三者是等价

【python】--BeautifulSoup

背景 简单来说,Beautiful Soup是python的一个库,最主要的功能是从网页抓取数据.官方解释如下: Beautiful Soup提供一些简单的.python式的函数用来处理导航.搜索.修改分析树等功能.它是一个工具箱,通过解析文档为用户提供需要抓取的数据,因为简单,所以不需要多少代码就可以写出一个完整的应用程序. Beautiful Soup自动将输入文档转换为Unicode编码,输出文档转换为utf-8编码.你不需要考虑编码方式,除非文档没有指定一个编码方式,这时,Beautif

爬虫必备—BeautifulSoup

BeautifulSoup是一个模块,该模块用于接收一个HTML或XML字符串,然后将其进行格式化,之后便可以使用他提供的方法进行快速查找指定元素,从而使得在HTML或XML中查找指定元素变得简单. 1 from bs4 import BeautifulSoup 2 3 html_doc = """ 4 <html><head><title>The Dormouse's story</title></head> 5