Advanced BeautifulSoup: CSS Selectors

BeautifulSoup supports the most commonly used CSS selectors. Pass the selector as a string to the .select() method of a Tag object or of the BeautifulSoup object itself.
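To make the Tag versus BeautifulSoup distinction concrete, here is a minimal sketch. The two-section markup and the names `menu` and `menu_links` are made up for this illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical two-section document, used only for this illustration.
html = """
<div id="menu"><a href="/home">Home</a><a href="/about">About</a></div>
<div id="footer"><a href="/contact">Contact</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Called on the BeautifulSoup object, select() searches the whole document.
all_links = soup.select("a")

# Called on a Tag, select() only searches inside that tag.
menu = soup.select_one("#menu")
menu_links = menu.select("a")

print(len(all_links))                      # 3
print([a.get_text() for a in menu_links])  # ['Home', 'About']
```

Scoping a query to a Tag this way is often simpler than writing one long selector for the whole document.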

The HTML used throughout this post is:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Parse the document; every example below queries this soup object.
soup = BeautifulSoup(html_doc, "html.parser")

For example, you can search for tags like this:

soup.select("title")  # select by tag name
# [<title>The Dormouse's story</title>]

soup.select("p:nth-of-type(3)")  # the third <p> among its siblings
# [<p class="story">...</p>]

You can also find tags nested anywhere inside other tags, that is, search by ancestor/descendant relationship:

soup.select("body a")  # <a> tags anywhere inside <body>
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("html head title")  # <title> inside <head> inside <html>
# [<title>The Dormouse's story</title>]

You can find tags that are direct children of other tags:

soup.select("head > title")
# [<title>The Dormouse's story</title>]

soup.select("p > a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("p > a:nth-of-type(2)")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

soup.select("p > #link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("body > a")  # no <a> is a direct child of <body>
# []

Find the siblings of a tag:

soup.select("#link1 ~ .sister")  # all later siblings of #link1 that have class "sister"
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("#link1 + .sister")  # only the next sibling of #link1, if it has class "sister"
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

Find tags by CSS class:

soup.select(".sister")  # all tags with class "sister"
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("[class~=sister]")  # same result as the previous query
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
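The sample document only has single-class tags, so it does not show how class selectors behave when a tag carries several classes. The sketch below uses made-up markup (the ids `a1` to `a3` and the class `featured` are assumptions for illustration) to show that both `.sister` and `[class~=sister]` match one class among many, and that class selectors can be chained.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link carries two classes.
html = (
    '<p>'
    '<a class="sister" id="a1">A</a>'
    '<a class="sister featured" id="a2">B</a>'
    '<a class="brother" id="a3">C</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# Both forms match any tag whose class list contains "sister".
by_class = [a["id"] for a in soup.select(".sister")]
by_attr = [a["id"] for a in soup.select("[class~=sister]")]

# Chaining class selectors requires all of the listed classes.
both = [a["id"] for a in soup.select(".sister.featured")]

print(by_class)  # ['a1', 'a2']
print(by_attr)   # ['a1', 'a2']
print(both)      # ['a2']
```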

Find tags by id:

soup.select("#link1")  # the tag whose id is "link1"
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("a#link2")  # more specific than the id alone: the tag must also be an <a>
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

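Find tags that match any selector in a list. The list is written as a single comma-separated string, and a tag is returned if it matches at least one of the selectors:

soup.select("#link1,#link2")  # tags whose id is link1 or link2
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]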
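If the selectors live in a Python list, one way to use them is to join them into a single comma-separated string first. This is a small sketch rather than the only approach, and the list name `wanted` is made up for illustration. It also shows that, in current Beautiful Soup releases (which delegate selector matching to the soupsieve package), results come back in document order rather than in the order the selectors were written.

```python
from bs4 import BeautifulSoup

html = """
<p>
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical list of selectors, deliberately not in document order.
wanted = ["#link3", "#link1"]

# Join the list into one selector string: "#link3, #link1"
selector = ", ".join(wanted)
matched_ids = [tag["id"] for tag in soup.select(selector)]

print(selector)     # #link3, #link1
print(matched_ids)  # ['link1', 'link3']
```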

Find tags by whether an attribute is present:

soup.select('a[href]')  # <a> tags that have an href attribute
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

Find tags by the value of an attribute:

soup.select('a[href="http://example.com/elsie"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select('a[href^="http://example.com/"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href$="tillie"]')
# [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href*=".com/el"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

A note on the operators used above:
soup.select('a[href^="http://example.com/"]') finds tags whose href value starts with "http://example.com/".
soup.select('a[href$="tillie"]') finds tags whose href value ends with "tillie".
soup.select('a[href*=".com/el"]') finds tags whose href value contains the substring ".com/el", so only href="http://example.com/elsie" matches.
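The three operators can be compared side by side. The sketch below uses a slightly different set of links (the report.pdf URL and the helper name `ids` are made up for illustration) so that each operator picks out a different subset.

```python
from bs4 import BeautifulSoup

# Hypothetical links chosen so that each operator matches a different subset.
html = """
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="https://example.org/files/report.pdf" id="doc">Report</a>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(selector):
    """Return the id of every tag matched by the selector."""
    return [tag["id"] for tag in soup.select(selector)]


starts = ids('a[href^="http://example.com/"]')  # prefix match
ends = ids('a[href$=".pdf"]')                   # suffix match
contains = ids('a[href*=".com/el"]')            # substring match

print(starts)    # ['link1', 'link2']
print(ends)      # ['doc']
print(contains)  # ['link1']
```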

Find only the first tag that matches a selector:

soup.select_one(".sister")  # returns just the first matching tag, not a list
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
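One practical difference between the two methods is what they return when nothing matches: select() gives an empty list, while select_one() gives None, so its result usually needs a check before use. A minimal sketch, with the `.brother` class and the "n/a" fallback chosen only for illustration:

```python
from bs4 import BeautifulSoup

html = (
    '<p>'
    '<a class="sister" id="link1">Elsie</a>'
    '<a class="sister" id="link2">Lacie</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one(".sister")     # first match only
missing = soup.select_one(".brother")  # no match: None
empty = soup.select(".brother")        # no match: empty list

# Guard against None before calling Tag methods on the result.
name = missing.get_text() if missing is not None else "n/a"

print(first["id"])  # link1
print(missing)      # None
print(len(empty))   # 0
print(name)         # n/a
```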

Time: 2024-08-05 04:59:52
