Methods of the pyquery library

Initialization

Four ways of initializing a PyQuery object are introduced here.

(1) From a string

from pyquery import PyQuery as pq
doc = pq("<html></html>")

An HTML string can be passed directly as the argument to pq; doc then plays the same role as the $ symbol in jQuery.

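(2) `lxml.etree`

from lxml import etree
from pyquery import PyQuery as pq
doc = pq(etree.fromstring("<html></html>"))

You can also parse the markup with lxml's etree first and pass the resulting element to pq. Note that etree.fromstring is a strict XML parser and raises an error on malformed input; if your HTML is incomplete or sloppy, use lxml's HTML parser (for example etree.HTML), which repairs it into a complete, well-formed tree.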

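(3) Passing a `URL`

from pyquery import PyQuery as pq
doc = pq('http://www.baidu.com')

This behaves as if you had requested the page yourself, much like fetching the link with urllib (urllib2 in Python 2), and then parsed the HTML that came back.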

(4) From a file

from pyquery import PyQuery as pq
doc = pq(filename='hello.html')

You can pass the path of a file directly through the filename keyword.

Quick tour

Taking a local file as the example, we pass in a file named hello.html with the following content:

<div>
    <ul>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>
 </div>

Write the following program:

from pyquery import PyQuery as pq
doc = pq(filename='hello.html')
print(doc.html())
print(type(doc))
li = doc('li')
print(type(li))
print(li.text())

Output:

    <ul>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul>

<class 'pyquery.pyquery.PyQuery'>
<class 'pyquery.pyquery.PyQuery'>
first item second item third item fourth item fifth item

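Compare this with jQuery syntax: the results are just what you would expect there.

Note one thing here: after initialization the object is of type PyQuery, and after filtering once with a selector the result is still of type PyQuery. This mirrors jQuery exactly, which is great! Now think about what BeautifulSoup's find_all or an lxml XPath query returns: a list! The list itself cannot be filtered a second time with the same syntax; you have to loop over its elements and query each one.

Compared with that, I simply love PyQuery!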

Attribute operations

You can work with PyQuery using essentially the same syntax as jQuery.

from pyquery import PyQuery as pq

p = pq('<p id="hello" class="hello"></p>')('p')
print(p.attr("id"))
print(p.attr("id", "plop"))
print(p.attr("id", "hello"))

Output:

hello
<p id="plop" class="hello"/>
<p id="hello" class="hello"/>

One more example:

from pyquery import PyQuery as pq

p = pq('<p id="hello" class="hello"></p>')('p')
print(p.addClass('beauty'))
print(p.removeClass('hello'))
print(p.css('font-size', '16px'))
print(p.css({'background-color': 'yellow'}))

Output:

<p id="hello" class="hello beauty"/>
<p id="hello" class="beauty"/>
<p id="hello" class="beauty" style="font-size: 16px"/>
<p id="hello" class="beauty" style="font-size: 16px; background-color: yellow"/>

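Just as elegant and confident as before!

Notice that this is a series of operations, and each one is applied on top of the previous result: the methods modify p in place.

So after running the code above, p itself has changed.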

DOM operations

Again, the same authentic jQuery syntax:

from pyquery import PyQuery as pq

p = pq('<p id="hello" class="hello"></p>')('p')
print(p.append(' check out <a href="http://reddit.com/r/python"><span>reddit</span></a>'))
print(p.prepend('Oh yes!'))
d = pq('<div class="wrap"><div id="test"><a href="http://cuiqingcai.com">Germy</a></div></div>')
p.prependTo(d('#test'))
print(p)
print(d)
d.empty()
print(d)

Output:

<p id="hello" class="hello"> check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>
<p id="hello" class="hello">Oh yes! check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>
<p id="hello" class="hello">Oh yes! check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>
<div class="wrap"><div id="test"><p id="hello" class="hello">Oh yes! check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p><a href="http://cuiqingcai.com">Germy</a></div></div>
<div class="wrap"/>

This hardly needs further explanation.

DOM operations, too, work just as they do in jQuery.

Traversal

To traverse the matches, use the items method, which yields each match as a PyQuery object, or pass a lambda to each.

from pyquery import PyQuery as pq
doc = pq(filename='hello.html')
lis = doc('li')
for li in lis.items():
    print(li.html())

print(lis.each(lambda e: e))

Output:

first item
<a href="link2.html">second item</a>
<a href="link3.html"><span class="bold">third item</span></a>
<a href="link4.html">fourth item</a>
<a href="link5.html">fifth item</a>
<li class="item-0">first item</li>
 <li class="item-1"><a href="link2.html">second item</a></li>
 <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
 <li class="item-1 active"><a href="link4.html">fourth item</a></li>
 <li class="item-0"><a href="link5.html">fifth item</a></li>

That said, the items method is the one used most often.

Web requests

PyQuery can also request web pages itself, and it turns the HTML it downloads into a PyQuery object.

from pyquery import PyQuery as pq
print(pq('http://cuiqingcai.com/', headers={'user-agent': 'pyquery'}))
print(pq('http://httpbin.org/post', {'foo': 'bar'}, method='post', verify=True))

As you can see, both GET and POST are supported.

Time: 2024-11-08 00:43:26
