BeautifulSoup (1)

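import re
from bs4 import BeautifulSoup

# Sample document; note that neither <p> tag is ever closed.
doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(''.join(doc), 'html.parser')
print(soup.prettify())                      # indented view of the parse tree
title = soup.html.head.title                # walk down the tree by tag name
print(title)
print(title.string)                         # text inside the tag
print(len(soup('p')))                       # soup('p') is shorthand for soup.find_all('p')
print(soup.find_all('p', align='center'))   # every matching tag, as a list
print(soup.find('p', align='center'))       # first matching tag only
print(soup('p', align='center')[0]['id'])   # attributes are read like dict keys
print(soup.find('p', align=re.compile('^b.*'))['id'])  # attribute value matched by a regex
print(soup.find('p').b.string)              # text of <b> in the first paragraph
print(soup('p')[1].b.string)                # text of <b> in the second paragraph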
-----------------------------------------------------------------------------------

<html>
 <head>
  <title>
   Page title
  </title>
 </head>
 <body>
  <p align="center" id="firstpara">
   This is paragraph
   <b>
    one
   </b>
   .
   <p align="blah" id="secondpara">
    This is paragraph
    <b>
     two
    </b>
    .
   </p>
  </p>
 </body>
</html>
<title>Page title</title>
Page title
2
[<p align="center" id="firstpara">This is paragraph <b>one</b>.<p align="blah" id="secondpara">This is paragraph <b>two</b>.</p></p>]
<p align="center" id="firstpara">This is paragraph <b>one</b>.<p align="blah" id="secondpara">This is paragraph <b>two</b>.</p></p>
firstpara
secondpara
one
two
[Finished in 0.5s]
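
The output above shows the second paragraph nested inside the first, which is consistent with the unclosed `<p>` tags in `doc`. As a minimal sketch, assuming the same markup with each `<p>` explicitly closed (the names `html`, `ids`, and `nested` are illustrative and not from the original), the two paragraphs should come out as siblings instead:

```python
from bs4 import BeautifulSoup

# Assumed variant of the sample document: every <p> now has a closing tag.
html = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(html, 'html.parser')

paragraphs = soup.find_all('p')
# Collect each paragraph's id attribute.
ids = [p['id'] for p in paragraphs]
# For each paragraph, check whether another <p> is nested inside it.
nested = [p.find('p') is not None for p in paragraphs]

print(ids)
print(nested)
```

With the tags closed, `find_all('p', align='center')` would return only the first paragraph's own content rather than both paragraphs together.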

Time: 2024-10-29 00:02:13
