Python 3.4 - urllib.request: Learning to Write a Web Crawler (Part 1)

For example, to fetch the baidu.com home page in Python 3.4, write:

import urllib.request

def getHtml(url):
    # urlopen returns a response object; read() returns the body as bytes
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

html = getHtml("http://baidu.com")
print(html)
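Because read() returns bytes, printing the result shows a b'...' literal rather than readable text. The sketch below is one possible way to get a str instead. The helper names decode_body and get_html_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration and are not part of the original post.

```python
import urllib.request


def decode_body(raw, charset=None, default="utf-8"):
    """Decode raw response bytes; fall back to `default` if no charset is known."""
    # errors="replace" keeps going on undecodable bytes instead of raising
    return raw.decode(charset or default, errors="replace")


def get_html_text(url, timeout=10):
    """Fetch `url` and return the body as str instead of bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        # Charset declared in the Content-Type header, or None if absent
        charset = page.headers.get_content_charset()
        return decode_body(page.read(), charset)


# Example call (needs network access):
#     print(get_html_text("http://baidu.com")[:200])
```

If the server declares no charset and the page is not UTF-8 (some Chinese sites use GBK), the fallback will produce replacement characters, so the default may need adjusting per site.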

Error 1:

print "hello" SyntaxError: Missing parentheses in call to 'print'

The print syntax differs between Python 2 and Python 3: it is a statement in Python 2 and a function in Python 3.

print("hello") in Python 3

print "hello" in Python 2
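Since print is an ordinary function in Python 3, it accepts keyword arguments such as sep, end, and file. A minimal sketch follows; the buffer and output variable names are chosen only for this illustration.

```python
import io

# Capture the printed text in memory so it can be inspected afterwards
buffer = io.StringIO()

# sep joins the arguments, end replaces the default newline,
# file redirects the output away from the terminal
print("hello", "world", sep=", ", end="!\n", file=buffer)

output = buffer.getvalue()
print(output, end="")
```

Code that must still run under Python 2.6 or 2.7 can put `from __future__ import print_function` at the top of the file to get the same function-style print.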

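Error 2:

No module named 'urllib2'

Python 3 (including 3.3 and 3.4) has no urllib2 module; it was split into urllib.request and urllib.error. Use urllib.request in place of urllib2.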
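A script that has to run on both Python 2 and Python 3 can try the Python 3 import first and fall back to urllib2. The sketch below shows one way to do this; the fetch helper, its timeout value, and the choice to return None on failure are assumptions for illustration only.

```python
from contextlib import closing

try:
    # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:
    # Python 2 fallback
    from urllib2 import urlopen, URLError


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        # closing() works on both versions; the Python 2 response
        # object is not a context manager by itself
        with closing(urlopen(url, timeout=timeout)) as resp:
            return resp.read()
    except (URLError, ValueError) as exc:
        # URLError covers network and HTTP errors (HTTPError subclasses it);
        # ValueError is raised for a malformed URL such as "not-a-url"
        print("request failed: %s" % exc)
        return None
```

Returning None keeps the example short; a real crawler would more likely log the error or retry.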

See the official documentation: https://docs.python.org/3/

Time: 2025-01-01 11:20:42
