Python crawler code for scraping news images from Autohome (汽车之家)

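import requests
from bs4 import BeautifulSoup

# Fetch the news index page and decode it as GBK.
response = requests.get('https://www.autohome.com.cn/news/')
response.encoding = 'gbk'
# print(response.text)

# Locate the article list container and collect its <li> entries.
soup = BeautifulSoup(response.text, 'html.parser')
div = soup.find(name='div', attrs={'id': 'auto-channel-lazyload-article'})
li_list = div.find_all(name='li')

# enumerate keeps the counter in step with the loop
# (the original set i=1 and never incremented it).
for i, li in enumerate(li_list, start=1):
    print('pro:', i)
    title = li.find(name='h3')
    if not title:
        # Skip <li> elements without an <h3> title.
        continue
    p = li.find(name='p')
    a = li.find(name='a')
    img = li.find(name='img')

    print(title.text)
    print(p.text)
    # The script prepends the scheme, i.e. it assumes href/src start with "//".
    print('https:' + a.attrs.get('href'))
    print('https:' + img.get('src'))  # img.get is equivalent to img.attrs.get

    # Request and download the image
    src = 'https:' + img.get('src')
    file_name = src.rsplit('/', maxsplit=1)[1]
    with open(file_name, 'wb') as f:
        ret = requests.get(src)
        f.write(ret.content)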
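
The script builds image URLs by prepending 'https:' and takes the file name with rsplit('/'). That only works when src is protocol-relative and has no query string. As a minimal sketch of a more tolerant alternative (not part of the original post), the two steps could use the standard library instead. The helper names absolute_url and file_name_from_url, the fallback name image.jpg, and the img.example.com URLs are all hypothetical and only for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Page URL taken from the script; used as the base for resolving links.
PAGE_URL = "https://www.autohome.com.cn/news/"


def absolute_url(raw, base=PAGE_URL):
    """Resolve a protocol-relative, relative, or absolute URL against base."""
    return urljoin(base, raw)


def file_name_from_url(url, default="image.jpg"):
    """Return the last path segment of url, ignoring any query string.

    'default' is a hypothetical fallback used when the path has no file name.
    """
    name = posixpath.basename(urlsplit(url).path)
    return name or default


# Example with a made-up image host (an assumption, not a real Autohome URL).
print(absolute_url("//img.example.com/a/b/pic.jpg"))
print(file_name_from_url("https://img.example.com/a/b/pic.jpg?x=1"))
```

In the loop, src could then be computed as absolute_url(img.get('src')) and the file name as file_name_from_url(src), which leaves the rest of the download logic unchanged.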

Original article: https://www.cnblogs.com/xpptt/p/11772628.html

Time: 2024-10-02 15:26:24
