Multi-page scraping in Python with BeautifulSoup

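This script uses BeautifulSoup to scrape a site page by page: it builds the URL of each listing page, parses the HTML, and appends the extracted titles to a text file.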

Source code:

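from urllib.request import urlopen

from bs4 import BeautifulSoup

# Open the output file in append mode ("热门标题" means "popular titles")
with open("热门标题.txt", "a", encoding="utf-8") as f:
    for i in range(2):
        url = "http://www.ltaaa.com/wtfy-{}.html".format(i)
        html = urlopen(url).read()
        soup = BeautifulSoup(html, "html.parser")
        # CSS selector: every <a> inside a <div> whose class attribute is exactly "dtop"
        titles = soup.select("div[class='dtop'] a")
        for title in titles:
            # get_text() returns the tag's text; get("href") returns its href attribute
            print(title.get_text(), title.get("href"))
            f.write("标题：{}\n".format(title.get_text()))  # "标题" means "title"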
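
The script prints each link's href exactly as it appears in the page. If those values turn out to be relative paths (an assumption, since the post does not show the output), they can be resolved against the page URL with the standard library. A minimal sketch: `page_url` and `absolute_link` are hypothetical helper names, and the sample href is invented for illustration.

```python
from urllib.parse import urljoin

# URL pattern taken from the script; the index is the page number.
BASE = "http://www.ltaaa.com/wtfy-{}.html"


def page_url(index):
    """Return the URL of the listing page with the given index."""
    return BASE.format(index)


def absolute_link(page, href):
    """Resolve an href found on `page` into an absolute URL.

    urljoin returns hrefs that are already absolute unchanged.
    """
    return urljoin(page, href)


if __name__ == "__main__":
    page = page_url(0)
    # "/wtfy/1.html" is an invented example href, not taken from the real site.
    print(absolute_link(page, "/wtfy/1.html"))
```

Inside the scraping loop this would replace `title.get("href")` with `absolute_link(url, title.get("href"))`, so the printed links can be opened directly.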

Original article: https://www.cnblogs.com/mm20/p/10357727.html

Time: 2024-10-10 13:11:14
