Scraping the Baidu Hot Search List

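1. Open the site: http://top.baidu.com/buzz?b=1&fr=topindex

2. Right-click the page and view its source to find the elements that hold the data

3. Scrape the data with requests, BeautifulSoup and pandas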

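import requests
from bs4 import BeautifulSoup
import pandas as pd

titles = []
hots = []
url = 'http://top.baidu.com/buzz?b=1&fr=topindex'  # Baidu's hot searches for today
# Send a browser User-Agent so the request looks like it comes from a browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
r = requests.get(url, headers=headers)  # Request the page (the original never passed headers)
r.raise_for_status()  # Raise an exception on an HTTP error status
r.encoding = r.apparent_encoding  # Use the encoding guessed from the response body
html = r.text
soup = BeautifulSoup(html, 'lxml')  # Parse the HTML (requires the lxml package)
# Collect the entry titles
for m in soup.find_all(class_="list-title"):
    titles.append(m.get_text().strip())
# Collect the search-index values
for n in soup.find_all(class_="icon-rise"):
    hots.append(n.get_text().strip())
final = [titles, hots]
print(final)
# Two rows (titles, indexes); transpose so that each row is one entry
s = pd.DataFrame(final, index=["Title", "Search index"])
print(s.T)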
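The extraction step can be tried without touching the network. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup and `parse_hot_list` is a hypothetical helper name, neither taken from the original post. Only the class names `list-title` and `icon-rise` come from the script above, and the real page layout may differ. It uses the built-in `html.parser` so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only; the real Baidu page may be laid out differently.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a class="list-title"> Example topic A </a></td>
    <td><span class="icon-rise">300</span></td>
  </tr>
  <tr>
    <td><a class="list-title"> Example topic B </a></td>
    <td><span class="icon-rise">200</span></td>
  </tr>
</table>
"""


def parse_hot_list(html):
    """Return (title, search index) pairs found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [m.get_text().strip() for m in soup.find_all(class_="list-title")]
    hots = [n.get_text().strip() for n in soup.find_all(class_="icon-rise")]
    # zip stops at the shorter list, so entries without a partner are dropped.
    return list(zip(titles, hots))


pairs = parse_hot_list(SAMPLE_HTML)
print(pairs)
```

Keeping the parsing in a function that takes a string makes it possible to check the selectors against a saved copy of the page before running them on a live response.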

4. The scraped data:
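As a possible variation on how the result is tabulated, the sketch below builds the DataFrame column by column so that each row is already one entry and no transpose is needed. The values are placeholders invented for the example, not real scraped results, and the column names are assumptions.

```python
import pandas as pd

# Placeholder values for illustration; these are not real scraped results.
titles = ["Example topic A", "Example topic B", "Example topic C"]
hots = ["300", "200", "100"]

# One column per field, so every row is one hot-search entry.
df = pd.DataFrame({"Title": titles, "Search index": hots})

# The scraped values are strings; convert them so they sort numerically.
df["Search index"] = pd.to_numeric(df["Search index"])

# Keep at most the ten entries with the highest index.
top = df.sort_values("Search index", ascending=False).head(10)
print(top.to_string(index=False))
```

Unlike the list-of-lists form, the dict form requires both lists to have the same length, so a mismatch between titles and indexes raises an error instead of being padded silently.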

Original post: https://www.cnblogs.com/xx1129/p/12543514.html

Time: 2024-08-30 09:19:56
