python --爬虫--爬取百度翻译

import requestsimport json

class baidufanyi:    def __init__(self, trans_str):        self.lang_detect_url = ‘https://fanyi.baidu.com/langdetect‘  # 语言检测地址        self.trans_str = trans_str        self.headers=  {‘User-Agent:Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36‘}

    def parse_url(self,url,data):        resonpse = requests.post(url,data=data,headers=self.headers)        return  json.loads(resonpse.content.decode())   #将字符串转化为字典    def run(self):    # 1 获取语言类型        # 1.1  准备post 的url的地址  post_data        lang_detect_data = {‘query‘: self.trans_str}        # 1.2  发送post 请求 获取数据        lang = self.parse_url(self.lang_detect_url,lang_detect_data)[‘lan‘]        # 1.3  提取语言类型    # 2 准备post 数据    # 3 发送请求 , 获取响应    # 4 提取翻译结果

if __name__ == ‘__main__‘:    baidufanyi = baidufanyi()    baidufanyi.run()

原文地址：https://www.cnblogs.com/baili-luoyun/p/10341272.html

时间： 2024-10-13 15:28:18

python --爬虫--爬取百度翻译的相关文章

python爬虫—爬取百度百科数据

爬虫框架:开发平台 centos6.7 根据慕课网爬虫教程编写代码片区百度百科url,标题,内容分为4个模块:html_downloader.py 下载器 html_outputer.py 爬取数据生成html模块 html_parser 获取有用数据 url_manager url管理器 spider_main 爬虫启动代码 spider_main.py 1 #!/usr/bin/python 2 #-*- coding: utf8 -*- 3 4 import html_download

Python爬虫爬取百度贴吧的帖子

同样是参考网上教程,编写爬取贴吧帖子的内容,同时把爬取的帖子保存到本地文档: #!/usr/bin/python#_*_coding:utf-8_*_import urllibimport urllib2import reimport sys reload(sys)sys.setdefaultencoding("utf-8")#处理页面标签,去除图片.超链接.换行符等class Tool: #去除img标签,7位长空格 removeImg = re.compile('<img.*

Python爬虫爬取百度贴吧的图片

根据输入的贴吧地址,爬取想要该贴吧的图片,保存到本地文件夹,仅供参考: #!/usr/bin/python#_*_coding:utf-8_*_import urllibimport urllib2import reimport osimport sys reload(sys)sys.setdefaultencoding("utf-8")#下载图片class GetPic: #页面初始化 def __init__(self,baseUrl,seelz): #base链接地址 self.

Python爬虫-爬取百度贴吧帖子

这次主要学习了替换各种标签,规范格式的方法.依然参考博主崔庆才的博客. 1.获取url 某一帖子:https://tieba.baidu.com/p/3138733512?see_lz=1&pn=1 其中https://tieba.baidu.com/p/3138733512?为基础部分,剩余的为参数部分. http:// 代表资源传输使用http协议 tieba.baidu.com 是百度的二级域名,指向百度贴吧的服务器. /p/3138733512 是服务器某个资源,即这个帖子的地址定位符

用python爬虫爬取百度外卖店铺排名

#!/usr/bin/env python # encoding: utf-8 """ @version: ?? @author: phpergao @license: Apache Licence @file: baidu_paiming.py @time: 2016/8/1 11:10 """ import requests,re,urllib,codeop,urllib.request,nturl2path,macurl2path url

python爬取百度翻译返回：{'error': 997, 'from': 'zh', 'to': 'en', 'query 问题

解决办法: 修改url为手机版的地址:http://fanyi.baidu.com/basetrans User-Agent也用手机版的测试代码: # -*- coding: utf-8 -*- """ ------------------------------------------------- File Name: requestsGet Description : 爬取在线翻译数据s Author : 神秘藏宝室 date: 2018-04-17 --------

python爬虫爬取csdn博客专家所有博客内容

python爬虫爬取美女图片

python 爬虫爬取美女图片 #coding=utf-8 import urllib import re import os import time import threading def getHtml(url): page = urllib.urlopen(url) html = page.read() return html def getImgUrl(html,src): srcre = re.compile(src) srclist = re.findall(srcre,html)

Python爬虫爬取博客园并保存

Python爬虫爬取博客园并保存爬取博客园指定用户的文章修饰后全部保存到本地首先定义爬取的模块文件: crawlers_main.py 执行入口 url_manager.py url管理器 download_manager.py 下载模块 parser_manager.py html解析器(解析html需要利用的内容) output_manager.py 输出html网页全部内容文件(包括css,png,js等) crawlers_main.py 执行入口 1 # coding