Python Crawler (10): A Translation Expert at Your Side, Fetching Youdao Translation Results

Goal: use Python to translate text.

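Approach: Youdao can translate text directly in the browser. Looking at the page and its URL shows that the address of the result page is simply the base URL followed by the text to translate.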

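For example, the address of Youdao's Chinese-English dictionary is: http://dict.youdao.com/

The text to translate is: I'm a Chinese

After clicking Translate, the address of the page that contains the result is: http://dict.youdao.com/w/eng/I'm%20a%20chinese/#keyfrom=dict2.index

The address ends with "#keyfrom=dict2.index", but that is only a URL fragment and does not affect the result.

Visiting http://dict.youdao.com/w/eng/I'm%20a%20chinese directly shows the same translation.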
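The URL pattern described above can be sketched as a small helper. This is only an illustration: the function name `build_query_url` is hypothetical, and the base address is the one quoted in this article. Note that `quote` also percent-encodes the apostrophe (as `%27`), whereas the browser address bar shows it unencoded.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2, as used in this article

# Base address taken from the article; the helper name is hypothetical.
BASE_URL = "http://dict.youdao.com/w/eng/"


def build_query_url(text):
    """Append the percent-encoded text to the base URL."""
    # safe="" makes sure "/" in the input is encoded as well
    return BASE_URL + quote(text.strip(), safe="")


print(build_query_url("I'm a Chinese"))
```

Encoding the text first keeps spaces and punctuation from producing an invalid URL.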

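So the overall plan is:

1. Get the text to translate.

2. Combine that text with the Youdao address to form a new URL.

3. Fetch the page at that URL.

4. Extract the translation result from the page content.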

The code is as follows:

#!/usr/bin/python
#coding:utf-8  

import HTMLParser
import urllib2
import re
import sys  

reload(sys)
sys.setdefaultencoding( "utf-8" )

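class BaiduFanyi:
	# Note: despite its name, this class queries Youdao, not Baidu.
	def __init__(self,url):
		self.url=url

	def get_html_Pages(self,url):
		try:
			headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0'}
			# Build the request with a browser-like User-Agent
			request=urllib2.Request(url,headers=headers)
			# Fetch the page with urlopen
			response=urllib2.urlopen(request)
			# Decode the page as UTF-8
			html=response.read().decode('utf-8')
			# Convert HTML entities such as &quot; back into the characters they stand for
			html=HTMLParser.HTMLParser().unescape(html)
			return html
		# Catch errors so the program does not crash.
		# HTTPError is a subclass of URLError, so it has to be handled first.
		except urllib2.HTTPError as e:
			print u"Connection failed, HTTP status code: %s" % e.code
			return None
		except urllib2.URLError as e:
			print u"Connection failed, reason:",e.reason
			return None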

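	def get_finally_result(self,html):
		# Nothing to parse if the page could not be fetched
		if html is None:
			return None
		result_pattern=re.compile('<div class="trans-container".*?<p>.*?<p>(.*?)</p>.*?</div>',re.S)
		result=re.search(result_pattern,html)
		# re.search returns None when the page does not match the pattern
		if result is None:
			return None
		trans_result= result.group(1)
		return trans_result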

	def run(self):
		html=self.get_html_Pages(self.url)
		trans_result=self.get_finally_result(html)
		# Print the translation so the user can see it
		print trans_result

if __name__ == '__main__':
	author_content='''
		*****************************************************
				welcome to spider of baidufanyi
				     modify on 2017-05-11
				        @author: Jimy_Fengqi
	         	http://blog.csdn.net/qiqiyingse?viewmode=contents
		*****************************************************
		'''
	print author_content
	keywords=raw_input('Please enter the sentence to translate: ')
	if not keywords:
		keywords="I'm a Chinese"
	# Percent-encode the text so that spaces and other special characters are valid in the URL
	# (urllib2.quote is the same function as urllib.quote)
	base_url='http://www.youdao.com/w/eng/%s' % urllib2.quote(keywords)
	print base_url
	mybaidufanyi=BaiduFanyi(base_url)
	mybaidufanyi.run()
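The script above targets Python 2. For readers on Python 3, the same four steps might look roughly like the sketch below. It is an illustration under assumptions rather than a tested port: the function names `fetch_page`, `extract_translation`, and `translate` are hypothetical, `SAMPLE_HTML` is a hand-written fragment that only mimics the structure the regular expression expects, and whether the live Youdao page still matches this pattern has not been verified here.

```python
import html
import re
import urllib.error
import urllib.request
from urllib.parse import quote

# Same pattern as in get_finally_result above.
RESULT_PATTERN = re.compile(
    r'<div class="trans-container".*?<p>.*?<p>(.*?)</p>.*?</div>', re.S)

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.2; rv:16.0) "
                         "Gecko/20100101 Firefox/16.0"}


def fetch_page(url, timeout=10):
    """Return the decoded, entity-unescaped page, or None on failure."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return html.unescape(response.read().decode("utf-8"))
    except urllib.error.HTTPError as e:  # subclass of URLError: catch it first
        print("Request failed, HTTP status code:", e.code)
    except urllib.error.URLError as e:
        print("Connection failed, reason:", e.reason)
    return None


def extract_translation(page):
    """Return the text of the second <p> in the result block, or None."""
    if not page:
        return None
    match = RESULT_PATTERN.search(page)
    return match.group(1).strip() if match else None


def translate(text):
    """Steps 2 to 4: build the URL, fetch the page, extract the result."""
    url = "http://www.youdao.com/w/eng/" + quote(text)
    return extract_translation(fetch_page(url))


# Hypothetical fragment, not real Youdao markup; used for an offline check.
SAMPLE_HTML = """
<div class="trans-container">
  <p>I'm a Chinese</p>
  <p>我是中国人</p>
</div>
"""

print(extract_translation(SAMPLE_HTML))
```

Calling `translate("I'm a Chinese")` would perform the real network request. The extraction step is kept separate so it can be checked offline against a saved page.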

Time: 2024-07-30 17:20:39
