Using Python to Access Web Data (python network-data), Chapter 5

Lesson 5: Extracting Data from XML

Extracting Data from XML

In this assignment you will write a Python program somewhat similar to http://www.pythonlearn.com/code/geoxml.py. The program will prompt for a URL, read the XML data from that URL using urllib, parse and extract the comment counts from the XML data, compute the sum of the numbers in the file, and enter the sum.

We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

You do not need to save these files to your folder, since your program will read the data directly from the URL. Note: each student has a distinct data URL for the assignment, so only use your own data URL for analysis.

Data Format and Approach

The data consists of a number of names and comment counts in XML as follows:

<comment>
  <name>Matthias</name>
  <count>97</count>
</comment>
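
To see how ElementTree exposes this structure, here is a minimal sketch that parses just the single <comment> fragment above and pulls out the name and count (written in Python 2 syntax to match the solution code later in this post):

import xml.etree.ElementTree as ET

# The <comment> fragment shown above, as a plain string.
fragment = '''<comment>
  <name>Matthias</name>
  <count>97</count>
</comment>'''

comment = ET.fromstring(fragment)                 # parse into an Element
print 'Name:', comment.find('name').text          # Name: Matthias
print 'Count:', int(comment.find('count').text)   # Count: 97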

You are to look through all the <comment> tags, find the <count> values, and sum the numbers. The closest sample code that shows how to parse XML is geoxml.py. But since the nesting of the elements in our data is different from the data being parsed in that sample code, you will have to make real changes to the code.

To make the code a little simpler, you can use an XPath selector string to look through the entire tree of XML for any tag named 'count' with the following line of code:

counts = tree.findall('.//count')

Take a look at the Python ElementTree documentation and look for the supported XPath syntax for details. You could also work from the top of the XML down to the <comments> node and then loop through its child <comment> nodes, as sketched below.
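
Here is a minimal sketch of that second, top-down approach (again in Python 2, to match the solution below). It assumes the assignment data wraps the <comment> elements inside a <comments> node, as the description above suggests; if your data URL nests things differently, adjust the tag names accordingly.

import urllib
import xml.etree.ElementTree as ET

url = 'http://python-data.dr-chuck.net/comments_42.xml'   # the sample data URL
tree = ET.fromstring(urllib.urlopen(url).read())

comments = tree.find('comments')          # assumed container of <comment> children
total = 0
for comment in comments.findall('comment'):
    total = total + int(comment.find('count').text)
print 'Sum:', total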

Sample Execution

$ python solution.py
Enter location: http://python-data.dr-chuck.net/comments_42.xml
Retrieving http://python-data.dr-chuck.net/comments_42.xml
Retrieved 4204 characters
Count: 50
Sum: 2...

Solution Code

import urllib
import xml.etree.ElementTree as ET

while True:
    url = raw_input('Enter location: ')
    # url = 'http://python-data.dr-chuck.net/comments_42.xml'
    if len(url) < 1: break

    print 'Retrieving', url
    uh = urllib.urlopen(url)          # open the URL
    data = uh.read()                  # read the raw XML document
    print 'Retrieved', len(data), 'characters'

    tree = ET.fromstring(data)        # parse the XML into an element tree

    # Find every <count> element anywhere in the tree.
    counts = tree.findall('.//count')
    print 'Count:', len(counts)

    # Add up the integer text of each <count> element.
    total = 0
    for node in counts:
        total = total + int(node.text)

    print 'Sum:', total

    break
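
The solution above is Python 2 (raw_input, print statements, urllib.urlopen). As a rough sketch of the same program in Python 3, where input() and urllib.request.urlopen() take their place (not verified against the grader):

import urllib.request
import xml.etree.ElementTree as ET

while True:
    url = input('Enter location: ')
    if len(url) < 1:
        break

    print('Retrieving', url)
    data = urllib.request.urlopen(url).read()   # bytes containing the XML
    print('Retrieved', len(data), 'characters')

    tree = ET.fromstring(data)                  # fromstring accepts bytes as well
    counts = tree.findall('.//count')           # every <count> element in the tree

    total = sum(int(node.text) for node in counts)

    print('Count:', len(counts))
    print('Sum:', total)
    break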
    