什么是sax?
SAX是一种基于事件驱动的API。
利用SAX解析XML文档牵涉到两个部分:解析器和事件处理器。
解析器负责读取XML文档,并向事件处理器发送事件,如元素开始跟元素结束事件;
而事件处理器则负责对事件作出相应,对传递的XML数据进行处理。
sax适于处理下面的问题:
- 1、对大型文件进行处理;
- 2、只需要文件的部分内容,或者只需从文件中得到特定信息;
- 3、想建立自己的对象模型的时候。
在python中使用sax方式处理xml要先引入xml.sax中的parse函数,还有xml.sax.handler中的ContentHandler。
movies.xml:需要解析的xml文件,上一篇博客中使用dom解析的一样
<collection shelf="New Arrivals"> <movie title="Enemy Behind"> <type>War, Thriller</type> <format>DVD</format> <year>2003</year> <rating>PG</rating> <stars>10</stars> <description>Talk about a US-Japan war</description> </movie> <movie title="Transformers"> <type>Anime, Science Fiction</type> <format>DVD</format> <year>1989</year> <rating>R</rating> <stars>8</stars> <description>A schientific fiction</description> </movie> <movie title="Trigun"> <type>Anime, Action</type> <format>DVD</format> <episodes>4</episodes> <rating>PG</rating> <stars>10</stars> <description>Vash the Stampede!</description> </movie> <movie title="Ishtar"> <type>Comedy</type> <format>VHS</format> <rating>PG</rating> <stars>2</stars> <description>Viewable boredom</description> </movie> </collection>
xmltest.py:解析代码如下
# -*- coding:UTF-8 -*- ‘‘‘ Created on 2015年9月10日 @author: xiaowenhui ‘‘‘ import xml.sax #第二种方法,sax解析 class MovieHandler(xml.sax.ContentHandler): #继承于xml.sax.ContentHandler类 def __init__(self): self.CurrentData = "" self.type = "" self.format = "" self.year = "" self.episodes = "" self.rating = "" self.stars = "" self.description = "" self.title = "" # 元素开始事件处理 def startElement(self, tag, attributes): self.CurrentData = tag if tag == "movie": print "*****Movie*****" self.title = attributes["title"] print "Title:", self.title # 内容事件处理 def characters(self, content): if self.CurrentData == "type": self.type = content elif self.CurrentData == "format": self.format = content elif self.CurrentData == "year": self.year = content elif self.CurrentData == "episodes": self.episodes = content elif self.CurrentData == "rating": self.rating = content elif self.CurrentData == "stars": self.stars = content elif self.CurrentData == "description": self.description = content # 元素结束事件处理 def endElement(self, tag): if self.CurrentData == "type": print "Type:", self.type elif self.CurrentData == "format": print "Format:", self.format elif self.CurrentData == "year": print "Year:", self.year elif self.CurrentData == "episodes": print "Episodes:", self.episodes elif self.CurrentData == "rating": print "Rating:", self.rating elif self.CurrentData == "stars": print "Stars:", self.stars elif self.CurrentData == "description": print "Description:", self.description # 创建一个 XMLReader parser = xml.sax.make_parser() # turn off namepsaces parser.setFeature(xml.sax.handler.feature_namespaces, 0) # 重写 ContextHandler Handler = MovieHandler() parser.setContentHandler( Handler ) parser.parse("movies.xml")
输出结果如下:
疑问:不知道为什么会多输出一个description,可能是sax解析的时候哪里写的不对,现在还没找到原因,我把
elif self.CurrentData == "description": print "Description:", self.description 改成
elif self.CurrentData == "description": print self.description后就没有输出“description”,只输出了self.description这个参数
*****Movie***** Title: Enemy Behind Type: War, Thriller Format: DVD Year: 2003 Rating: PG Stars: 10 description: Talk about a US-Japan war description: *****Movie***** Title: Transformers Type: Anime, Science Fiction Format: DVD Year: 1989 Rating: R Stars: 8 description: A schientific fiction description: *****Movie***** Title: Trigun Type: Anime, Action Format: DVD Episodes: 4 Rating: PG Stars: 10 description: Vash the Stampede! description: *****Movie***** Title: Ishtar Type: Comedy Format: VHS Rating: PG Stars: 2 description: Viewable boredom description: description:
时间: 2024-10-29 19:07:32