Parsing XML Files with SAX

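DOM parsing reads the entire document into memory, builds a DOM tree, and then exposes the XML content through the DOM interfaces. That is convenient when the file is small. When the XML file is large, however, holding the whole tree in memory makes DOM parsing considerably less efficient, and this is where SAX parsing becomes useful. SAX is event-driven: it turns the XML document into a sequence of events, and a separate event handler decides what to do with each one. Parsing happens while the document is being read. The rest of this post briefly walks through how SAX parsing is done in practice.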

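1. Main objects

SAXParserFactory: the parser factory

SAXParser: the parser, obtained from the parser factory

ContentHandler, DTDHandler, ErrorHandler, EntityResolver: the event handler interfaces

DefaultHandler: implements all four handler interfaces above; in practice you simply extend DefaultHandler and override the callbacks you need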
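
To make the relationship between these objects concrete, here is a minimal sketch of the factory, parser, and handler chain. The class names SaxQuickStart and NameCollector and the inline XML string are illustrative assumptions, not part of the original post; only the javax.xml.parsers and org.xml.sax calls are standard JDK API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: factory -> parser -> handler.
class SaxQuickStart {

    /** Collects the qualified name of every element the parser reports. */
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            names.add(qName);
        }
    }

    /** Parses the XML text and returns the element names in document order. */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        NameCollector handler = new NameCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<world><comuntry id=\"1\"><name>China</name></comuntry></world>";
        System.out.println(elementNames(xml)); // prints [world, comuntry, name]
    }
}
```

Because DefaultHandler supplies empty implementations of every callback, the handler only overrides startElement and ignores everything else.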

2. The XML document

This is the same XML file that was used in the earlier post on DOM parsing:

<?xml version="1.0" encoding="UTF-8"?>
<world>
    <comuntry id="1">
        <name>China</name>
        <capital>Beijing</capital>
        <population>1234</population>
        <area>960</area>
    </comuntry>
    <comuntry id="2">
        <name id="">America</name>
        <capital>Washington</capital>
        <population>234</population>
        <area>900</area>
    </comuntry>
    <comuntry id="3">
        <name >Japan</name>
        <capital>Tokyo</capital>
        <population>234</population>
        <area>60</area>
    </comuntry>
    <comuntry id="4">
        <name >Russia</name>
        <capital>Moscow</capital>
        <population>34</population>
        <area>1960</area>
    </comuntry>
</world>

3. The main interfaces

EntityResolver :

package org.xml.sax;

import java.io.IOException;

public interface EntityResolver {

    /**
     * Allow the application to resolve external entities.
     *
     * <p>The parser will call this method before opening any external
     * entity except the top-level document entity.  Such entities include
     * the external DTD subset and external parameter entities referenced
     * within the DTD (in either case, only if the parser reads external
     * parameter entities), and external general entities referenced
     * within the document element (if the parser reads external general
     * entities).  The application may request that the parser locate
     * the entity itself, that it use an alternative URI, or that it
     * use data provided by the application (as a character or byte
     * input stream).</p>
     *
     * <p>Application writers can use this method to redirect external
     * system identifiers to secure and/or local URIs, to look up
     * public identifiers in a catalogue, or to read an entity from a
     * database or other input source (including, for example, a dialog
     * box).  Neither XML nor SAX specifies a preferred policy for using
     * public or system IDs to resolve resources.  However, SAX specifies
     * how to interpret any InputSource returned by this method, and that
     * if none is returned, then the system ID will be dereferenced as
     * a URL.  </p>
     *
     * <p>If the system identifier is a URL, the SAX parser must
     * resolve it fully before reporting it to the application.</p>
     *
     * @param publicId The public identifier of the external entity
     *        being referenced, or null if none was supplied.
     * @param systemId The system identifier of the external entity
     *        being referenced.
     * @return An InputSource object describing the new input source,
     *         or null to request that the parser open a regular
     *         URI connection to the system identifier.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @exception java.io.IOException A Java-specific IO exception,
     *            possibly the result of creating a new InputStream
     *            or Reader for the InputSource.
     * @see org.xml.sax.InputSource
     */
    public abstract InputSource resolveEntity (String publicId,
                                               String systemId)
        throws SAXException, IOException;

}
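
As an illustration of the "use data provided by the application" option described in the Javadoc above, the following sketch registers a resolver that answers every external entity request with an empty in-memory source, so the parser never opens a network connection. The names OfflineResolverDemo, OfflineResolver, and parseOffline, and the example.invalid URL, are assumptions made for this example. Whether the resolver is consulted for the external DTD subset depends on the parser's settings; the parser bundled with the JDK is assumed here to load external DTDs by default.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: answer external entity lookups from memory.
class OfflineResolverDemo {

    /** Returns an empty source for every external entity and counts the requests. */
    static class OfflineResolver implements EntityResolver {
        int calls = 0;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            calls++;
            // A non-null InputSource tells the parser to read this instead of the URL.
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses the XML text and returns how many times the resolver was asked. */
    static int parseOffline(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        OfflineResolver resolver = new OfflineResolver();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.calls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE world SYSTEM \"http://example.invalid/world.dtd\"><world/>";
        System.out.println("resolver calls: " + parseOffline(xml));
    }
}
```

Returning null instead would ask the parser to open a regular connection to the system identifier, as the @return clause above states.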

DTDHandler :

package org.xml.sax;

/**
 * Receive notification of basic DTD-related events.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>If a SAX application needs information about notations and
 * unparsed entities, then the application implements this
 * interface and registers an instance with the SAX parser using
 * the parser's setDTDHandler method.  The parser uses the
 * instance to report notation and unparsed entity declarations to
 * the application.</p>
 *
 * <p>Note that this interface includes only those DTD events that
 * the XML recommendation <em>requires</em> processors to report:
 * notation and unparsed entity declarations.</p>
 *
 * <p>The SAX parser may report these events in any order, regardless
 * of the order in which the notations and unparsed entities were
 * declared; however, all DTD events must be reported after the
 * document handler's startDocument event, and before the first
 * startElement event.
 * (If the {@link org.xml.sax.ext.LexicalHandler LexicalHandler} is
 * used, these events must also be reported before the endDTD event.)
 * </p>
 *
 * <p>It is up to the application to store the information for
 * future use (perhaps in a hash table or object tree).
 * If the application encounters attributes of type "NOTATION",
 * "ENTITY", or "ENTITIES", it can use the information that it
 * obtained through this interface to find the entity and/or
 * notation corresponding with the attribute value.</p>
 *
 * @since SAX 1.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader#setDTDHandler
 */
public interface DTDHandler {

    /**
     * Receive notification of a notation declaration event.
     *
     * <p>It is up to the application to record the notation for later
     * reference, if necessary;
     * notations may appear as attribute values and in unparsed entity
     * declarations, and are sometime used with processing instruction
     * target names.</p>
     *
     * <p>At least one of publicId and systemId must be non-null.
     * If a system identifier is present, and it is a URL, the SAX
     * parser must resolve it fully before passing it to the
     * application through this event.</p>
     *
     * <p>There is no guarantee that the notation declaration will be
     * reported before any unparsed entities that use it.</p>
     *
     * @param name The notation name.
     * @param publicId The notation's public identifier, or null if
     *        none was given.
     * @param systemId The notation's system identifier, or null if
     *        none was given.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see #unparsedEntityDecl
     * @see org.xml.sax.Attributes
     */
    public abstract void notationDecl (String name,
                                       String publicId,
                                       String systemId)
        throws SAXException;

    /**
     * Receive notification of an unparsed entity declaration event.
     *
     * <p>Note that the notation name corresponds to a notation
     * reported by the {@link #notationDecl notationDecl} event.
     * It is up to the application to record the entity for later
     * reference, if necessary;
     * unparsed entities may appear as attribute values.
     * </p>
     *
     * <p>If the system identifier is a URL, the parser must resolve it
     * fully before passing it to the application.</p>
     *
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @param name The unparsed entity's name.
     * @param publicId The entity's public identifier, or null if none
     *        was given.
     * @param systemId The entity‘s system identifier.
     * @param notationName The name of the associated notation.
     * @see #notationDecl
     * @see org.xml.sax.Attributes
     */
    public abstract void unparsedEntityDecl (String name,
                                             String publicId,
                                             String systemId,
                                             String notationName)
        throws SAXException;

}
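
The Javadoc above leaves it to the application to store notation and unparsed entity declarations. A minimal sketch of that bookkeeping could look like the following; DtdRecorderDemo, DtdRecorder, and the small inline DTD are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class: remember DTD declarations for later lookup.
class DtdRecorderDemo {

    /** Stores notations and unparsed entities as they are reported. */
    static class DtdRecorder implements DTDHandler {
        // notation name -> public identifier (or system identifier if no public one)
        final Map<String, String> notations = new LinkedHashMap<>();
        // unparsed entity name -> name of its notation
        final Map<String, String> unparsedEntities = new LinkedHashMap<>();

        @Override
        public void notationDecl(String name, String publicId, String systemId) {
            notations.put(name, publicId != null ? publicId : systemId);
        }

        @Override
        public void unparsedEntityDecl(String name, String publicId,
                                       String systemId, String notationName) {
            unparsedEntities.put(name, notationName);
        }
    }

    /** Parses the XML text and returns the recorded declarations. */
    static DtdRecorder record(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DtdRecorder recorder = new DtdRecorder();
        reader.setDTDHandler(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE world [\n"
                + "  <!ELEMENT world EMPTY>\n"
                + "  <!NOTATION png PUBLIC \"image/png\">\n"
                + "  <!ENTITY flag SYSTEM \"http://example.invalid/flag.png\" NDATA png>\n"
                + "]>\n"
                + "<world/>";
        DtdRecorder recorder = record(xml);
        System.out.println("notations: " + recorder.notations.keySet());
        System.out.println("unparsed entities: " + recorder.unparsedEntities);
    }
}
```

Since the order of the two kinds of events is not guaranteed, the sketch only stores what it is told and resolves the entity-to-notation link after parsing.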

ContentHandler:

package org.xml.sax;

/**
 * Receive notification of the logical content of a document.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>This is the main interface that most SAX applications
 * implement: if the application needs to be informed of basic parsing
 * events, it implements this interface and registers an instance with
 * the SAX parser using the {@link org.xml.sax.XMLReader#setContentHandler
 * setContentHandler} method.  The parser uses the instance to report
 * basic document-related events like the start and end of elements
 * and character data.</p>
 *
 * <p>The order of events in this interface is very important, and
 * mirrors the order of information in the document itself.  For
 * example, all of an element's content (character data, processing
 * instructions, and/or subelements) will appear, in order, between
 * the startElement event and the corresponding endElement event.</p>
 *
 * <p>This interface is similar to the now-deprecated SAX 1.0
 * DocumentHandler interface, but it adds support for Namespaces
 * and for reporting skipped entities (in non-validating XML
 * processors).</p>
 *
 * <p>Implementors should note that there is also a
 * <code>ContentHandler</code> class in the <code>java.net</code>
 * package; that means that it's probably a bad idea to do</p>
 *
 * <pre>import java.net.*;
 * import org.xml.sax.*;
 * </pre>
 *
 * <p>In fact, "import ...*" is usually a sign of sloppy programming
 * anyway, so the user should consider this a feature rather than a
 * bug.</p>
 *
 * @since SAX 2.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader
 * @see org.xml.sax.DTDHandler
 * @see org.xml.sax.ErrorHandler
 */
public interface ContentHandler
{

    /**
     * Receive an object for locating the origin of SAX document events.
     *
     * <p>SAX parsers are strongly encouraged (though not absolutely
     * required) to supply a locator: if it does so, it must supply
     * the locator to the application by invoking this method before
     * invoking any of the other methods in the ContentHandler
     * interface.</p>
     *
     * <p>The locator allows the application to determine the end
     * position of any document-related event, even if the parser is
     * not reporting an error.  Typically, the application will
     * use this information for reporting its own errors (such as
     * character content that does not match an application's
     * business rules).  The information returned by the locator
     * is probably not sufficient for use with a search engine.</p>
     *
     * <p>Note that the locator will return correct information only
     * during the invocation SAX event callbacks after
     * {@link #startDocument startDocument} returns and before
     * {@link #endDocument endDocument} is called.  The
     * application should not attempt to use it at any other time.</p>
     *
     * @param locator an object that can return the location of
     *                any SAX document event
     * @see org.xml.sax.Locator
     */
    public void setDocumentLocator (Locator locator);

    /**
     * Receive notification of the beginning of a document.
     *
     * <p>The SAX parser will invoke this method only once, before any
     * other event callbacks (except for {@link #setDocumentLocator
     * setDocumentLocator}).</p>
     *
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #endDocument
     */
    public void startDocument ()
        throws SAXException;

    /**
     * Receive notification of the end of a document.
     *
     * <p><strong>There is an apparent contradiction between the
     * documentation for this method and the documentation for {@link
     * org.xml.sax.ErrorHandler#fatalError}.  Until this ambiguity is
     * resolved in a future major release, clients should make no
     * assumptions about whether endDocument() will or will not be
     * invoked when the parser has reported a fatalError() or thrown
     * an exception.</strong></p>
     *
     * <p>The SAX parser will invoke this method only once, and it will
     * be the last method invoked during the parse.  The parser shall
     * not invoke this method until it has either abandoned parsing
     * (because of an unrecoverable error) or reached the end of
     * input.</p>
     *
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #startDocument
     */
    public void endDocument()
        throws SAXException;

    /**
     * Begin the scope of a prefix-URI Namespace mapping.
     *
     * <p>The information from this event is not necessary for
     * normal Namespace processing: the SAX XML reader will
     * automatically replace prefixes for element and attribute
     * names when the <code>http://xml.org/sax/features/namespaces</code>
     * feature is <var>true</var> (the default).</p>
     *
     * <p>There are cases, however, when applications need to
     * use prefixes in character data or in attribute values,
     * where they cannot safely be expanded automatically; the
     * start/endPrefixMapping event supplies the information
     * to the application to expand prefixes in those contexts
     * itself, if necessary.</p>
     *
     * <p>Note that start/endPrefixMapping events are not
     * guaranteed to be properly nested relative to each other:
     * all startPrefixMapping events will occur immediately before the
     * corresponding {@link #startElement startElement} event,
     * and all {@link #endPrefixMapping endPrefixMapping}
     * events will occur immediately after the corresponding
     * {@link #endElement endElement} event,
     * but their order is not otherwise
     * guaranteed.</p>
     *
     * <p>There should never be start/endPrefixMapping events for the
     * "xml" prefix, since it is predeclared and immutable.</p>
     *
     * @param prefix the Namespace prefix being declared.
     *  An empty string is used for the default element namespace,
     *  which has no prefix.
     * @param uri the Namespace URI the prefix is mapped to
     * @throws org.xml.sax.SAXException the client may throw
     *            an exception during processing
     * @see #endPrefixMapping
     * @see #startElement
     */
    public void startPrefixMapping (String prefix, String uri)
        throws SAXException;

    /**
     * End the scope of a prefix-URI mapping.
     *
     * <p>See {@link #startPrefixMapping startPrefixMapping} for
     * details.  These events will always occur immediately after the
     * corresponding {@link #endElement endElement} event, but the order of
     * {@link #endPrefixMapping endPrefixMapping} events is not otherwise
     * guaranteed.</p>
     *
     * @param prefix the prefix that was being mapped.
     *  This is the empty string when a default mapping scope ends.
     * @throws org.xml.sax.SAXException the client may throw
     *            an exception during processing
     * @see #startPrefixMapping
     * @see #endElement
     */
    public void endPrefixMapping (String prefix)
        throws SAXException;

    /**
     * Receive notification of the beginning of an element.
     *
     * <p>The Parser will invoke this method at the beginning of every
     * element in the XML document; there will be a corresponding
     * {@link #endElement endElement} event for every startElement event
     * (even when the element is empty). All of the element's content will be
     * reported, in order, before the corresponding endElement
     * event.</p>
     *
     * <p>This event allows up to three name components for each
     * element:</p>
     *
     * <ol>
     * <li>the Namespace URI;</li>
     * <li>the local name; and</li>
     * <li>the qualified (prefixed) name.</li>
     * </ol>
     *
     * <p>Any or all of these may be provided, depending on the
     * values of the <var>http://xml.org/sax/features/namespaces</var>
     * and the <var>http://xml.org/sax/features/namespace-prefixes</var>
     * properties:</p>
     *
     * <ul>
     * <li>the Namespace URI and local name are required when
     * the namespaces property is <var>true</var> (the default), and are
     * optional when the namespaces property is <var>false</var> (if one is
     * specified, both must be);</li>
     * <li>the qualified name is required when the namespace-prefixes property
     * is <var>true</var>, and is optional when the namespace-prefixes property
     * is <var>false</var> (the default).</li>
     * </ul>
     *
     * <p>Note that the attribute list provided will contain only
     * attributes with explicit values (specified or defaulted):
     * #IMPLIED attributes will be omitted.  The attribute list
     * will contain attributes used for Namespace declarations
     * (xmlns* attributes) only if the
     * <code>http://xml.org/sax/features/namespace-prefixes</code>
     * property is true (it is false by default, and support for a
     * true value is optional).</p>
     *
     * <p>Like {@link #characters characters()}, attribute values may have
     * characters that need more than one <code>char</code> value.  </p>
     *
     * @param uri the Namespace URI, or the empty string if the
     *        element has no Namespace URI or if Namespace
     *        processing is not being performed
     * @param localName the local name (without prefix), or the
     *        empty string if Namespace processing is not being
     *        performed
     * @param qName the qualified name (with prefix), or the
     *        empty string if qualified names are not available
     * @param atts the attributes attached to the element.  If
     *        there are no attributes, it shall be an empty
     *        Attributes object.  The value of this object after
     *        startElement returns is undefined
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #endElement
     * @see org.xml.sax.Attributes
     * @see org.xml.sax.helpers.AttributesImpl
     */
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException;

    /**
     * Receive notification of the end of an element.
     *
     * <p>The SAX parser will invoke this method at the end of every
     * element in the XML document; there will be a corresponding
     * {@link #startElement startElement} event for every endElement
     * event (even when the element is empty).</p>
     *
     * <p>For information on the names, see startElement.</p>
     *
     * @param uri the Namespace URI, or the empty string if the
     *        element has no Namespace URI or if Namespace
     *        processing is not being performed
     * @param localName the local name (without prefix), or the
     *        empty string if Namespace processing is not being
     *        performed
     * @param qName the qualified XML name (with prefix), or the
     *        empty string if qualified names are not available
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException;

    /**
     * Receive notification of character data.
     *
     * <p>The Parser will call this method to report each chunk of
     * character data.  SAX parsers may return all contiguous character
     * data in a single chunk, or they may split it into several
     * chunks; however, all of the characters in any single event
     * must come from the same external entity so that the Locator
     * provides useful information.</p>
     *
     * <p>The application must not attempt to read from the array
     * outside of the specified range.</p>
     *
     * <p>Individual characters may consist of more than one Java
     * <code>char</code> value.  There are two important cases where this
     * happens, because characters can't be represented in just sixteen bits.
     * In one case, characters are represented in a <em>Surrogate Pair</em>,
     * using two special Unicode values. Such characters are in the so-called
     * "Astral Planes", with a code point above U+FFFF.  A second case involves
     * composite characters, such as a base character combining with one or
     * more accent characters. </p>
     *
     * <p> Your code should not assume that algorithms using
     * <code>char</code>-at-a-time idioms will be working in character
     * units; in some cases they will split characters.  This is relevant
     * wherever XML permits arbitrary characters, such as attribute values,
     * processing instruction data, and comments as well as in data reported
     * from this method.  It's also generally relevant whenever Java code
     * manipulates internationalized text; the issue isn't unique to XML.</p>
     *
     * <p>Note that some parsers will report whitespace in element
     * content using the {@link #ignorableWhitespace ignorableWhitespace}
     * method rather than this one (validating parsers <em>must</em>
     * do so).</p>
     *
     * @param ch the characters from the XML document
     * @param start the start position in the array
     * @param length the number of characters to read from the array
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #ignorableWhitespace
     * @see org.xml.sax.Locator
     */
    public void characters (char ch[], int start, int length)
        throws SAXException;

    /**
     * Receive notification of ignorable whitespace in element content.
     *
     * <p>Validating Parsers must use this method to report each chunk
     * of whitespace in element content (see the W3C XML 1.0
     * recommendation, section 2.10): non-validating parsers may also
     * use this method if they are capable of parsing and using
     * content models.</p>
     *
     * <p>SAX parsers may return all contiguous whitespace in a single
     * chunk, or they may split it into several chunks; however, all of
     * the characters in any single event must come from the same
     * external entity, so that the Locator provides useful
     * information.</p>
     *
     * <p>The application must not attempt to read from the array
     * outside of the specified range.</p>
     *
     * @param ch the characters from the XML document
     * @param start the start position in the array
     * @param length the number of characters to read from the array
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #characters
     */
    public void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException;

    /**
     * Receive notification of a processing instruction.
     *
     * <p>The Parser will invoke this method once for each processing
     * instruction found: note that processing instructions may occur
     * before or after the main document element.</p>
     *
     * <p>A SAX parser must never report an XML declaration (XML 1.0,
     * section 2.8) or a text declaration (XML 1.0, section 4.3.1)
     * using this method.</p>
     *
     * <p>Like {@link #characters characters()}, processing instruction
     * data may have characters that need more than one <code>char</code>
     * value. </p>
     *
     * @param target the processing instruction target
     * @param data the processing instruction data, or null if
     *        none was supplied.  The data does not include any
     *        whitespace separating it from the target
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void processingInstruction (String target, String data)
        throws SAXException;

    /**
     * Receive notification of a skipped entity.
     * This is not called for entity references within markup constructs
     * such as element start tags or markup declarations.  (The XML
     * recommendation requires reporting skipped external entities.
     * SAX also reports internal entity expansion/non-expansion, except
     * within markup constructs.)
     *
     * <p>The Parser will invoke this method each time the entity is
     * skipped.  Non-validating processors may skip entities if they
     * have not seen the declarations (because, for example, the
     * entity was declared in an external DTD subset).  All processors
     * may skip external entities, depending on the values of the
     * <code>http://xml.org/sax/features/external-general-entities</code>
     * and the
     * <code>http://xml.org/sax/features/external-parameter-entities</code>
     * properties.</p>
     *
     * @param name the name of the skipped entity.  If it is a
     *        parameter entity, the name will begin with '%', and if
     *        it is the external DTD subset, it will be the string
     *        "[dtd]"
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void skippedEntity (String name)
        throws SAXException;
}
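
One practical consequence of the characters() contract above is that an element's text may arrive in several chunks, so a handler should accumulate it and only use it in endElement. The following is a small sketch of that pattern, assuming element names like those in the sample document; CountryNames and NameHandler are illustrative names, not part of the original post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collect the text of every <name> element.
class CountryNames {

    static class NameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inName;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("name".equals(qName)) {
                inName = true;
                text.setLength(0); // start a fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per element: append, never assign.
            if (inName) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString().trim());
                inName = false;
            }
        }
    }

    /** Parses the XML text and returns the content of each <name> element. */
    static List<String> names(String xml) throws Exception {
        NameHandler handler = new NameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<world>"
                + "<comuntry id=\"1\"><name>China</name></comuntry>"
                + "<comuntry id=\"2\"><name>America</name></comuntry>"
                + "</world>";
        System.out.println(names(xml)); // prints [China, America]
    }
}
```

The handler reads only the range between start and start + length, as the Javadoc requires, by passing both values straight to StringBuilder.append.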

ErrorHandler:

package org.xml.sax;

/**
 * Basic interface for SAX error handlers.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>If a SAX application needs to implement customized error
 * handling, it must implement this interface and then register an
 * instance with the XML reader using the
 * {@link org.xml.sax.XMLReader#setErrorHandler setErrorHandler}
 * method.  The parser will then report all errors and warnings
 * through this interface.</p>
 *
 * <p><strong>WARNING:</strong> If an application does <em>not</em>
 * register an ErrorHandler, XML parsing errors will go unreported,
 * except that <em>SAXParseException</em>s will be thrown for fatal errors.
 * In order to detect validity errors, an ErrorHandler that does something
 * with {@link #error error()} calls must be registered.</p>
 *
 * <p>For XML processing errors, a SAX driver must use this interface
 * in preference to throwing an exception: it is up to the application
 * to decide whether to throw an exception for different types of
 * errors and warnings.  Note, however, that there is no requirement that
 * the parser continue to report additional errors after a call to
 * {@link #fatalError fatalError}.  In other words, a SAX driver class
 * may throw an exception after reporting any fatalError.
 * Also parsers may throw appropriate exceptions for non-XML errors.
 * For example, {@link XMLReader#parse XMLReader.parse()} would throw
 * an IOException for errors accessing entities or the document.</p>
 *
 * @since SAX 1.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader#setErrorHandler
 * @see org.xml.sax.SAXParseException
 */
public interface ErrorHandler {

    /**
     * Receive notification of a warning.
     *
     * <p>SAX parsers will use this method to report conditions that
     * are not errors or fatal errors as defined by the XML
     * recommendation.  The default behaviour is to take no
     * action.</p>
     *
     * <p>The SAX parser must continue to provide normal parsing events
     * after invoking this method: it should still be possible for the
     * application to process the document through to the end.</p>
     *
     * <p>Filters may use this method to report other, non-XML warnings
     * as well.</p>
     *
     * @param exception The warning information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void warning (SAXParseException exception)
        throws SAXException;

    /**
     * Receive notification of a recoverable error.
     *
     * <p>This corresponds to the definition of "error" in section 1.2
     * of the W3C XML 1.0 Recommendation.  For example, a validating
     * parser would use this callback to report the violation of a
     * validity constraint.  The default behaviour is to take no
     * action.</p>
     *
     * <p>The SAX parser must continue to provide normal parsing
     * events after invoking this method: it should still be possible
     * for the application to process the document through to the end.
     * If the application cannot do so, then the parser should report
     * a fatal error even if the XML recommendation does not require
     * it to do so.</p>
     *
     * <p>Filters may use this method to report other, non-XML errors
     * as well.</p>
     *
     * @param exception The error information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void error (SAXParseException exception)
        throws SAXException;

    /**
     * Receive notification of a non-recoverable error.
     *
     * <p><strong>There is an apparent contradiction between the
     * documentation for this method and the documentation for {@link
     * org.xml.sax.ContentHandler#endDocument}.  Until this ambiguity
     * is resolved in a future major release, clients should make no
     * assumptions about whether endDocument() will or will not be
     * invoked when the parser has reported a fatalError() or thrown
     * an exception.</strong></p>
     *
     * <p>This corresponds to the definition of "fatal error" in
     * section 1.2 of the W3C XML 1.0 Recommendation.  For example, a
     * parser would use this callback to report the violation of a
     * well-formedness constraint.</p>
     *
     * <p>The application must assume that the document is unusable
     * after the parser has invoked this method, and should continue
     * (if at all) only for the sake of collecting additional error
     * messages: in fact, SAX parsers are free to stop reporting any
     * other events once this method has been invoked.</p>
     *
     * @param exception The error information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void fatalError (SAXParseException exception)
        throws SAXException;

}
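
The warning in the Javadoc above says that errors go unreported unless an ErrorHandler is registered. One possible policy, sketched here under the assumption that the application wants to stop at the first problem, is to record warnings and rethrow errors and fatal errors. StrictErrorHandlerDemo, StrictErrorHandler, and parsesCleanly are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical demo class: fail fast on errors, keep warnings for inspection.
class StrictErrorHandlerDemo {

    static class StrictErrorHandler implements ErrorHandler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            warnings.add(e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // treat recoverable errors (e.g. validity errors) as failures
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            throw e; // well-formedness violations: the document is unusable
        }
    }

    /** Returns true if the XML text parses without an error or fatal error. */
    static boolean parsesCleanly(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(new StrictErrorHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parsesCleanly("<world><name>China</name></world>"));
        System.out.println(parsesCleanly("<world><name>China</world>"));
    }
}
```

The second document has a mismatched end tag, which is a well-formedness violation and therefore reaches fatalError.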

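The listings above are the source of the four basic event handler interfaces; reading the Javadoc shows what each callback is responsible for.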

4. SAX parsing in practice. It consists of two parts: defining the parsing rules (the handler) and reading the file.

 

Event handler: MyHandler.java

import java.io.IOException;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

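    /**
     * Begin the scope of a prefix-URI Namespace mapping.
     * The information from this event is not necessary for normal
     * Namespace processing: when the http://xml.org/sax/features/namespaces
     * feature is true (the default), the SAX XML reader automatically
     * replaces prefixes for element and attribute names.
     * Parameters:
     *    prefix : the Namespace prefix being declared
     *    uri : the Namespace URI the prefix is mapped to
     */
    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        System.out.println("(startPrefixMapping)start prefix_mapping : xmlns:"+prefix+" = "
                +"\""+uri+"\"");
    }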

    /**
     * End the scope of a prefix-URI mapping.
     * @param prefix  the prefix
     */
    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        System.out.println("(endPrefixMapping)end prefix_mapping : "+prefix);
    }

    /**
     * Receive notification of the end of the document.
     */
    @Override
    public void endDocument() throws SAXException {
        System.out.println("(endDocument)doument is ended");
    }

    /**
     * Receive notification of the end of an element.
     * Parameters:
     *    uri : the namespace URI of the element
     *    localName : the local name of the element (without prefix)
     *    qName : the qualified name of the element (with prefix)
     */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        System.out.println("(endElement)end element : "+qName+"("+uri+")");
    }

    /**
     * Receive notification of ignorable whitespace in element content.
     * Parameters:
     *     ch : the characters from the XML document
     *     start : the start position in the array
     *     length : the number of characters to read from the array
     */
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        // Escape control characters so the whitespace is visible in the log
        StringBuilder buffer = new StringBuilder();
        for(int i = start ; i < start+length ; i++){
            switch(ch[i]){
                case '\\':buffer.append("\\\\");break;
                case '\r':buffer.append("\\r");break;
                case '\n':buffer.append("\\n");break;
                case '\t':buffer.append("\\t");break;
                case '\"':buffer.append("\\\"");break;
                default : buffer.append(ch[i]);
            }
        }
        System.out.println("(ignorableWhitespace)ignorable whitespace("+length+"): "+buffer.toString());
    }

    /**
     * Receive an object for locating the origin of SAX document events.
     * Parameters:
     *     locator : an object that can return the location of any SAX document event
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        System.out.println("(setDocumentLocator)set document_locator : (lineNumber = "+locator.getLineNumber()
                +",columnNumber = "+locator.getColumnNumber()
                +",systemId = "+locator.getSystemId()
                +",publicId = "+locator.getPublicId()+")");
    }

    /**
     * Receive notification of the beginning of the document.
     */
    @Override
    public void startDocument() throws SAXException {
        System.out.println("(startDocument)document is startting");
    }

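    /**
     * Receive notification of the start of an element.
     * Parameters:
     *    uri : the namespace URI of the element
     *    localName : the local name of the element (without prefix)
     *    qName : the qualified name of the element (with prefix)
     *    attributes : the attributes attached to the element
     */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {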
         System.out.println("(startElement)start element : "+qName+"("+uri+")");
    }

    /**
     * Receive notification of a notation declaration event.
     * Parameters:
     *     name - the notation name.
     *     publicId - the notation's public identifier, or null if none was given.
     *     systemId - the notation's system identifier, or null if none was given.
     */
    @Override
    public void notationDecl(String name, String publicId, String systemId)
            throws SAXException {
        System.out.println("(notationDecl)notation declare : (name = "+name
                +",systemId = "+systemId
                +",publicId = "+publicId+")");
    }

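    /**
     * Allow the application to resolve external entities.
     * The parser calls this method before opening any external entity
     * (except the top-level document entity).
     * Parameters:
     *     publicId : the public identifier of the external entity being referenced, or null if none was supplied.
     *     systemId : the system identifier of the external entity being referenced.
     * Returns:
     *     an InputSource object describing the new input source, or null
     *     to request that the parser open a regular URI connection to the system identifier.
     */
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException, SAXException {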
        return super.resolveEntity(publicId, systemId);
    }

    /**
     * Receive notification of a skipped entity.
     * Parameters:
     * name : the name of the skipped entity. If it is a parameter entity, the name begins with '%';
     *            if it is the external DTD subset, it is the string "[dtd]"
     */
    @Override
    public void skippedEntity(String name) throws SAXException {
        System.out.println("(skippedEntity)the name of the skipped entity : "+name);
    }

    /**
     * Receive notification of an unparsed entity declaration event.
     * Parameters:
     *     name - the name of the unparsed entity.
     *     publicId - the entity's public identifier, or null if none was given.
     *     systemId - the entity's system identifier.
     *     notationName - the name of the associated notation.
     */
    @Override
    public void unparsedEntityDecl(String name, String publicId,
            String systemId, String notationName) throws SAXException {
          System.out.println("(unparsedEntityDecl)unparsed entity declare : (name = "+name
                    +",systemId = "+systemId
                    +",publicId = "+publicId
                    +",notationName = "+notationName+")");
    }

    /**
     * Receive notification of a processing instruction.
     * Parameters:
     *     target : the processing instruction target
     *     data : the processing instruction data, or null if none was supplied.
     */
    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
         System.out.println("(processingInstruction)process instruction : (target = \""
                    +target+"\",data = \""+data+"\")");
    }

    /**
     * Receive notification of character data.
     * In DOM terms, ch[start] through ch[start+length-1] correspond to the
     * node value (nodeValue) of a Text node. A parser may split one run of
     * text across several calls to this method.
     */
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
          // Escape control characters so whitespace is visible in the log
          StringBuilder buffer = new StringBuilder();
            for(int i = start ; i < start+length ; i++){
                switch(ch[i]){
                    case '\\':buffer.append("\\\\");break;
                    case '\r':buffer.append("\\r");break;
                    case '\n':buffer.append("\\n");break;
                    case '\t':buffer.append("\\t");break;
                    case '\"':buffer.append("\\\"");break;
                    default : buffer.append(ch[i]);
                }
            }
            System.out.println("(characters)characters("+length+"): "+buffer.toString());
    }
    /**
     * Handle a recoverable error.
     */
    @Override
    public void error(SAXParseException e) throws SAXException {
         System.err.println("(error)Error ("+e.getLineNumber()+","
                    +e.getColumnNumber()+") : "+e.getMessage());
    }

    /**
     * Handle a fatal (non-recoverable) error.
     */
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
         System.err.println("(fatalError)FatalError ("+e.getLineNumber()+","
                    +e.getColumnNumber()+") : "+e.getMessage());
    }

    /**
     * Handle a warning.
     */
    @Override
    public void warning(SAXParseException e) throws SAXException {
         System.err.println("(warning)("+e.getLineNumber()+","
                    +e.getColumnNumber()+") : "+e.getMessage());
    }
}

 

Running the parser:

SAXParse.java

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

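/**
 * 1. Obtain a SAX parser factory instance
 * 2. Get a SAX parser from the factory
 * 3. Turn the XML document to be parsed into an input stream so the SAX parser can read it
 * 4. Parse the XML document
 */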
public class SAXParse {

    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {
        // Obtain the SAX parser factory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Create the parser
        SAXParser parser =null;
        // try-with-resources closes the file stream once parsing finishes or fails
        try (FileInputStream in = new FileInputStream(new File("world.xml"))) {
            parser = factory.newSAXParser();
            XMLReader xmlReader = parser.getXMLReader();
            InputSource input = new InputSource(in);
            xmlReader.setContentHandler(new MyHandler());
            xmlReader.parse(input);
        } catch (ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}
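
Note that SAXParse registers MyHandler only through setContentHandler, so that reader never calls the handler's error(), fatalError(), warning(), notationDecl(), unparsedEntityDecl() or resolveEntity() methods. The following is a minimal sketch of one alternative, not part of the original program: in the JDK implementation, SAXParser.parse(InputStream, DefaultHandler) installs the same object as content handler, DTD handler, entity resolver and error handler. The names ErrorHandlingDemo, RecordingHandler and check, and the sample strings, are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ErrorHandlingDemo {

    /** Hypothetical handler that records problems instead of printing them. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> problems = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            problems.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            problems.add("error: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            problems.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
            throw e; // the document is unusable, so stop parsing
        }
    }

    /** Parses the given XML text and returns every problem the parser reported. */
    static List<String> check(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            // Registers the handler for content, DTD, entity and error events.
            parser.parse(in, handler);
        } catch (SAXException e) {
            // Already recorded by fatalError(); nothing more to do here.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<world><name>China</name></world>"));
        System.out.println(check("<world><name>China</world>"));
    }
}
```

The first call should report no problems. The second document is not well-formed, so the parser reports it through fatalError().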

5. Output:

(setDocumentLocator)set document_locator : (lineNumber = 1,columnNumber = 1,systemId = null,publicId = null)
(startDocument)document is startting
(startElement)start element : world()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(5): China
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(7): Beijing
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(4): 1234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(3): 960
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(7): America
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(10): Washington
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(3): 234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(3): 900
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(5): Japan
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(5): Tokyo
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(3): 234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(2): 60
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(6): Russia
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(6): Moscow
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(2): 34
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(4): 1960
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(1): \n
(endElement)end element : world()
(endDocument)doument is ended
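
In the output above, the parentheses after each element name are empty and no startPrefixMapping line appears. world.xml declares no namespaces, and in addition a SAXParserFactory is not namespace-aware unless setNamespaceAware(true) is called. The sketch below compares the two modes on a small namespaced document. The class NamespaceDemo, the method events and the URI urn:example:geo are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDemo {

    /** Parses xml and returns a log of prefix-mapping and element-start events. */
    static List<String> events(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // false is the factory default
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("prefix " + prefix + "=" + uri);
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                log.add("element " + qName + " (" + uri + ")");
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        // "urn:example:geo" is a made-up namespace URI.
        String xml = "<g:world xmlns:g=\"urn:example:geo\"><g:name>China</g:name></g:world>";
        System.out.println("namespaceAware = false");
        events(xml, false).forEach(System.out::println);
        System.out.println("namespaceAware = true");
        events(xml, true).forEach(System.out::println);
    }
}
```

Only the namespace-aware run reports the prefix mapping and a non-empty namespace URI for each element.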

 

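6. That completes the SAX parse. This is a very simple read-and-print pass; a real application needs a handler tailored to its own data.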
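
As one possible illustration of such customization, here is a minimal sketch of a handler that builds objects from world.xml instead of printing every event. The classes CountryDemo, Country and CountryHandler are hypothetical and not part of the original program. The element names, including the spelling comuntry, follow the sample document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CountryDemo {

    /** Hypothetical value class for one <comuntry> element. */
    static final class Country {
        String id;
        String name;
        String capital;
        long population;
        long area;
    }

    /** Collects Country objects as the parser reports events. */
    static final class CountryHandler extends DefaultHandler {
        final List<Country> countries = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Country current;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            text.setLength(0); // drop whitespace collected between elements
            if ("comuntry".equals(qName)) {
                current = new Country();
                current.id = attributes.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One run of text may arrive in several chunks, so append each one.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            text.setLength(0);
            if (current == null) {
                return; // outside any <comuntry> element
            }
            switch (qName) {
                case "name": current.name = value; break;
                case "capital": current.capital = value; break;
                case "population": current.population = Long.parseLong(value); break;
                case "area": current.area = Long.parseLong(value); break;
                case "comuntry": countries.add(current); current = null; break;
                default: break;
            }
        }
    }

    static List<Country> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountryHandler handler = new CountryHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.countries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<world><comuntry id=\"1\"><name>China</name><capital>Beijing</capital>"
                + "<population>1234</population><area>960</area></comuntry></world>";
        for (Country c : parse(xml)) {
            System.out.println(c.id + " " + c.name + " " + c.capital + " "
                    + c.population + " " + c.area);
        }
    }
}
```

Text is buffered in characters() and consumed in endElement(), so the result does not depend on how the parser chunks character data.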

Time: 2024-11-05 11:34:34
