How to Validate XML using Java

Configure Java APIs (SAX, DOM, dom4j, XOM) using JAXP 1.3 to validate XML Documents with DTD and Schema(s).

Many Java XML APIs provide mechanisms to validate XML documents. JAXP can be used to configure most of these APIs, but subtle configuration differences exist. This article shows five ways to configure different Java APIs (including DOM, SAX, dom4j and XOM) using JAXP 1.3 to check and validate XML with a DTD and XML Schema(s).

Setup

All examples below can be compiled and executed using Java 5.0 (JAXP 1.3) or higher and make use of the following components and settings.

Error Handler

To report errors, an ErrorHandler must be provided to the underlying implementation. The ErrorHandler used for the examples is a very simple one: it reports each problem to System.out and lets parsing continue until the XML document has been fully parsed or a fatal error has been reported.

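import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Prints every reported problem to System.out and never aborts parsing itself.
public class SimpleErrorHandler implements ErrorHandler {
    // Called for conditions that are not errors according to the XML Recommendation.
    public void warning(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    // Called for recoverable errors, such as validity violations.
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    // Called for non-recoverable errors, such as well-formedness violations.
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }
}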
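
Printing to System.out makes it hard for calling code to decide whether a document was valid. As a minimal sketch, assuming the result should be inspected programmatically, the handler below records the reported problems instead of printing them. The class name CollectingErrorHandler and its methods are hypothetical and not part of the original sample code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler (not part of the original samples): records problems
// instead of printing them, so callers can check the outcome afterwards.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> warnings = new ArrayList<String>();
    private final List<String> errors = new ArrayList<String>();

    public void warning(SAXParseException e) {
        warnings.add(describe(e));
    }

    public void error(SAXParseException e) {
        errors.add(describe(e));
    }

    // Fatal errors are recorded and rethrown so that parsing stops.
    public void fatalError(SAXParseException e) throws SAXParseException {
        errors.add(describe(e));
        throw e;
    }

    // True as long as no error or fatal error has been reported.
    public boolean isValid() {
        return errors.isEmpty();
    }

    public List<String> getWarnings() {
        return Collections.unmodifiableList(warnings);
    }

    public List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    // Formats a problem as "line:column: message".
    private static String describe(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

Such a handler can be passed to setErrorHandler in the same way as SimpleErrorHandler. Because fatalError rethrows, the caller still has to catch the SAXException thrown by the parse call and can then consult isValid() and getErrors().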

Namespace Aware

Namespaces were introduced to XML after the first XML specification had received official W3C Recommendation status. For this reason, most XML parser implementations do not support XML Namespaces by default. To validate XML documents that use namespaces correctly, the underlying parsers must therefore be configured to support XML Namespaces.
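
The effect of this setting can be observed directly on a parsed DOM tree. The sketch below parses the same small document twice, once with and once without namespace support. The class name NamespaceAwareDemo, the helper parseRoot and the namespace URI urn:example:contacts are made up for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareDemo {
    // Parses the XML with the given namespace setting and returns the root element.
    static Element parseRoot(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The namespace URI is an invented example value.
        String xml = "<c:contacts xmlns:c=\"urn:example:contacts\"/>";

        Element plain = parseRoot(xml, false);
        Element aware = parseRoot(xml, true);

        System.out.println("not aware: uri=" + plain.getNamespaceURI()
                + ", local name=" + plain.getLocalName()
                + ", node name=" + plain.getNodeName());
        System.out.println("aware:     uri=" + aware.getNamespaceURI()
                + ", local name=" + aware.getLocalName()
                + ", node name=" + aware.getNodeName());
    }
}
```

With the parser bundled in the JDK, the namespace-aware parse should expose the namespace URI and local name of the root element, whereas the parse without namespace support only knows the qualified node name and returns null for both.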

Input Document

The input document ("contacts.xml") used for all the code examples is shown below.

<!DOCTYPE contacts SYSTEM "contacts.dtd">

<contacts xsi:noNamespaceSchemaLocation="contacts.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <contact title="citizen">
    <firstname>Edwin</firstname>
    <lastname>Dankert</lastname>
  </contact>
</contacts>

XML Schema

The XML Schema ("contacts.xsd") defined below has been used in the code examples to validate the input document. The input document contains an extra attribute that is not defined in the XML Schema, which makes it visible that the XML Schema has been used for the validation. When this XML Schema is used to validate the input XML document, the following error is reported:

cvc-complex-type.3.2.2: Attribute 'title' is not allowed to appear in element 'contact'.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contacts">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="contact"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:NCName"/>
        <xs:element name="lastname" type="xs:NCName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

DTD

The DTD ("contacts.dtd") as defined below has been used in the code examples to validate the input document. To highlight that the DTD has been used for the validation, the title attribute in the input document has a value which is not allowed according to this DTD. When using this DTD to validate the input XML document, the following error gets reported:

Attribute "title" with value "citizen" must have a value from the list "MR MS MRS ".

<!ELEMENT contacts (contact*)>
<!ATTLIST contacts xsi:noNamespaceSchemaLocation CDATA #IMPLIED>
<!ATTLIST contacts xmlns:xsi CDATA #IMPLIED>

<!ELEMENT contact (firstname,lastname)>
<!ATTLIST contact title (MR|MS|MRS) "MS">

<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>

Checking Well-formedness

Before a document can be called XML rather than CSV, plain text or any other format, it must follow the basic rules defined by the XML Recommendation. A document that adheres to these rules is said to be well-formed XML.
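
As a minimal sketch of such a check, the helper below answers the question "is this string well-formed XML?" with a boolean. It uses the same non-validating DOM configuration as the fragments in this section, but reads from a string and uses an inline error handler that rethrows errors. The class name WellFormedCheck and the method isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    // Returns true if the text can be parsed as XML without validation.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setNamespaceAware(true);

            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow errors instead of printing them, so they end up in the catch below.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });

            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reported a well-formedness problem.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<contact><firstname>Edwin</firstname></contact>"));
        System.out.println(isWellFormed("<contact><firstname>Edwin</contact>"));
        System.out.println(isWellFormed("Edwin,Dankert"));
    }
}
```

The second input has a start tag without a matching end tag and the third is plain comma-separated text, so neither follows the rules of the XML Recommendation.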

Code Fragments: DOM, SAX, dom4j, XOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("contacts.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("contacts.xml"));

dom4j

SAXReader reader = new SAXReader();
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Validate using internal DTD

Parse the input document using only the DTD (contacts.dtd), as defined by the DOCTYPE in the input document, for validation.

Code Fragments: DOM, SAX, dom4j, XOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("contacts.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("contacts.xml"));

dom4j

SAXReader reader = new SAXReader();
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");
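
The DTD validation described in this section can also be tried without any files on disk. The sketch below is a variation on the DOM fragment, under the assumption that the DTD is embedded in the DOCTYPE as an internal subset instead of being referenced as contacts.dtd. It collects the reported messages in a list. The class name DtdValidationDemo and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Builds a contacts document whose DTD is embedded in the DOCTYPE.
    static String sampleDocument(String title) {
        return "<!DOCTYPE contacts [\n"
            + "  <!ELEMENT contacts (contact*)>\n"
            + "  <!ELEMENT contact (firstname,lastname)>\n"
            + "  <!ATTLIST contact title (MR|MS|MRS) \"MS\">\n"
            + "  <!ELEMENT firstname (#PCDATA)>\n"
            + "  <!ELEMENT lastname (#PCDATA)>\n"
            + "]>\n"
            + "<contacts>\n"
            + "  <contact title=\"" + title + "\">\n"
            + "    <firstname>Edwin</firstname>\n"
            + "    <lastname>Dankert</lastname>\n"
            + "  </contact>\n"
            + "</contacts>\n";
    }

    // Validates the document against its own DTD and returns the reported messages.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // switches on DTD validation
            factory.setNamespaceAware(true);

            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());  // fatal error: document is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(sampleDocument("citizen")));
        System.out.println(validate(sampleDocument("MR")));
    }
}
```

The value "citizen" is not in the enumeration declared for the title attribute, so the first call should report a validity error comparable to the one shown in the DTD section, while the second document should pass.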

Validate using internal XSD

Parse the input document using only the XML Schema (contacts.xsd), as defined by the noNamespaceSchemaLocation attribute in the input document, for validation.

Code Fragments: DOM, SAX, dom4j, XOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
      "http://www.w3.org/2001/XMLSchema");

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("contacts.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
      "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("contacts.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
      "http://www.w3.org/2001/XMLSchema");

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
      "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");
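
Because the schema is located through the noNamespaceSchemaLocation hint, the parser has to be able to resolve "contacts.xsd" relative to the input document. The sketch below makes that dependency explicit: it writes a schema and a document (without DOCTYPE) into a temporary directory and then validates the document with the DOM configuration from this section. The class name SchemaHintDemo, the method name and the use of a temporary directory are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaHintDemo {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
      + "  <xs:element name=\"contacts\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element ref=\"contact\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "  <xs:element name=\"contact\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element name=\"firstname\" type=\"xs:NCName\"/>\n"
      + "      <xs:element name=\"lastname\" type=\"xs:NCName\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "</xs:schema>\n";

    static final String XML =
        "<contacts xsi:noNamespaceSchemaLocation=\"contacts.xsd\"\n"
      + "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n"
      + "  <contact title=\"citizen\">\n"
      + "    <firstname>Edwin</firstname>\n"
      + "    <lastname>Dankert</lastname>\n"
      + "  </contact>\n"
      + "</contacts>\n";

    // Writes both files next to each other and validates the document via its schema hint.
    static List<String> validateWithSchemaHint() {
        final List<String> problems = new ArrayList<>();
        try {
            Path dir = Files.createTempDirectory("contacts");
            Path xsd = dir.resolve("contacts.xsd");
            Path xml = dir.resolve("contacts.xml");
            dir.toFile().deleteOnExit();   // registered first, so deleted last
            Files.write(xsd, XSD.getBytes(StandardCharsets.UTF_8));
            Files.write(xml, XML.getBytes(StandardCharsets.UTF_8));
            xsd.toFile().deleteOnExit();
            xml.toFile().deleteOnExit();

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                    "http://www.w3.org/2001/XMLSchema");

            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            // Parsing a File gives the parser a base URI for resolving "contacts.xsd".
            builder.parse(xml.toFile());
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validateWithSchemaHint());
    }
}
```

If the schema file cannot be found at the hinted location, the parser typically reports that it cannot find the declaration of the root element rather than the attribute error, which is a useful sign that the hint did not resolve.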

Validate using external Schema

Parse the input document and validate it against the schema (contacts.xsd), which is specified externally in the source code rather than in the document.

Code Fragments: DOM, SAX, dom4j, XOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("contacts.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("contacts.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");
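
Since the schema is supplied by the code, neither the schema nor the document has to live in a file. The sketch below follows the SAX fragment of this section, but compiles the schema from a string and validates documents held in memory. The class name ExternalSchemaDemo and its methods are hypothetical, and the schema is a compact copy of contacts.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ExternalSchemaDemo {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
      + "  <xs:element name=\"contacts\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element ref=\"contact\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "  <xs:element name=\"contact\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element name=\"firstname\" type=\"xs:NCName\"/>\n"
      + "      <xs:element name=\"lastname\" type=\"xs:NCName\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "</xs:schema>\n";

    // Builds a document without DOCTYPE or schema hint, optionally with the extra attribute.
    static String sampleDocument(boolean withTitle) {
        return "<contacts><contact" + (withTitle ? " title=\"citizen\"" : "") + ">"
            + "<firstname>Edwin</firstname><lastname>Dankert</lastname>"
            + "</contact></contacts>";
    }

    // Validates the document against the schema compiled from XSD.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);  // no DTD validation
            factory.setNamespaceAware(true);
            factory.setSchema(schema);

            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(sampleDocument(true)));
        System.out.println(validate(sampleDocument(false)));
    }
}
```

A compiled Schema object is immutable and can be shared, so in a real application it would usually be created once and reused for many documents instead of being rebuilt on every call as in this sketch.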

Validate using internal DTD and external Schema

Parse the input document and validate it against both the schema (contacts.xsd), which is specified externally in the source code, and the DTD (contacts.dtd), which is specified by the DOCTYPE in the input document.

Code Fragments: DOM, SAX, dom4j, XOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("contacts.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("contacts.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");
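
To see both validations report independently, the combination can be tried on an in-memory document. The sketch below follows the DOM fragment of this section, under the assumptions that the DTD is embedded in the DOCTYPE as an internal subset and that the schema is compiled from a string. The class name DtdAndSchemaDemo and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdAndSchemaDemo {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
      + "  <xs:element name=\"contacts\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element ref=\"contact\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "  <xs:element name=\"contact\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "      <xs:element name=\"firstname\" type=\"xs:NCName\"/>\n"
      + "      <xs:element name=\"lastname\" type=\"xs:NCName\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "  </xs:element>\n"
      + "</xs:schema>\n";

    // Builds a contacts document whose DTD is embedded in the DOCTYPE.
    static String sampleDocument(String title) {
        return "<!DOCTYPE contacts [\n"
            + "  <!ELEMENT contacts (contact*)>\n"
            + "  <!ELEMENT contact (firstname,lastname)>\n"
            + "  <!ATTLIST contact title (MR|MS|MRS) \"MS\">\n"
            + "  <!ELEMENT firstname (#PCDATA)>\n"
            + "  <!ELEMENT lastname (#PCDATA)>\n"
            + "]>\n"
            + "<contacts>\n"
            + "  <contact title=\"" + title + "\">\n"
            + "    <firstname>Edwin</firstname>\n"
            + "    <lastname>Dankert</lastname>\n"
            + "  </contact>\n"
            + "</contacts>\n";
    }

    // Validates against the document's DTD and the externally supplied schema.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // DTD validation
            factory.setNamespaceAware(true);
            factory.setSchema(             // XML Schema validation
                schemaFactory.newSchema(new StreamSource(new StringReader(XSD))));

            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        for (String problem : validate(sampleDocument("citizen"))) {
            System.out.println(problem);
        }
    }
}
```

For the title value "citizen", both the DTD and the schema should object, so two messages are expected. With a value such as "MR" the DTD is satisfied, but the schema still rejects the attribute, because it does not declare title at all.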

Conclusion

Several mechanisms and XML APIs can be used to parse and validate XML. With JAXP 1.3, the configuration can mostly stay the same across these different APIs.

Sample Code

Download any of the archives to get the full source-code for the examples above.

The archives consist of the input XML file (./contacts.xml), the XML Schema document (./contacts.xsd), the DTD document (./contacts.dtd) and the source code for the fragments above, located in the ./src directory.

The archives also contain a number of XML Hammer validation projects in the ./xmlhammer-projects directory. To execute these XML Hammer projects, the XML Hammer application needs to be installed. It can be downloaded from:

http://www.xmlhammer.org/downloads.html

Date: 2024-10-12 13:48:48