One of the problems that every software developer meets from time to time is the validation of some XML text against a schema. I am talking about XML Schema (XSD), not the less strict document type definitions (DTD). There are different techniques for programmatic validation, and I want to summarize my Java experiences in this blog.
Internally Given Schema
The most frequent case is that the XML text you want to validate contains a reference to an XML schema.
Example
<?xml version="1.0" encoding="UTF-8"?>
<example
    xmlns="http://www.example.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org http://www.example.org/example.xsd">
  <title>....</title>
  <summary>....</summary>
  <content>....</content>
</example>
How to read this? The root element contains three attributes, where ...
- xmlns = "http://www.example.org"
  defines http://www.example.org as the identifier (not location!) for the default namespace of the XML document, i.e. all elements that do not explicitly declare a namespace (namespace:element) belong to that namespace, for example title.
  Note: the default namespace can be left out when using the noNamespaceSchemaLocation attribute, see the example at the bottom of this page.
- xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  binds the conventional prefix xsi to the fixed identifier (not location!) of the XML Schema instance namespace, needed to use the attribute xsi:schemaLocation.
- xsi:schemaLocation = "http://www.example.org http://www.example.org/example.xsd"
  finally uses an attribute from the xsi namespace to declare a concrete schema for the default namespace identifier http://www.example.org (first part of the attribute value), and it references http://www.example.org/example.xsd (second part of the attribute value, separated by a space). Mind that there can be several namespace-location pairs in this attribute value!
So the schema for this XML is available at http://www.example.org/example.xsd.
Loading this URI in a web browser should display the contents of the XML schema.
All of the elements example, title, summary, content must be described there.
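To make the pair structure of the xsi:schemaLocation value concrete, here is a minimal sketch that splits such a value into namespace-location pairs. The class name SchemaLocationPairs and the method parse are made up for illustration and are not part of any XML API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaLocationPairs {

    /**
     * Hypothetical helper: splits an xsi:schemaLocation value into
     * namespace -> location pairs. The value is a whitespace-separated
     * list that must contain an even number of tokens.
     */
    public static Map<String, String> parse(String schemaLocation) {
        final Map<String, String> pairs = new LinkedHashMap<>();
        final String trimmed = schemaLocation.trim();
        if (trimmed.isEmpty()) {
            return pairs;
        }
        final String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected namespace/location pairs, got " + tokens.length + " tokens");
        }
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // namespace identifier -> schema location
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The attribute value from the example above
        System.out.println(
            parse("http://www.example.org http://www.example.org/example.xsd"));
    }
}
```

A value with an odd number of tokens is rejected here because it cannot be a list of pairs.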
Validation
The following shows one way to validate this XML in Java.
First we need a SAX parsing handler that receives errors and warnings. Conveniently, we also want line numbers in the messages.
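import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlValidationResult extends DefaultHandler
{
    public final List<String> warnings = new ArrayList<String>();
    public final List<String> errors = new ArrayList<String>();

    private Locator locator;

    /**
     * Called by the SAXParser before any other method.
     * @param locator the parser's locator object where you can get line numbers from.
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void warning(SAXParseException ex) throws SAXException {
        warnings.add(lineNumber(ex) + ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex) + ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex) + ex.getMessage());
    }

    /**
     * Prefers the line number carried by the exception and falls back to the locator.
     * When this class is used only as ErrorHandler, setDocumentLocator() is never called.
     */
    private String lineNumber(SAXParseException ex) {
        int line = ex.getLineNumber();    // -1 when unknown
        if (line < 0 && locator != null)
            line = locator.getLineNumber();
        return "Exception during validation"
            + ((line >= 0) ? " at line " + line : "")
            + ": ";
    }
}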
Using this handler we can now check the XML for validity.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static XmlValidationResult validateXml(byte [] documentBytes) {
    final InputSource saxSource = new InputSource(new ByteArrayInputStream(documentBytes));

    final SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);

    final XmlValidationResult errorHandler = new XmlValidationResult();
    try {
        final SAXParser parser = factory.newSAXParser();
        // Switch from DTD validation to XML Schema validation
        parser.setProperty(
            "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
            XMLConstants.W3C_XML_SCHEMA_NS_URI);
        parser.parse(saxSource, errorHandler);
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
        errorHandler.errors.add("Unexpected parsing error: " + e.getMessage());
    }
    return errorHandler;
}
For documentation on the classes used, please read their JavaDoc.
Unfortunately the public JAXP API does not seem to offer a String constant for "http://java.sun.com/xml/jaxp/properties/schemaLanguage", although this property name is a fixed value.
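To try the parser-based validation without network access, the following is a minimal sketch under some assumptions: the class LocalSchemaHintDemo, the constant JAXP_SCHEMA_LANGUAGE, the tiny schema and its note element are all made up for illustration, and the document points to a temporary schema file through an xsi:noNamespaceSchemaLocation hint instead of an http URI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class LocalSchemaHintDemo {

    // Self-defined constant for the property name used in the article
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

    // Hypothetical schema: a single root element that must contain an integer
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:int'/>"
        + "</xs:schema>";

    /** Writes the demo schema to a temporary file and returns its path. */
    public static Path writeTempSchema() {
        try {
            final Path file = Files.createTempFile("demo", ".xsd");
            Files.write(file, XSD.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a document whose root element references the schema file. */
    public static String document(Path schemaFile, String content) {
        return "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:noNamespaceSchemaLocation='" + schemaFile.toUri() + "'>"
            + content + "</note>";
    }

    /** Parses with schema validation switched on and returns the collected errors. */
    public static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        try {
            final SAXParser parser = factory.newSAXParser();
            parser.setProperty(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            final byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new InputSource(new ByteArrayInputStream(bytes)), new DefaultHandler() {
                @Override public void error(SAXParseException ex) {
                    errors.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
                }
                @Override public void fatalError(SAXParseException ex) {
                    errors.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            errors.add("Unexpected parsing error: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        final Path xsd = writeTempSchema();
        System.out.println(validate(document(xsd, "42")));   // content is an integer
        System.out.println(validate(document(xsd, "abc")));  // content is not an integer
    }
}
```

The first call should report no errors, the second one should report that abc is not a valid integer. The exact message text depends on the parser implementation.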
Externally Given Schema
Example
<?xml version="1.0" encoding="UTF-8"?>
<example>
  <title>....</title>
  <summary>....</summary>
  <content>....</content>
</example>
So here we have some XML that does not declare its schema, and we want to know whether it conforms to http://www.example.org/example.xsd.
Validation
The following source validates this XML when the schema is passed as a Source parameter.
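import java.io.ByteArrayInputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public static XmlValidationResult validateAgainstSchema(Source schemaSource, byte [] documentBytes) {
    final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        final Schema schema = schemaFactory.newSchema(schemaSource);
        final Validator validator = schema.newValidator();

        final XmlValidationResult errorHandler = new XmlValidationResult();
        validator.setErrorHandler(errorHandler);

        validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
        return errorHandler;
    }
    catch (Exception e) {
        // Keep the original exception as cause, so the stack trace is not lost
        throw new RuntimeException("Unexpected validation error: " + e.getMessage(), e);
    }
}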
This implementation uses the javax.xml.validation API introduced in Java 1.5.
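As a self-contained illustration of validating against an externally given schema, here is a minimal sketch that keeps both schema and document in memory. The class InMemorySchemaValidation and the concrete schema for the example element are assumptions made for this demo. The article itself does not show the contents of example.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class InMemorySchemaValidation {

    // Assumed schema: example must contain title, summary and content, in this order
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='example'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='summary' type='xs:string'/>"
        + "<xs:element name='content' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Validates the XML text against the schema text and returns all reported errors. */
    public static List<String> validate(String xsd, String xml) {
        final List<String> errors = new ArrayList<>();
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            final Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            final Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException ex) {
                    // warnings are ignored in this sketch
                }
                @Override public void error(SAXParseException ex) {
                    errors.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
                }
                @Override public void fatalError(SAXParseException ex) {
                    errors.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Unexpected validation error: " + e.getMessage(), e);
        }
        return errors;
    }

    public static void main(String[] args) {
        final String valid =
            "<example><title>t</title><summary>s</summary><content>c</content></example>";
        final String invalid = "<example><title>t</title></example>";
        System.out.println(validate(XSD, valid));    // expected to be empty
        System.out.println(validate(XSD, invalid));  // expected to report missing elements
    }
}
```

Collecting errors in a list instead of throwing on the first one lets the caller see every violation in one pass, which is the same idea as in the XmlValidationResult handler.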
Schema Located in CLASSPATH
The preferred way to drive validation is surely the one with an internally given schema, because this gives the user the chance to alter the schema after deployment of the application. Otherwise the application would have to maintain a compiled-in mapping of XML files to schemas.
A special problem with internally given schemas arises when the schema files are packed into the application's JAR file. Imagine a user edits some XML, and the application has to validate that XML against one of these schemas. The user names the schema by a relative or absolute path instead of an http URI.
Example
<?xml version="1.0" encoding="UTF-8"?>
<addresses
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation='/absolute/path/in/jar/test.xsd'>
  <address>
    <name>Joe Tester</name>
    <street>Baker street 5</street>
  </address>
</addresses>
This is the simplest way to give XML a schema. The noNamespaceSchemaLocation attribute can contain just one schema location, not namespace-location pairs like schemaLocation.
Validation
The XML parser will not be able to locate this schema reference. You will get a message like
cvc-elt.1: Cannot find the declaration of element ....
But you can tell the validator how to load the schema via the org.w3c.dom.ls API (ls = Load and Save).
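import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public static XmlValidationResult validateAgainstSchemaInClasspath(byte [] documentBytes) {
    final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        // No schema source: the validator uses the location hints given in the document
        final Schema schema = factory.newSchema();
        final Validator validator = schema.newValidator();

        final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        validator.setResourceResolver(new LSResourceResolver() {
            @Override
            public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
                // systemId is the path given in the schema location attribute
                final InputStream in = getClass().getResourceAsStream(systemId);
                if (in == null)
                    return null;    // not in CLASSPATH, let the validator try its default lookup

                final DOMImplementationLS domImplementationLS = (DOMImplementationLS) registry.getDOMImplementation("LS");
                final LSInput input = domImplementationLS.createLSInput();
                input.setByteStream(in);
                input.setSystemId(systemId);
                return input;
            }
        });

        final XmlValidationResult errorHandler = new XmlValidationResult();
        validator.setErrorHandler(errorHandler);

        validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
        return errorHandler;
    }
    catch (Exception e) {
        // Keep the original exception as cause, so the stack trace is not lost
        throw new RuntimeException("Unexpected validation error: " + e.getMessage(), e);
    }
}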
What can you do with such a validation?
- Either place the schema files in an arbitrary path inside the JAR, and refer to them with an absolute path (starting with "/"),
- or put the schema files into the same path as the class that validates, and refer to the schemas with a path relative to the class (without leading "/").
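The two options follow from how Class.getResourceAsStream() interprets resource names. The following sketch mirrors that documented rule in a small helper. The class ResourceNameResolution and the method toAbsoluteResourceName are hypothetical names for illustration only, the real lookup happens inside the JDK.

```java
class ResourceNameResolution {

    /**
     * Illustrative helper, not part of any API: computes the CLASSPATH-relative
     * name that Class.getResourceAsStream(name) would look up for the given class.
     * A leading "/" means absolute, anything else is relative to the class's package.
     */
    public static String toAbsoluteResourceName(Class<?> anchor, String name) {
        if (name.startsWith("/")) {
            return name.substring(1); // absolute: just drop the leading slash
        }
        final String className = anchor.getName();
        final int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return name; // class lives in the default package
        }
        // relative: prepend the package name with '.' replaced by '/'
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // java.util.List merely stands in for "the class that validates"
        System.out.println(toAbsoluteResourceName(java.util.List.class, "/absolute/path/in/jar/test.xsd"));
        // prints absolute/path/in/jar/test.xsd
        System.out.println(toAbsoluteResourceName(java.util.List.class, "test.xsd"));
        // prints java/util/test.xsd
    }
}
```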
To try this out, here is the source of the XML schema used in this example.
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>

  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="address" minOccurs='1' maxOccurs='unbounded' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" minOccurs='0' maxOccurs='1' />
        <xs:element ref="street" minOccurs='0' maxOccurs='1' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="name" type='xs:string' />
  <xs:element name="street" type='xs:string' />

</xs:schema>