Monday, 29 May 2017

XML Schema Validation in Java

One of the problems every software developer meets from time to time is validating some XML text against a schema. I am talking about XML Schema (XSD), not the less strict document type definitions (DTD). There are different techniques for validating programmatically, and I want to summarize my Java experiences in this blog post.

Internally Given Schema

The most frequent case is that the XML text you want to validate contains a reference to an XML schema.

Example

<?xml version="1.0" encoding="UTF-8"?>
<example
    xmlns="http://www.example.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org http://www.example.org/example.xsd">
  
  <title>....</title>
  <summary>....</summary>
  <content>....</content>

</example>

How to read this? The root element carries three attributes:

  1. xmlns = "http://www.example.org"
    defines http://www.example.org as the identifier (not location!) of the default namespace of the XML document, i.e. all elements written without a prefix (prefix:element) belong to that namespace, for example title.

    Note: the default namespace can be left out when using the noNamespaceSchemaLocation attribute, see the example at the bottom of this page.

  2. xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    binds the conventional prefix xsi to the fixed identifier (again, not a location!) of the XML Schema instance namespace, which is needed to use the attribute xsi:schemaLocation.

  3. xsi:schemaLocation = "http://www.example.org http://www.example.org/example.xsd"
    finally uses an attribute from the xsi namespace to assign a concrete schema to the namespace identifier http://www.example.org (first part of the attribute value); the schema is located at http://www.example.org/example.xsd (second part of the attribute value, separated by whitespace). Mind that this attribute value can hold several namespace - location pairs!

So the schema for this XML is expected at http://www.example.org/example.xsd. Loading this URI in a web browser should display the contents of the XML schema. All of the elements example, title, summary and content must be described there.
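To make the pair structure of xsi:schemaLocation tangible, here is a minimal sketch that reads the attribute from the root element and splits it into namespace - location pairs. The class name SchemaLocationPairs and its read method are made up for this illustration; nothing is downloaded, the document is only parsed, not validated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;

// Hypothetical helper class, only for illustration.
class SchemaLocationPairs {

    /** Returns the namespace -> location pairs found in xsi:schemaLocation of the root element. */
    static Map<String, String> read(String xml) throws Exception {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);    // needed for getAttributeNS() below

        final Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        // yields an empty string when the attribute is missing
        final String value = root.getAttributeNS(
                XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "schemaLocation").trim();

        final Map<String, String> pairs = new LinkedHashMap<String, String>();
        final String[] tokens = value.isEmpty() ? new String[0] : value.split("\\s+");
        // tokens alternate: namespace identifier, schema location, namespace identifier, ...
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        final String xml =
                "<example xmlns='http://www.example.org'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:schemaLocation='http://www.example.org http://www.example.org/example.xsd'/>";
        System.out.println(read(xml));
    }
}
```

A trailing token without a partner is silently ignored here; a stricter tool might want to report it.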

Validation

The following shows one way to validate this XML in Java.

First we need a SAX handler that receives errors and warnings. For convenience, we also want the messages to contain line numbers.

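import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the warnings and errors reported during parsing or validation. */
public class XmlValidationResult extends DefaultHandler
{
    public final List<String> warnings = new ArrayList<String>();
    public final List<String> errors = new ArrayList<String>();
    
    private Locator locator;
    
    /**
     * Called by the SAXParser before any other method.
     * Not called when this object is used as ErrorHandler of a Validator only.
     * @param locator the parser's locator object where you can get line numbers from.
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }
    
    @Override
    public void warning(SAXParseException ex) throws SAXException {
        warnings.add(lineNumber(ex)+ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex)+ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex)+ex.getMessage());
    }

    /** Prefers the line number stored in the exception, falls back to the locator if there is one. */
    private String lineNumber(SAXParseException ex) {
        int line = ex.getLineNumber();    // -1 when unknown
        if (line < 0 && locator != null)
            line = locator.getLineNumber();
        
        return "Exception during validation"
                +((line >= 0) ? " at line "+line : "")
                +": ";
    }
}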

Using this handler, we can now check the XML for validity.

    public static XmlValidationResult validateXml(byte [] documentBytes) {
        final InputSource saxSource = new InputSource(new ByteArrayInputStream(documentBytes));
        
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);    // XML schema validation needs namespace support
        factory.setValidating(true);    // validate while parsing

        final XmlValidationResult errorHandler = new XmlValidationResult();
        try {
            final SAXParser parser = factory.newSAXParser();
            // validate against W3C XML schema instead of DTD; the schema is taken from xsi:schemaLocation
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", XMLConstants.W3C_XML_SCHEMA_NS_URI);
            parser.parse(saxSource, errorHandler);
        }
        catch (ParserConfigurationException | SAXException | IOException e) {
            errorHandler.errors.add("Unexpected parsing error: "+e.getMessage());
        }
        
        return errorHandler;
    }

For documentation of the classes used here, please read their JavaDoc. Unfortunately the public JAXP API does not seem to offer a String constant for the property name "http://java.sun.com/xml/jaxp/properties/schemaLanguage", so it has to be written as a literal.

Externally Given Schema

Example

<?xml version="1.0" encoding="UTF-8"?>
<example>
  
  <title>....</title>
  <summary>....</summary>
  <content>....</content>

</example>

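Here we have some XML that does not declare its schema, and we want to know whether it conforms to http://www.example.org/example.xsd. (Because the elements here are in no namespace, this can only succeed if the schema does not declare a targetNamespace.)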

Validation

The following code validates this XML against a schema that is passed as a Source parameter.

    public static XmlValidationResult validateAgainstSchema(Source schemaSource, byte [] documentBytes) {
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        
        try {
            final Schema schema = schemaFactory.newSchema(schemaSource);
            final Validator validator = schema.newValidator();
            
            final XmlValidationResult errorHandler = new XmlValidationResult();
            validator.setErrorHandler(errorHandler);
            
            validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
            
            return errorHandler;
        }
        catch (Exception e) {
            // pass the cause along, so that its stack trace is not lost
            throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);
        }
    }

This implementation uses the javax.xml.validation API, introduced in Java 1.5.
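The following self-contained sketch shows how such a validation could be tried out without any network access. It assumes a made-up schema for the example element (the real http://www.example.org/example.xsd is not known here), and the class name ExternalSchemaDemo is hypothetical. A small ErrorHandler stands in for XmlValidationResult so that the code compiles on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, only for illustration.
class ExternalSchemaDemo {

    /** Assumed schema without targetNamespace, NOT the real example.xsd. */
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='example'>"
            + " <xs:complexType>"
            + "  <xs:sequence>"
            + "   <xs:element name='title' type='xs:string'/>"
            + "   <xs:element name='summary' type='xs:string'/>"
            + "   <xs:element name='content' type='xs:string'/>"
            + "  </xs:sequence>"
            + " </xs:complexType>"
            + "</xs:element>"
            + "</xs:schema>";

    /** Validates well-formed XML against the given schema text and returns all reported problems. */
    static List<String> validate(String xsd, String xml) throws Exception {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        final Validator validator = schema.newValidator();

        final List<String> problems = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException ex) { add(ex); }
            @Override
            public void error(SAXParseException ex) { add(ex); }
            @Override
            public void fatalError(SAXParseException ex) { add(ex); }

            private void add(SAXParseException ex) {
                problems.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
            }
        });

        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        final String valid =
                "<example><title>t</title><summary>s</summary><content>c</content></example>";
        final String invalid =
                "<example><title>t</title></example>";    // summary and content are missing

        System.out.println("valid document: " + validate(XSD, valid));
        System.out.println("invalid document: " + validate(XSD, invalid));
    }
}
```

Because the error handler does not throw, validation continues after an error, and the caller decides what to do with the collected messages.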

Schema Located in CLASSPATH

The preferred way to drive validation is surely the one with an internally given schema, because it gives the user the chance to alter the schema after the application has been deployed. Otherwise the application would have to maintain a compiled-in mapping of XML files to schemas.

A special problem with internally given schemas arises when the schema files are packed into the application's JAR file. Imagine that a user edits some XML, and the application has to validate that XML against one of these schemas. The user names the schema by a relative or absolute path instead of an http URI.

Example

<?xml version="1.0" encoding="UTF-8"?>
<addresses
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation='/absolute/path/in/jar/test.xsd'>

  <address>
    <name>Joe Tester</name>
    <street>Baker street 5</street>
  </address>
  
</addresses>

This is the simplest way to give XML a schema. The noNamespaceSchemaLocation attribute can contain just one schema location, not namespace - location pairs like schemaLocation.

Validation

On its own, the XML parser will not be able to resolve this schema reference, because the path points into the JAR and not into the file system. You will get a message like

cvc-elt.1: Cannot find the declaration of element ....

But you can tell the validator how to load the schema via the org.w3c.dom.ls API (ls = Load and Save).

    public static XmlValidationResult validateAgainstSchemaInClasspath(byte [] documentBytes) {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        try {
            final Schema schema = factory.newSchema();
            final Validator validator = schema.newValidator();
        
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        
            validator.setResourceResolver(new LSResourceResolver() {
                @Override
                public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
                    // systemId is the schema location as written in the XML, used as CLASSPATH resource name here
                    final InputStream in = (systemId != null) ? getClass().getResourceAsStream(systemId) : null;
                    if (in == null)
                        return null;    // not in CLASSPATH, let the validator try its default lookup
                    
                    final DOMImplementationLS domImplementationLS = (DOMImplementationLS) registry.getDOMImplementation("LS");
                    final LSInput input = domImplementationLS.createLSInput();
                    input.setByteStream(in);
                    return input;
                }
            });
        
            final XmlValidationResult errorHandler = new XmlValidationResult();
            validator.setErrorHandler(errorHandler);

            validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));

            return errorHandler;
        }
        catch (Exception e) {
            // pass the cause along, so that its stack trace is not lost
            throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);
        }
    }

What can you do with such a validation?

  1. Either put the schema files into an arbitrary path inside the JAR, and refer to them with an absolute path (starting with "/"),
  2. or put the schema files into the same package directory as the class that validates, and refer to them with a path relative to that class (without leading "/").

To try this out, here is the source of the XML schema used in this example.

<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>

 <xs:element name="addresses">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="address" minOccurs='1' maxOccurs='unbounded' />
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name="address">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="name" minOccurs='0' maxOccurs='1' />
    <xs:element ref="street" minOccurs='0' maxOccurs='1' />
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name="name" type='xs:string' />
 <xs:element name="street" type='xs:string' />
 
</xs:schema>
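For a quick experiment without building a JAR, the resolver idea can also be sketched with the schema held in a String. This is an assumption-laden variant, not the CLASSPATH lookup itself: the hypothetical class InMemoryResolverDemo answers every request whose location ends with "test.xsd" from memory, where the real code would call getResourceAsStream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class: serves the schema from memory instead of from the CLASSPATH.
class InMemoryResolverDemo {

    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='addresses'><xs:complexType><xs:sequence>"
            + " <xs:element ref='address' minOccurs='1' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "<xs:element name='address'><xs:complexType><xs:sequence>"
            + " <xs:element ref='name' minOccurs='0' maxOccurs='1'/>"
            + " <xs:element ref='street' minOccurs='0' maxOccurs='1'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='street' type='xs:string'/>"
            + "</xs:schema>";

    /** Validates well-formed XML that names its schema via xsi:noNamespaceSchemaLocation. */
    static List<String> validate(String xml) throws Exception {
        final DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

        // newSchema() without arguments: the schema is taken from the hints in the document
        final Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema().newValidator();

        validator.setResourceResolver(new LSResourceResolver() {
            @Override
            public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
                if (systemId == null || !systemId.endsWith("test.xsd"))
                    return null;    // unknown resource, fall back to default lookup

                final LSInput input = ls.createLSInput();
                input.setSystemId(systemId);
                input.setCharacterStream(new StringReader(XSD));
                return input;
            }
        });

        final List<String> problems = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException ex) { problems.add(ex.getMessage()); }
            @Override
            public void error(SAXParseException ex) { problems.add(ex.getMessage()); }
            @Override
            public void fatalError(SAXParseException ex) { problems.add(ex.getMessage()); }
        });

        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        final String xml =
                "<addresses xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:noNamespaceSchemaLocation='/absolute/path/in/jar/test.xsd'>"
                + "<address><name>Joe Tester</name><street>Baker street 5</street></address>"
                + "</addresses>";
        System.out.println("problems: " + validate(xml));
    }
}
```

Swapping the StringReader for the stream returned by getResourceAsStream(systemId) should lead back to the CLASSPATH variant shown above.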


