
Monday, May 29, 2017

XML Schema Validation in Java

One problem that every software developer meets from time to time is validating some XML text against a schema. I am talking about XML Schema (XSD), not the less strict document type definitions (DTD). There are different techniques for programmatic validation, and I want to summarize my Java experiences in this blog post.

Internally Given Schema

The most frequent case is that the XML text you want to validate contains a reference to an XML schema.

Example

<?xml version="1.0" encoding="UTF-8"?>
<example
    xmlns="http://www.example.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org http://www.example.org/example.xsd">
  
  <title>....</title>
  <summary>....</summary>
  <content>....</content>

</example>

How to read this? The root element carries three attributes:

  1. xmlns = "http://www.example.org"
    defines http://www.example.org as the identifier (not location!) of the default namespace of the XML document, i.e. all elements that do not carry a namespace prefix (prefix:element) belong to that namespace, for example title.

    Note: the default namespace can be left out when using the noNamespaceSchemaLocation attribute, see the example in the section "Schema Located in CLASSPATH".

  2. xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    binds the prefix xsi to the constant identifier (not location!) of the reserved XML Schema instance namespace, needed to use the attribute xsi:schemaLocation

  3. xsi:schemaLocation = "http://www.example.org http://www.example.org/example.xsd"
    finally uses an attribute from namespace xsi to assign a concrete schema to the namespace identifier http://www.example.org (first part of the attribute value). The schema is located at http://www.example.org/example.xsd (second part of the attribute value, separated by whitespace). Mind that this attribute value can hold several namespace/location pairs!

So the schema for this XML is available at http://www.example.org/example.xsd. Loading this URI in a web browser should display the contents of the XML schema. All of the elements example, title, summary and content must be described there.
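
The pair structure of the xsi:schemaLocation value can be illustrated with a few lines of Java. This is a minimal sketch and not part of any XML API: the class SchemaLocations and its method parse are names made up for this illustration, and it assumes the attribute value has already been read into a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration, not part of any XML API. */
class SchemaLocations
{
    /**
     * Splits an xsi:schemaLocation value into namespace/location pairs.
     * @param attributeValue whitespace-separated "namespace location namespace location ...".
     * @return the pairs in the order they were written.
     */
    static Map<String,String> parse(String attributeValue) {
        final Map<String,String> pairs = new LinkedHashMap<String,String>();
        final String trimmed = attributeValue.trim();
        if (trimmed.isEmpty())
            return pairs;

        final String [] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0)
            throw new IllegalArgumentException("Expected namespace/location pairs, got "+tokens.length+" tokens");

        for (int i = 0; i < tokens.length; i += 2)
            pairs.put(tokens[i], tokens[i + 1]);    // namespace identifier -> schema location

        return pairs;
    }

    public static void main(String [] args) {
        System.out.println(parse("http://www.example.org http://www.example.org/example.xsd"));
    }
}
```

A real parser does this splitting internally; the sketch only shows that the first token of each pair is an identifier and the second one a location.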

Validation

The following shows one way to validate this XML in Java.

First we need a SAX parsing handler that receives errors and warnings. For convenience we also want line numbers in the messages.

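import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlValidationResult extends DefaultHandler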
{
    public final List<String> warnings = new ArrayList<String>();
    public final List<String> errors = new ArrayList<String>();
    
    private Locator locator;
    
    /**
     * Called by the SAXParser before any other method.
     * @param locator the parser's locator object where you can get line numbers from.
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }
    
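    @Override
    public void warning(SAXParseException ex) throws SAXException {
        warnings.add(lineNumber(ex)+ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex)+ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        errors.add(lineNumber(ex)+ex.getMessage());
    }

    /**
     * Prefers the line number carried by the exception and falls back to the
     * locator, which is not set when this class serves a Validator as ErrorHandler only.
     */
    private String lineNumber(SAXParseException ex) {
        int line = ex.getLineNumber();    // -1 when unknown
        if (line < 0 && locator != null)
            line = locator.getLineNumber();
        return "Exception during validation"
                +((line >= 0) ? " at line "+line : "")
                +": ";
    }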
}

Using this handler we can now check the XML for validity.

    public static XmlValidationResult validateXml(byte [] documentBytes) {
        final InputSource saxSource = new InputSource(new ByteArrayInputStream(documentBytes));
        
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);

        final XmlValidationResult errorHandler = new XmlValidationResult();
        try {
            final SAXParser parser = factory.newSAXParser();
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", XMLConstants.W3C_XML_SCHEMA_NS_URI); 
            parser.parse(saxSource, errorHandler);
        }
        catch (ParserConfigurationException | SAXException | IOException e) {
            errorHandler.errors.add("Unexpected parsing error: "+e.getMessage());
        }
        
        return errorHandler;
    }

For documentation about the classes used, please read their JavaDoc. Unfortunately the public API offers no String constant for "http://java.sun.com/xml/jaxp/properties/schemaLanguage", so the literal has to be written out, although its value is fixed.
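
To try this kind of validation without a web server, the schema can be written to a temporary file that the XML references through a file URI. The following is a sketch under that assumption: the class InternalSchemaDemo, the element note and the shortened error handler are invented for this illustration. To keep the sample free of namespaces it uses the xsi:noNamespaceSchemaLocation attribute instead of xsi:schemaLocation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class, all names are made up for this illustration. */
class InternalSchemaDemo
{
    /** The property name from the text, kept as a constant of our own. */
    static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";

    /** Validates the XML against the schema it references and returns the error messages. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<String>();
        final DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException ex) {
                errors.add("line "+ex.getLineNumber()+": "+ex.getMessage());
            }
            @Override
            public void fatalError(SAXParseException ex) {
                errors.add("line "+ex.getLineNumber()+": "+ex.getMessage());
            }
        };
        try {
            final SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);    // must be on, else the property below has no effect

            final SAXParser parser = factory.newSAXParser();
            parser.setProperty(JAXP_SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            parser.parse(new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))), handler);
        }
        catch (Exception e) {
            throw new IllegalStateException("Unexpected parsing error: "+e.getMessage(), e);
        }
        return errors;
    }

    /** Writes a one-element schema to a temporary file and returns its file URI. */
    static String writeTemporarySchema() {
        final String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            +"<xs:element name='note' type='xs:string'/>"
            +"</xs:schema>";
        try {
            final Path file = Files.createTempFile("note", ".xsd");
            file.toFile().deleteOnExit();
            Files.write(file, xsd.getBytes(StandardCharsets.UTF_8));
            return file.toUri().toString();
        }
        catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    /** Builds a document with the given root element that references the schema. */
    static String document(String rootElement, String schemaUri) {
        return "<"+rootElement
            +" xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            +" xsi:noNamespaceSchemaLocation='"+schemaUri+"'>text</"+rootElement+">";
    }

    public static void main(String [] args) {
        final String schemaUri = writeTemporarySchema();
        System.out.println("note: "+validate(document("note", schemaUri)));
        System.out.println("memo: "+validate(document("memo", schemaUri)));
    }
}
```

The document with root element note should pass, while memo is not declared in the schema and should yield at least one error message.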

Externally Given Schema

Example

<?xml version="1.0" encoding="UTF-8"?>
<example>
  
  <title>....</title>
  <summary>....</summary>
  <content>....</content>

</example>

So here we have some XML that does not declare its schema, and we want to know whether it conforms to http://www.example.org/example.xsd.

Validation

The following code validates this XML against a schema that is passed in as a Source parameter.

    public static XmlValidationResult validateAgainstSchema(Source schemaSource, byte [] documentBytes) {
        final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        
        try {
            final Schema schema = schemaFactory.newSchema(schemaSource);
            final Validator validator = schema.newValidator();
            
            final XmlValidationResult errorHandler = new XmlValidationResult();
            validator.setErrorHandler(errorHandler);
            
            validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
            
            return errorHandler;
        }
        catch (Exception e) {
            throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);    // keep the cause
        }
    }

This implementation uses the javax.xml.validation API introduced in Java 1.5.
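
The schema Source does not have to come from a URL; for a quick experiment it can be built from a String held in memory. The following usage sketch assumes exactly that. The class ExternalSchemaDemo, the element note and the collecting error handler are names invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical demo class, all names are made up for this illustration. */
class ExternalSchemaDemo
{
    /** A tiny schema that allows just one root element named "note". */
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        +"<xs:element name='note' type='xs:string'/>"
        +"</xs:schema>";

    /** Validates the XML text against the schema text and returns the error messages. */
    static List<String> validate(String xsd, String xml) {
        final List<String> errors = new ArrayList<String>();
        try {
            final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            final Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
            final Validator validator = schema.newValidator();

            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException ex) {
                    // warnings are ignored in this sketch
                }
                @Override
                public void error(SAXParseException ex) {
                    errors.add("line "+ex.getLineNumber()+": "+ex.getMessage());
                }
                @Override
                public void fatalError(SAXParseException ex) {
                    errors.add("line "+ex.getLineNumber()+": "+ex.getMessage());
                }
            });

            validator.validate(new StreamSource(new StringReader(xml)));
        }
        catch (Exception e) {
            throw new IllegalStateException("Unexpected validation error: "+e.getMessage(), e);
        }
        return errors;
    }

    public static void main(String [] args) {
        System.out.println("note: "+validate(XSD, "<note>hello</note>"));
        System.out.println("memo: "+validate(XSD, "<memo>hello</memo>"));
    }
}
```

The first call should return an empty list; the second document uses an undeclared root element and should produce at least one message.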

Schema Located in CLASSPATH

The preferred way to drive validation is surely the internally given schema, because it lets the user alter the schema after the application has been deployed. Otherwise the application would have to maintain a compiled-in mapping of XML files to schemas.

A special problem with internally given schemas arises when the schema files are packed into the application's JAR file. Imagine a user edits some XML, and the application has to validate that XML against one of these schemas. The user names the schema by a relative or absolute path instead of an HTTP URI.

Example

<?xml version="1.0" encoding="UTF-8"?>
<addresses
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation='/absolute/path/in/jar/test.xsd'>

  <address>
    <name>Joe Tester</name>
    <street>Baker street 5</street>
  </address>
  
</addresses>

This is the simplest way to give XML a schema. The noNamespaceSchemaLocation attribute can contain just one schema location, not namespace/location pairs like schemaLocation.

Validation

The XML parser will not be able to locate this schema reference. You will get a message like

cvc-elt.1: Cannot find the declaration of element ....

But you can tell the validator how to load the schema via the org.w3c.dom.ls API (ls = Load and Save).

    public static XmlValidationResult validateAgainstSchemaInClasspath(byte [] documentBytes) {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        try {
            final Schema schema = factory.newSchema();
            final Validator validator = schema.newValidator();
        
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        
            validator.setResourceResolver(new LSResourceResolver() {
                @Override
                public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
                    // systemId is the schema location exactly as written in the XML
                    final InputStream in = getClass().getResourceAsStream(systemId);
                    if (in == null)
                        return null;    // not found in CLASSPATH: let the validator use its default lookup
                    final DOMImplementationLS domImplementationLS = (DOMImplementationLS) registry.getDOMImplementation("LS");
                    final LSInput input = domImplementationLS.createLSInput();
                    input.setByteStream(in);
                    return input;
                }
            });
        
            final XmlValidationResult errorHandler = new XmlValidationResult();
            validator.setErrorHandler(errorHandler);

            validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));

            return errorHandler;
        }
        catch (Exception e) {
            throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);    // keep the cause
        }
    }

What can you do with such a validation?

  1. Either locate the schema files in an arbitrary path inside the JAR, and refer to them with an absolute path (starting with "/"),
  2. or put the schema files into the same path as the class that validates, and refer to the schemas with a path relative to the class (without leading "/").

To try this out, here is the source of the XML schema used in this example.

<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>

 <xs:element name="addresses">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="address" minOccurs='1' maxOccurs='unbounded' />
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name="address">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="name" minOccurs='0' maxOccurs='1' />
    <xs:element ref="street" minOccurs='0' maxOccurs='1' />
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name="name" type='xs:string' />
 <xs:element name="street" type='xs:string' />
 
</xs:schema>
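
To experiment with the resource resolver without building a JAR, the CLASSPATH lookup can be replaced by an in-memory map. This is only a sketch: the class ResolverDemo, the map SCHEMAS and the path /schemas/test.xsd are stand-ins invented here, and a real application would call getResourceAsStream() instead. The schema is a shortened variant of the one above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical demo class, all names are made up for this illustration. */
class ResolverDemo
{
    /** Stand-in for the JAR contents: schema path -> schema text. */
    static final Map<String,String> SCHEMAS = new HashMap<String,String>();
    static {
        SCHEMAS.put("/schemas/test.xsd",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            +"<xs:element name='addresses'><xs:complexType><xs:sequence>"
            +"<xs:element name='address' type='xs:string' maxOccurs='unbounded'/>"
            +"</xs:sequence></xs:complexType></xs:element>"
            +"</xs:schema>");
    }

    /** Validates the XML against the schema it names, resolved from the map above. */
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<String>();
        try {
            final DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

            // newSchema() without arguments relies on the location hints in the document
            final Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema().newValidator();

            validator.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
                final String xsd = SCHEMAS.get(systemId);
                if (xsd == null)
                    return null;    // unknown path: let the validator use its default lookup
                final LSInput input = ls.createLSInput();
                input.setByteStream(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
                return input;
            });

            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException ex) {
                    // warnings are ignored in this sketch
                }
                @Override
                public void error(SAXParseException ex) {
                    errors.add(ex.getMessage());
                }
                @Override
                public void fatalError(SAXParseException ex) {
                    errors.add(ex.getMessage());
                }
            });

            validator.validate(new StreamSource(new StringReader(xml)));
        }
        catch (Exception e) {
            throw new IllegalStateException("Unexpected validation error: "+e.getMessage(), e);
        }
        return errors;
    }

    /** Builds an addresses document with one child element of the given name. */
    static String document(String childElement) {
        return "<addresses xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            +" xsi:noNamespaceSchemaLocation='/schemas/test.xsd'>"
            +"<"+childElement+">Baker street 5</"+childElement+">"
            +"</addresses>";
    }

    public static void main(String [] args) {
        System.out.println("address: "+validate(document("address")));
        System.out.println("street:  "+validate(document("street")));
    }
}
```

With the child element address the list of errors should stay empty, while street is not allowed directly below addresses in this shortened schema and should be reported.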



Saturday, May 20, 2017

Good User Interfaces

Good user interfaces are like good books: you understand them, you can follow them, they express common sense. Good documentation is not much different; both should be well structured and free of repetition. Moreover, a good user interface should be usable without having to read its documentation.

On the web you find many quite different teachings about what makes a good user interface. I do not want to add to this confusion; I just want to summarize what I found useful and mix it with my own experience. So (as the "10 things that ..." lists have become a web tradition :-) here come my ...

Seven (7) Things that Make Up Good User Interfaces

Simplicity: don't overburden users

Consistency: words and symbols

  • The same thing should not have different names or icons in different contexts
  • Words come from the app's business, not from computer slang

Transparency: hints, progress, success and errors

  • Tooltips show more information about what some button will do
  • While the app is working, users can see what is going on and how long it will take
  • Success is reported too, not only errors
  • Error messages tell why the error happened and how to correct it

State Keeping: it is easy to come back

  • When restarting, the app shows the face it had on exit
  • On a desktop, the app has the same size and location
  • Recently loaded items are available for fast review
  • Text fields suggest recently entered input

Navigation: can step safely

  • "Back" and "Forward" actions make sense also for non-browser applications
  • "Undo" and "Redo" actions let users correct wrong input
  • The app guides the user by enabling and disabling actions and input fields
  • "Delete" and "Close" actions show a confirm-dialog
  • Confirm-dialogs have a "Don't show again" option for skilled users
  • When you tap somewhere and something opens, another tap to the same place closes it

Different work speeds: support for all kinds of users

  • Menus and toolbars for unskilled users
  • Context menus and keyboard shortcuts for the skilled

Layout: look regular

  • Components are arranged in grids and aligned to each other
  • There is enough space around labels, fields and buttons (wherever users want to tap)