Sunday, 29 January 2017

How to Read and Write XML

XML is a "data language" that has been around for about 20 years. Its purpose was (and still is) to hold content, not layout.

The idea was to convert XML to HTML on the fly using XSLT. To bring HTML nearer to XML, the XHTML standard was introduced, but it has been given up in favour of HTML-5, which in general can no longer be read by XML parsers. So this idea has been washed away by JavaScript and CSS frameworks. What remains is XML as a content store and as a communication format between browser and HTTP server, as an alternative to JSON. It is also used heavily for application configuration.

Originally an SGML dialect, it mainly uses these control characters:

< > / & " '

Read XML

Following is valid XML:

<hello-world />

Unlike in HTML, where the semantics of every element are well-defined, XML elements can have any tag name, so I called this one hello-world. The trailing slash ('/') closes the element immediately, thus it has no content; it carries just semantics (meaning). The space before the slash is optional.

Markup, Text, Elements, Tags ...

XML consists of markup and text.

<reminder>Write XML Crash Course</reminder>

The text "Write XML Crash Course" is the content, the element reminder is markup. You need an opening tag before the content and a closing tag after it. When I say element, I mean the element including its content; when I say tag, I mean just the markup of the element.

XML forms a hierarchical structure, elements can contain sub-elements, and elements can contain attributes.

<reminders>
  <reminder>Write XML Crash Course</reminder>
  <reminder important="true">Play with your children</reminder>
</reminders>

The important="true" is an attribute of the element reminder.

Mind that you cannot have unclosed elements in XML, although they are allowed in HTML-5. Thus the following XML would not be well-formed:

<reminders>
  <reminder>I forgot to close this element ...
  <br>
  <hr>
</reminders>
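Whether a text is well-formed can also be checked by simply letting a parser read it. The following is a minimal sketch using the XML parser that ships with Java; the class name WellFormedCheck and its method are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of any library. */
class WellFormedCheck
{
    /** @return true when the given text can be parsed as XML, else false. */
    static boolean isWellFormed(String xml) {
        try {
            final DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler just throws on fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        }
        catch (SAXException e) {
            return false;    // the parser rejected the text
        }
        catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final String closed = "<reminders><reminder>Write XML Crash Course</reminder></reminders>";
        final String unclosed = "<reminders><reminder>I forgot to close this element ...<br><hr></reminders>";
        System.out.println(isWellFormed(closed));    // true
        System.out.println(isWellFormed(unclosed));  // false
    }
}
```

Mind that this only checks well-formedness; it does not validate against any document type.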

But XML allows text to be between elements:

<reminders>
  Text can be here ...
  <reminder>Write XML Crash Course</reminder>
  and also here, ....
  <reminder important="true">Play with your children</reminder>
  and also here, like everywhere inside the root-element!
</reminders>

This is called "mixed content". It was mainly introduced to support inline text markup like <b>, <i> etc. (A reminder of how difficult it is to separate content from layout!)

XML does not allow more than one element on the first level (there is only one root element), thus the following XML is not well-formed:

<?xml version="1.0"?>
<reminders>
  <reminder>Write XML Crash Course</reminder>
</reminders>
<reminders>
  <reminder>Play with your children</reminder>
</reminders>

I used an XML heading (formally called the XML declaration) here to express that this is a complete XML document. The following section explains the heading.

Encoding

To form a complete XML document that should be accepted by any parser, we need this minimal heading:

<?xml version="1.0"?>
<reminders>
  ....

Mind that there must not be any leading space or newline between start of the file and this heading (except an optional byte order mark).

The XML version is a hint for XML parsers which rules to apply (besides 1.0 there is only version 1.1; there is no version 2.0 yet). Most XML parsers also accept XML without a heading, but then the encoding of the XML text must be UTF-8 (or UTF-16 with a byte order mark).

The real value of the heading is that you can tell which encoding the file is in (JSON has nothing like this):

<?xml version="1.0" encoding="ISO-8859-1"?>

The encoding attribute tells the parser how to decode the bytes of the file it reads. In Unicode encodings such as UTF-8, one character can consist of several bytes. Different operating systems use different default encodings (also called character sets). For example, Linux uses UTF-8 by default (although you can configure it to use another), and Windows in Western Europe uses CP-1252 (which is closely related to ISO-8859-1). There is no reliable way to find out in which encoding a file has been written except the XML encoding attribute or a byte order mark (which is not very readable and not very popular).
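To see why the parser needs to know the encoding, here is a small Java sketch (the class name EncodingDemo is made up). It encodes the German umlaut "ä" in two character sets and then decodes the UTF-8 bytes with the wrong one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class, just for illustration. */
class EncodingDemo
{
    /** @return the number of bytes the given text occupies in the given encoding. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Encodes with one charset and decodes with another, like a misinformed parser would. */
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        final String umlaut = "\u00e4";    // the letter "ä", written as escape to be independent of the source file encoding

        System.out.println(byteCount(umlaut, StandardCharsets.ISO_8859_1));    // 1
        System.out.println(byteCount(umlaut, StandardCharsets.UTF_8));         // 2

        // reading UTF-8 bytes as ISO-8859-1 yields two wrong characters instead of one umlaut
        final String garbled = misread(umlaut, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());    // 2
    }
}
```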

Comments

Comments look like this:

<?xml version="1.0"?>
<!-- This is a comment.
     It can be everywhere except inside element tags or attribute contents, or before the XML heading.
     It must not contain two consecutive hyphens, although some lenient parsers tolerate them.
-->
<hello-world/>

Entity References

Because XML uses control characters such as '<', there must be a way to write them as plain text when needed. For this purpose, "entity references" were provided.

<reminder>
  How can I write a less-than sign when it would open an XML-tag?
  I can use the character entity reference &lt;
</reminder>

There are predefined character entities and entities declared in a DTD, which can be internal or external. External entities serve for including external XML snippets (structured XML authoring).
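When XML is generated by a program, the control characters in text have to be replaced by the five predefined entity references. A minimal sketch of such a replacement in Java could look like this; the class XmlEscape is made up for illustration, real applications would rather let an XML library do the writing.

```java
/** Hypothetical helper, just for illustration. */
class XmlEscape
{
    /** @return the given text with the five XML control characters replaced by entity references. */
    static String escape(String text) {
        final StringBuilder escaped = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            switch (c) {
                case '<':  escaped.append("&lt;");   break;
                case '>':  escaped.append("&gt;");   break;
                case '&':  escaped.append("&amp;");  break;
                case '"':  escaped.append("&quot;"); break;
                case '\'': escaped.append("&apos;"); break;
                default:   escaped.append(c);
            }
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        System.out.println("<reminder>" + escape("How can I write a '<'?") + "</reminder>");
    }
}
```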

Quotes

For attribute values, you can use single quotes ' or double quotes ", as you like.

<?xml version="1.0"?>
<hello-world world = "universe" planet='earth'/>

Attribute values must be enclosed in quotes. An attribute without a value is not possible. You can't close an opening double quote with a single quote, but you can put single quotes inside double quotes, or vice versa.

Namespaces

XML-elements can contain namespace-prefixes:

<my-namespace:reminders>
  <my-namespace:reminder>Write XML Crash Course</my-namespace:reminder>
  <my-namespace:reminder important="true">Play with your children</my-namespace:reminder>
</my-namespace:reminders>

This is a way to reuse the same element name, e.g. <meeting>, bound to different contexts, e.g. <time:meeting> and <location:meeting>, and then use them in the same XML document.
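A namespace-aware parser expects every prefix to be bound to a namespace name through an xmlns attribute. The following Java sketch shows how such a parser separates prefix, local name and namespace; the class NamespaceDemo and the name urn:example:time are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical demo class, just for illustration. */
class NamespaceDemo
{
    /** @return the root element of given XML text, parsed namespace-aware. */
    static Element parseRoot(String xml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);    // default is false
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        }
        catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // the xmlns:time attribute binds the prefix "time" to an assumed namespace name
        final Element root = parseRoot("<time:meeting xmlns:time=\"urn:example:time\">10:00</time:meeting>");

        System.out.println(root.getPrefix());          // time
        System.out.println(root.getLocalName());       // meeting
        System.out.println(root.getNamespaceURI());    // urn:example:time
    }
}
```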

Arbitrary Content (Escaped)

There is a way to integrate data of any kind and structure into XML. This is called a CDATA section:

<content>
  <![CDATA[
Any markup here will not be interpreted by the XML parser, except the closing-token below.
Use it e.g. for embedded XML or HTML source code!
]]>
</content>
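That the parser really treats a CDATA section as plain text can be tried out with a few lines of Java. This is just a sketch, the class name CdataDemo is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical demo class, just for illustration. */
class CdataDemo
{
    /** @return the root element of the given XML text. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        }
        catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final Element content = parseRoot("<content><![CDATA[<b>Not markup</b> & not an entity]]></content>");

        // the text arrives unchanged, including '<' and '&'
        System.out.println(content.getTextContent());
        // no <b> element has been created
        System.out.println(content.getElementsByTagName("b").getLength());    // 0
    }
}
```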

Write XML

To write XML, you can use any text editor. Watch out for your operating system's encoding: if it is not UTF-8 (the XML default), you need either to write your encoding into the XML heading, or to tell your editor to save the file as UTF-8. Some text editors manage this automatically for you.

How can you know which elements and attributes you are allowed to use?

When you use no predefined document type, you can use any tag name and attribute name you can think of. Just make sure that your document is well-formed: all tags and quotes must be closed.

In case you use a document-type, there are two options:

  1. DTD (Document-Type-Definition), or
  2. XML-schema

A DTD cannot describe a document as precisely as an XML schema can, but it is much more readable.

Your Own Document Design

So, where to put the content, into elements or into attributes? This is a frequently discussed question. My advice is to put into attributes only things that define the element's semantics more closely (meta-information). Mind also that the set of characters that you can use in attribute content is smaller than that of elements.

Predefined Document Design

When your XML editor does not support the type definitions mentioned above, you are forced to understand the document type definition yourself and to manually write XML that conforms to it. This can be quite demanding when the type is big and complex, so check the web for free XML editors that support your type definition!

And ...?

XML authors can convert their work to HTML, PDF, Word, or any other document format, provided they separated content from layout. Chapter structures and text attributes would not count as layout.

DocBook is the way most of them are going (or SimpleDocBook). There are also frameworks like DITA, suitable for help authors.

There are free XML-to-PDF converters on the web. Manually it can be done via Apache FOP, but you will need programming knowledge. XML Mind is a quite friendly XML editor.




Friday, 27 January 2017

A Proxy-Based DTO in Java

A proxy is something that stands for something else. It is the deputy for another object that is not present for some reason. Maybe it is loaded lazily, or it is not implemented at all. The Java runtime library and virtual machine have built-in support for proxying Java interfaces (not classes).

This Blog presents a generic data-transfer-object (DTO) implementation based on that Java Proxy utility. Be aware of the resulting overhead for maintaining DTOs: you will have a parallel hierarchy of interfaces beside your entity hierarchy. There are a lot of voices that doubt the advantage of using DTOs as a layer between persistence and presentation.

Base Interface

We assume that a data-transfer-object is a bean that consists just of getters and setters, all public. To be able to implement such a DTO generically, we need interfaces for all objects to transfer. In other words, you must have an interface for any POJO (entity, domain-object, however you call your database-record) that you want to transfer over the network.

Here is a base interface that all interfaces to be proxied would have to extend (at least to be marked as DTO):

import java.io.Serializable;

/**
 * The base interface for any proxied DTO that provides
 * reading and writing of properties. Supports a "dirty" flag.
 */
public interface TransferableData extends Serializable
{
    /** @return true when this object has been changed by calling some setter. */
    boolean dirty();
}

Example Application

Before going into details, here is an example application of what follows:

import java.lang.reflect.Proxy;
import java.util.Date;

public class Demo
{
    public interface DemoEntity extends TransferableData
    {
        int getNumber();
        void setNumber(int number);
        
        String getName();
        void setName(String name);
        
        Date getDate();
        void setDate(Date date);
        
        boolean getBoolean();
        void setBoolean(boolean flag);
    }
    
    public static void main(String[] args) {
        final DemoEntity dto = (DemoEntity) Proxy.newProxyInstance(
                DemoEntity.class.getClassLoader(), 
                new Class<?> [] { DemoEntity.class },
                new GenericDtoInvocationHandler(DemoEntity.class));

        assert dto.dirty() == false : "DTO is dirty after construction!";
    }
}

The DemoEntity interface outlines the functionality the example DTO should provide. There are properties with primitive (int) and complex (Date) data types. All have a getter and a setter; read-only properties (with just a getter) are not generically possible here. The interface extends TransferableData and thus already has the dirty() flag.

The main() method builds a real DTO for that interface by calling the static Java Proxy.newProxyInstance() method. You must pass (1) a class loader, (2) an array of interfaces to implement, and (3) the invocation handler that will receive all method calls on that DTO, except final methods of java.lang.Object like getClass(), wait() and notify(), which are handled by the Java virtual machine. The GenericDtoInvocationHandler passed as last parameter will be introduced in the following.

We can call dto.dirty() to make sure that it is not already dirty after construction.

Generic DTO Invocation Handler

The invocation-handler will be called for any property the interface describes. So it must be able to store any bean-property generically. For that purpose it holds a property-value map. Further it holds a property-type map, so that it can generate defaults for primitive properties in case their value is null (never has been set). These defaults should conform with Java defaults like 0 (zero) for numbers, and false for booleans.

Here is the outline of the invocation-handler with its constructor:

import java.io.Serializable;
import java.util.*;
import java.lang.reflect.*;

/**
 * The invocation-handler for a generic data-transfer-object
 * supporting getters and setters, and a dirty() flag.
 */
public class GenericDtoInvocationHandler implements InvocationHandler, Serializable
{
    private final Class<? extends TransferableData> implementedInterface;
    
    private final Map<String,Object> fieldValueMap = new HashMap<>();
    private final Map<String,Class<?>> fieldTypeMap = new Hashtable<>();
    private final Set<String> predefinedMethods = new HashSet<>();
 
    private boolean dirty;
 
    /**
     * Construct an InvocationHandler for given interface extending TransferableData.
     * @param implementedInterface must not be null, the interface that the proxy receiving
     *      this invocation-handler will implement.
     */
    public GenericDtoInvocationHandler(Class<? extends TransferableData> implementedInterface) {
        assert implementedInterface != null;
     
        this.implementedInterface = implementedInterface;
        
        // handle internal methods defined in java.lang.Object
        predefinedMethods.add("equals");
        predefinedMethods.add("hashCode");
        predefinedMethods.add("toString");
        // getClass() and thread-methods like wait() and notify() are handled by JVM
        
        // provide a custom method to find out if a setter has been called
        predefinedMethods.add("dirty");
        
        for (Method interfaceMethod : implementedInterface.getMethods())  {
            if (isGetter(interfaceMethod))  {
                final Class<?> interfaceGetterReturnType = interfaceMethod.getReturnType();
                final String fieldName = makeFieldName(interfaceMethod);
                fieldTypeMap.put(fieldName, interfaceGetterReturnType);
            }
        }
    }

    ....

}

The fields of this class represent the "object's state". The field references are assigned at construction and never re-assigned afterwards; only the contents of the value map and the dirty flag change when setters are called. Thus any object of this class should have a "stable state".

In the constructor, the implemented interface is remembered so that any invoke() call can check that it receives a proxy for the same interface. Then the names of some predefined methods are stored. Finally the interface methods are iterated, and the data type of each getter is stored into a map.

Here comes the invoke() responsibility, implementing InvocationHandler:

    /**
     * Implements InvocationHandler.
     * @param proxy the JVM-generated object this invocation-handler acts for.
     * @param method the interface-method currently called on proxy.
     * @param arguments the parameters of the interface-method call.
     * @return the return of called method, as expected in the interface implemented by proxy.
     */
    @Override
    public Object invoke(Object proxy, Method method, Object[] arguments) throws Exception {
        assert proxy.getClass().getInterfaces()[0].equals(implementedInterface) :
            "The proxy's interface "+proxy.getClass().getInterfaces()[0]+" does not match the handler's one: "+implementedInterface;

        final String methodName = method.getName();
        if (predefinedMethods.contains(methodName))
            return invokePredefinedMethod(proxy, methodName, arguments);
  
        final String fieldName = makeFieldName(method);
        
        if (isGetter(method)) {
            final Object value = fieldValueMap.get(fieldName);
            final Class<?> type = fieldTypeMap.get(fieldName);
            assert type != null : "Unknown getter: "+method.getName();

            if (value == null && type.isPrimitive())
                return convertPrimitiveNullValue(type);
   
            return value;
        }
        else {    // must be setter
            putFieldValue(fieldName, arguments[0]);
            return null;
        }
    }

First we assert that the received proxy conforms to the interface we support. Then we look at the Method we have to respond to. When it is predefined, we answer it specially (see invokePredefinedMethod() below). When not, we build a property name from the method, e.g. "Number" from "setNumber". Then we distinguish between getter and setter.

In case a getter is called, we return the value from the map. In case that is null, and the data-type of the property is primitive, we have to provide a Java-compliant default-value for it (see convertPrimitiveNullValue() below).
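Why is that default value necessary at all? When an invocation handler returns null for an interface method with a primitive return type, the proxy throws a NullPointerException. Here is a small sketch that shows this behavior; the class PrimitiveNullDemo and its Counter interface are made up and not part of the DTO implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Hypothetical demo class, just for illustration. */
class PrimitiveNullDemo
{
    public interface Counter
    {
        int getNumber();    // primitive return type
        String getName();   // object return type
    }

    /** @return a proxy whose handler answers every call with null. */
    static Counter createNullReturningProxy() {
        final InvocationHandler alwaysNull = (proxy, method, arguments) -> null;
        return (Counter) Proxy.newProxyInstance(
                Counter.class.getClassLoader(),
                new Class<?> [] { Counter.class },
                alwaysNull);
    }

    public static void main(String[] args) {
        final Counter counter = createNullReturningProxy();

        System.out.println(counter.getName());    // null is fine for an object type

        try {
            counter.getNumber();    // null can not be converted to int
        }
        catch (NullPointerException e) {
            System.out.println("Primitive getter failed on null");
        }
    }
}
```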

In case a setter is called, we put the value into the map.

Here is the remaining private part of GenericDtoInvocationHandler:

    private Object invokePredefinedMethod(Object proxy, String methodName, Object [] arguments) throws Exception  {
        // never call equals() or hashCode() on the proxy here,
        // that would be dispatched to invoke() again and recurse endlessly
        if (methodName.equals("equals"))
            return proxy == arguments[0];

        if (methodName.equals("hashCode"))
            return System.identityHashCode(proxy);

        if (methodName.equals("toString"))
            return toString();

        if (methodName.equals("dirty"))
            return dirty;
        
        throw new IllegalStateException("Not an internal method: "+methodName);
    }
    
    private final String makeFieldName(Method method) {
        final String methodName = method.getName();
        
        if (methodName.startsWith("is"))
            return method.getName().substring("is".length());
        
        if (methodName.startsWith("get"))
            return method.getName().substring("get".length());
        
        if (methodName.startsWith("set"))
            return method.getName().substring("set".length());
        
        throw new IllegalArgumentException("Invoked method is neither getter nor setter: "+methodName);
    }

    private boolean isGetter(Method method) {
        return
            method.getReturnType().equals(void.class) == false &&
            (method.getName().startsWith("get") || method.getName().startsWith("is"));
    }
    
    private Object convertPrimitiveNullValue(final Class<?> clazz) {
        if (clazz.equals(boolean.class))
            return Boolean.FALSE;
        if (clazz.equals(byte.class))
            return Byte.valueOf((byte) 0);
        if (clazz.equals(char.class))
            return Character.valueOf((char) 0);
        if (clazz.equals(int.class))
            return Integer.valueOf(0);
        if (clazz.equals(short.class))
            return Short.valueOf((short) 0);
        if (clazz.equals(long.class))
            return Long.valueOf((long) 0);
        if (clazz.equals(float.class))
            return Float.valueOf((float) 0);
        if (clazz.equals(double.class))
            return Double.valueOf((double) 0);
                    
        throw new IllegalArgumentException("Unknown primitive type: "+clazz);
    }
    
    private void putFieldValue(String fieldName, Object value)    {
        assert fieldName != null;
        
        final Object oldValue = fieldValueMap.get(fieldName);
        if (Objects.equals(oldValue, value) == false)
            dirty = true;
        
        fieldValueMap.put(fieldName, value);
    }

I think this is neither hard to read nor hard to understand. We could add a toString() that returns fieldValueMap.toString(). It is important to respond to any illegal situation by throwing an exception. That way you will discover bugs early.

Mind that we set the "dirty" state in putFieldValue(). Thanks to Objects.equals() (available since Java 7), calling the setter with the same value as before will not make the object dirty.
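The dirty logic can be studied in isolation. The following sketch repeats just that part in a made-up class named DirtyTrackingDemo, without any proxy around it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-alone copy of the dirty logic, just for illustration. */
class DirtyTrackingDemo
{
    private final Map<String,Object> values = new HashMap<>();
    private boolean dirty;

    /** Stores the value, and marks this object dirty when the value differs from the old one. */
    void put(String name, Object value) {
        // Objects.equals() is null-safe, so a never-set property can be compared, too
        if (Objects.equals(values.get(name), value) == false)
            dirty = true;

        values.put(name, value);
    }

    boolean dirty() {
        return dirty;
    }

    public static void main(String[] args) {
        final DirtyTrackingDemo data = new DirtyTrackingDemo();

        data.put("Name", null);    // same as the initial state
        System.out.println(data.dirty());    // false

        data.put("Name", "Hello");    // a real change
        System.out.println(data.dirty());    // true
    }
}
```

Mind that the flag is never reset: setting a property back to its original value leaves the object dirty.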

Test Code

Finally we need some code to test this thing. Here it is:

    public static void main(String[] args) {
        // test data definition
        
        final int number = 3;
        final String name = "Hello";
        final Date now = new Date();
        
        // test execution
        
        final DemoEntity dto = (DemoEntity) Proxy.newProxyInstance(
                DemoEntity.class.getClassLoader(), 
                new Class<?> [] { DemoEntity.class },
                new GenericDtoInvocationHandler(DemoEntity.class));
        
        assert dto.dirty() == false : "DTO is dirty after construction!";
        
        dto.setNumber(number);
        dto.setName(name);
        dto.setDate(now);
        // leave boolean on default
        
        // test data assertions
        
        assert dto.dirty() == true : "DTO is not dirty!";
        assert dto.getNumber() == number : "Number is "+dto.getNumber();
        assert dto.getName().equals(name) : "Name is "+dto.getName();
        assert dto.getDate().equals(now) : "Date is "+dto.getDate();
        assert dto.getBoolean() == false : "Boolean is "+dto.getBoolean();
        
        System.out.println("Test succeeded!");
    }

As in every test, there is

  1. a test-data definition,
  2. a test execution,
  3. a test-data assertion

Put all this code into Java files, compile it, run it (don't forget to enable asserts with -ea), and you will see:

Test succeeded!



Monday, 23 January 2017

Apply the final Keyword in Java

Why do we need constants? Why not store everything in variables, as was done in JavaScript before ES-6 introduced the const keyword? Let's stay realistic: in reality nothing is constant. Everything changes, this is the way things are going.

But to find our way in this life, we need to make decisions. We decide based on facts that we know. It is not possible to make decisions upon facts that are not certain, or may change every minute. So we can safely decide only when the facts are constant.


In Java you can declare a constant using the final keyword. You should use it as much as possible, because it makes source code easier to understand, and more robust at runtime. Remember that functional programming languages (e.g. XSLT) do not have variables, they provide constants only. And software written in functional languages has a reputation for being very resilient. So use final constants as much as you can!

This Blog shows how to use final even in cases where you would think a variable is inevitable.


Inevitable Variable?

Consider the following code (your daughter's Saturday afternoon :-)

  boolean canGo = true;

  if (isRaining() == true)
    if (isUmbrellaAvailable() == false)
      canGo = false;

  if (canGo)
    go();
  else
    phone();

If we changed canGo to final, we would get a compile error, because in case the umbrella is not available the constant would be re-assigned, and the compiler does not allow that for constants.

But as we insist on using final, we rewrite the code.

Constant Replacement!


  final boolean canGo;

  if (isRaining() == true)
    if (isUmbrellaAvailable() == false)
      canGo = false;
    else
      canGo = true;
  else
    canGo = true;

  if (canGo)
    go();
  else
    phone();

There are voices that reject this solution. Their argument is: the code has become significantly longer, and thus is harder to read.

So here is the short variant:

  final boolean canGo = (isRaining() == false || isUmbrellaAvailable() == true);

  if (canGo)
    go();
  else
    phone();

I admit, this is not always possible so simply. Suppose we need several method calls to find out if we can go when it is raining. Then we need the long variant, there is no way around it. But there are always tricks to make it shorter.

  final boolean canGo;

  if (isRaining() == true) {
    final UmbrellaKeeper umbrellaKeeper = GlobalSingletons.get(UmbrellaKeeper.class);
    umbrellaKeeper.promiseMuffin();
    canGo = umbrellaKeeper.isUmbrellaAvailable();
  }
  else {
    canGo = true;
  }

  if (canGo)
    go();
  else
    phone();
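Another trick is to move the decision into a method of its own, so that the caller needs just one final assignment. Here is a sketch of that idea; the class AfternoonPlan is made up, and the two BooleanSupplier parameters are my stand-ins for isRaining() and isUmbrellaAvailable(), so that the example can run on its own.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical demo class, just for illustration. */
class AfternoonPlan
{
    /** The decision lives in a method, early returns replace the re-assignment. */
    static boolean canGo(BooleanSupplier raining, BooleanSupplier umbrellaAvailable) {
        if (raining.getAsBoolean() == false)
            return true;

        // the umbrella is checked only when it is raining
        return umbrellaAvailable.getAsBoolean();
    }

    /** @return "go" or "phone", standing for the calls to go() and phone(). */
    static String decide(BooleanSupplier raining, BooleanSupplier umbrellaAvailable) {
        final boolean canGo = canGo(raining, umbrellaAvailable);

        if (canGo)
            return "go";
        else
            return "phone";
    }

    public static void main(String[] args) {
        System.out.println(decide(() -> true, () -> false));    // phone
        System.out.println(decide(() -> true, () -> true));     // go
        System.out.println(decide(() -> false, () -> false));   // go
    }
}
```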

Don't Reassign Variables

Here is why I prefer having constants instead of variables:

When a variable is re-assigned, most likely its meaning has changed as well.
Then the name of the variable does not fit any more, because the name should express the meaning.

So here is my last offer, maybe this can convince you:

  final boolean canGo = true;
  final boolean canGoWeatherChecked = (canGo && isRaining() == false);
  final boolean canGoGearChecked = (canGoWeatherChecked || isUmbrellaAvailable());

  if (canGoGearChecked)
    go();
  else
    phone();

I would call this the "semantic solution".

Of course final is not always possible. Consider counter variables like the famous

for (int i = 0; i < array.length; i++)

Implementing such loops using recursive calls is one of the biggest obstacles in functional languages. Here you are glad to have variables. But such numeric variables are (nearly :-) the only cases where you can't use the final modifier.
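Even loops do not always need a counter variable. In a for-each loop the loop variable can be declared final, and a stream needs no visible variable at all. Here is a small sketch of both; the class name FinalLoopDemo is made up.

```java
import java.util.Arrays;

/** Hypothetical demo class, just for illustration. */
class FinalLoopDemo
{
    /** The loop variable is final, only the accumulator remains a variable. */
    static int sumWithLoop(final int[] array) {
        int sum = 0;
        for (final int element : array)
            sum += element;
        return sum;
    }

    /** No variable at all, the stream does the counting internally. */
    static int sumWithStream(final int[] array) {
        return Arrays.stream(array).sum();
    }

    public static void main(String[] args) {
        final int[] array = new int[] { 1, 2, 3 };
        System.out.println(sumWithLoop(array));      // 6
        System.out.println(sumWithStream(array));    // 6
    }
}
```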