
Sunday, December 29, 2019

JPA Entity Id Handling

Recently I presented a JPA BaseEntity class that implements the hashCode/equals contract once-and-only-once for all sub-classes. The JPA specification does not require entity classes to override hashCode/equals, but for consistent handling of entities in a big project it may be advisable.

In the context of distributed systems, entities kept in hash-containers, and ids occurring in URLs, several questions arise around the nature and behaviour of an entity's id. The JPA specification, chapter 2.4, states about primary keys:

The value of its primary key uniquely identifies an entity instance within a persistence context and to EntityManager operations as described in Chapter 3, "Entity Operations". The application must not change the value of the primary key. The behavior is undefined if this occurs.

This Blog is a review of the BaseEntity implementation. The big background question is: "Which scope does the id have to cover?" Unique per table? Per database? Globally unique?

Nature

Many web pages discuss the nature of ids. Search the web for "JPA primary key UUID versus number". Here is my personal summary.

UUID Generator versus SEQUENCE Number

(1)
A UUID (Universally Unique Identifier) is a 128-bit value (16 bytes). Its text form is 36 characters long, consisting of 32 hex-digits and 4 dashes. Such UUIDs are, for all practical purposes, globally unique in space and time, even if they were generated on different computers. The application, not the database, would generate them.

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    private UUID id = UUID.randomUUID();
    ....
}
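
The text form and the random nature of such ids can be inspected with the JDK's java.util.UUID class alone. The following is a minimal sketch without any JPA; the class name UuidFacts and its helper methods are made up for illustration.

```java
import java.util.UUID;

class UuidFacts
{
    /** @return the number of characters in the canonical text form. */
    static int textLength(UUID uuid) {
        return uuid.toString().length();
    }

    /** @return true when parsing the text form yields an equal UUID. */
    static boolean survivesRoundTrip(UUID uuid) {
        return UUID.fromString(uuid.toString()).equals(uuid);
    }

    public static void main(String[] args) {
        final UUID uuid = UUID.randomUUID();    // a random (version 4) UUID
        System.out.println(uuid + " has " + textLength(uuid) + " characters");
        System.out.println("version = " + uuid.version());
        System.out.println("round trip ok = " + survivesRoundTrip(uuid));
    }
}
```

The round trip matters when the id travels through URLs or text columns: the text form can always be parsed back into an equal UUID.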

(2)
Drawing numeric ids from database sequences is a tradition, although surprisingly not all database products support sequences. Most old databases have numeric primary keys with running sequences, so setting up an object-relational model for an existing database may leave no other choice than using numeric keys.
Mind that there is no such thing as a numeric UUID: sequence numbers are not globally unique. So the scope of a numeric id is always a table or a database instance.

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "DatabaseGlobalSequence")
    @SequenceGenerator(name = "DatabaseGlobalSequence", initialValue = 1, allocationSize = 10)
    private Long id;
    ....
}
→ If you want to (or must) use one sequence per table, you can not keep the id in the BaseEntity class. In that case you must put it into all sub-classes, and write different sequence names into the related annotations. Still, by using a public abstract getId() method you can keep the code for equals() and hashCode() in BaseEntity.
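
A rough sketch of that variant, written in plain Java so that it runs without a JPA provider. The names IdentifiedEntity, Person and Address are hypothetical, and the constructors that take an id exist only for the demonstration; in real entities the id field of each sub-class would carry @Id, @GeneratedValue and a @SequenceGenerator naming that table's sequence.

```java
/** Hypothetical base class: owns equals/hashCode, but not the id field. */
abstract class IdentifiedEntity
{
    /** Sub-classes return their own id field. */
    public abstract Long getId();

    @Override
    public final boolean equals(Object o) {
        if (this == o)
            return true;

        if (o == null || getClass() != o.getClass())
            return false;

        final Long id = getId();
        final Long otherId = ((IdentifiedEntity) o).getId();
        if (id == null || otherId == null)    // can't use id
            return super.equals(o);

        return id.equals(otherId);
    }

    @Override
    public final int hashCode() {
        final Long id = getId();
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class Person extends IdentifiedEntity
{
    private Long id;    // in JPA: @Id with the sequence of the person table

    Person(Long id) {
        this.id = id;
    }
    @Override
    public Long getId() {
        return id;
    }
}

class Address extends IdentifiedEntity
{
    private Long id;    // in JPA: @Id with the sequence of the address table

    Address(Long id) {
        this.id = id;
    }
    @Override
    public Long getId() {
        return id;
    }
}
```

Because the class check comes first, a Person and an Address that happen to draw the same number from their separate sequences are still not equal.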

So, if you have the freedom to choose your primary key class ...

Pro UUID:

  • Unique in every database instance, so ids need not be converted to local sequences on data migration, and multi-threaded batch inserts do not have to queue and wait for an id value

  • hashCode/equals durability can be provided by assigning the id at construction time (safe for putting into hash-containers, see example below)

  • Safer than a number when used in a URL: attackers can not derive the next id from a UUID like they could from a number

  • Saving hierarchies of entities is easier, because the child entity doesn't need to wait for the parent's id generation

Contra UUID:

  • A UUID is more expensive concerning storage (in binary form 4 times bigger than an Integer, 2 times bigger than a Long); this problem escalates when big indexes need to store them and lots of foreign keys exist

  • Generating a UUID is slower than drawing a new sequence or identity number; moreover, the argument that Long numbers will run out of values is not true

  • You can not tell whether an entity was already persisted by testing the id for null, because the id gets generated at construction time and thus is always present (persist() versus merge() problem); to find out you will need an additional database query

  • A sequence number as id reflects the insertion order of records and can serve as a fast default sort criterion (but: who in fact needs such a sort order?)
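
The storage factors in the first point can be checked with plain JDK classes. This is only a sketch: the class UuidStorage is made up for illustration, and how a UUID column is really stored depends on the database and on the provider's type mapping, which is not covered here.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidStorage
{
    /** Packs the two 64-bit halves of a UUID into its 16-byte binary form. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Restores a UUID from its 16-byte binary form. */
    static UUID fromBytes(byte[] bytes) {
        final ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        final UUID uuid = UUID.randomUUID();
        System.out.println("UUID binary: " + toBytes(uuid).length + " bytes");
        System.out.println("UUID text:   " + uuid.toString().length() + " characters");
        System.out.println("Long:        " + Long.BYTES + " bytes");
        System.out.println("Integer:     " + Integer.BYTES + " bytes");
    }
}
```

When the id is stored in its text form instead of the binary form, the difference to a numeric key gets even bigger.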

So What?

All web articles that are against UUIDs argue with performance and storage consumption. They were dealing with big data and poor responsiveness. So a common piece of advice may be:

  • SEQUENCE numbers for a production database with lots of data that serves all customers, where migrating data from production into a test instance is not an issue

  • UUIDs when selling many different databases to customers that do not have much data, but where migrations can happen frequently

Don't expect silver bullets to work:-)

Behavior

The following is about why a durable hashCode/equals implementation matters.

Unit Test

Here is a unit test that targets the behavior of the primary key, concerning the "undefined behavior" in case the id changes during the lifetime of the entity.
For the complete implementation, and how to turn this into a concrete test with a certain JPA provider, please see my recent Blog.

public abstract class JpaTest
{
    ....

    /** Setting the id changes hashCode() and equals(). */
    @Test
    public void shouldNotDisappearFromHashContainerWhenPersisting() {
        final BaseEntity person = newPerson("John Doe");
        
        final Set<Object> set = new HashSet<>();
        set.add(person);
        assertTrue(set.contains(person));
        
        transactional(em::persist, person);
        
        try {
            assertTrue(set.contains(person));    // will NOT work when id changed on persist()
        }
        finally {
            transactional(em::remove, person);
        }
    }

    ....
}

Look at line 11, set.add(person): the application puts a transient entity into a hash-container. Now look at line 17, the second assertTrue(set.contains(person)): after persist(), this checks whether the persisted entity still is found in the hash-container, using the contains() method. Depending on how hashCode/equals was implemented, this assertion could fail.

Sequence Number Primary Key

First let's try that unit test with the following BaseEntity:

import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;
import javax.persistence.SequenceGenerator;

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "DatabaseGlobalSequence")
    @SequenceGenerator(name = "DatabaseGlobalSequence", initialValue = 1, allocationSize = 10)
    private Long id;

    /** @return the primary key of this entity. */
    public final Long getId() {
        return id;
    }
    
    /** Overridden to delegate to class-equality and id (when not null). */
    @Override
    public final boolean equals(Object o) {
        if (this == o)    // performance optimization
            return true;
        
        if (o == null || getClass() != o.getClass()) // exclude aliens
            return false;   // and one-to-one entities with same id
        
        final BaseEntity other = (BaseEntity) o;
        if (id == null || other.id == null) // can't use id
            return super.equals(o);
        
        return id.equals(other.id); // delegate equality to id
    }
    
    /** Overridden to delegate to id when not null, else to super. */
    @Override
    public final int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

This uses a database-generated sequence called "DatabaseGlobalSequence". The @SequenceGenerator annotation is referenced by the preceding @GeneratedValue(generator=...) annotation.

We see that the test fails in line 17. Although the entity still is inside the set, the contains() method returned false. This is because the hash code changed on the fly: as long as the id was null, hashCode() returned the identity hash inherited from Object, and after persist() it returns the hash of the sequence value that the JPA provider assigned.
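
The effect can be reproduced without any database. The following plain-Java sketch uses a made-up class SequenceIdEntity with the same hashCode/equals logic, and a hypothetical simulatePersist() method that stands in for the id assignment a JPA provider does on persist().

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for an entity with a database-generated id. */
class SequenceIdEntity
{
    private Long id;    // null until "persisted"

    /** Simulates the id assignment that happens on persist(). */
    void simulatePersist() {
        // To keep the demo deterministic, choose an id whose hash
        // always differs from the identity hash used before.
        id = (long) System.identityHashCode(this) + 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (o == null || getClass() != o.getClass())
            return false;
        final SequenceIdEntity other = (SequenceIdEntity) o;
        if (id == null || other.id == null)
            return false;
        return id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class DisappearingEntityDemo
{
    /** @return whether the set still finds the entity after the id was assigned. */
    static boolean foundAfterPersist() {
        final SequenceIdEntity entity = new SequenceIdEntity();
        final Set<Object> set = new HashSet<>();
        set.add(entity);    // stored under the identity hash
        entity.simulatePersist();    // hashCode() changes here
        return set.contains(entity);    // looked up under the new hash
    }

    public static void main(String[] args) {
        System.out.println("found after persist: " + foundAfterPersist());
    }
}
```

The set stored the entity under its old hash code, so a lookup with the new hash code can not find it, although equals() would still return true for the very same instance.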

UUID Primary Key

Now let's try the same with a UUID in the BaseEntity implementation:

import java.util.UUID;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    private UUID id = UUID.randomUUID();

    /** @return the primary key of this entity. */
    public final UUID getId() {
        return id;
    }
    
    /** Overridden to delegate to class-equality and id. */
    @Override
    public final boolean equals(Object o) {
        if (this == o)    // performance optimization
            return true;
        
        if (o == null || getClass() != o.getClass()) // exclude aliens
            return false;   // and one-to-one entities with same id
        
        final BaseEntity other = (BaseEntity) o;
        return id.equals(other.id); // delegate equality to id
    }
    
    /** Overridden to delegate to id. */
    @Override
    public final int hashCode() {
        return id.hashCode();
    }
}

We see that this implementation is shorter than the previous one. It doesn't have the @GeneratedValue annotation because the id value is set by the UUID.randomUUID() generator at the entity's construction time. Thus the hashCode/equals implementation is durable now: it relies on an id that is never null and never changes.

This time the test succeeds.
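
Here is a database-free sketch of the same durability, again in plain Java. The class UuidIdEntity is made up for illustration, and its second constructor only simulates what happens when a provider loads a stored row into a new instance.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for an entity with an application-generated id. */
class UuidIdEntity
{
    private final UUID id;

    /** A new entity gets its id right at construction time. */
    UuidIdEntity() {
        this(UUID.randomUUID());
    }

    /** Simulates loading: the stored id replaces the generated one. */
    UuidIdEntity(UUID storedId) {
        this.id = storedId;
    }

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (o == null || getClass() != o.getClass())
            return false;
        return id.equals(((UuidIdEntity) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class DurableEntityDemo
{
    /** @return whether both the original and a loaded copy are found in the set. */
    static boolean foundByOriginalAndCopy() {
        final UuidIdEntity original = new UuidIdEntity();
        final Set<Object> set = new HashSet<>();
        set.add(original);

        final UuidIdEntity loadedCopy = new UuidIdEntity(original.getId());
        return set.contains(original) && set.contains(loadedCopy);
    }

    public static void main(String[] args) {
        System.out.println("found: " + foundByOriginalAndCopy());
    }
}
```

Since the hash code depends only on an id that exists from the start, a second instance representing the same record is found in the container as well.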

Conclusion

Databases are production assets. Every company tries to keep its persistence structures unchanged as long as possible, and every modification is seen as critical. It absolutely matters how contents are identified and related. The best way may be to keep the type of the primary key flexible and isolated. Some even advise using both a SEQUENCE number and a UUID together.




Thursday, December 26, 2019

The JPA Add Problem with Backlinks

In my recent Blog I introduced a JPA test project that enables me to experiment with entity classes. In this Blog I would like to present a solution for the JPA add() problem. It builds on Java 1.8 and JPA 2.1. I used an H2 database for testing.

Demo Code

The add() problem exists only for hierarchical relations where the child entity holds a backlink to the collection-owner. The following example classes show just the necessary parts; I left out further properties for brevity.

Owner entity:

import java.util.HashSet;
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.OneToMany;

@Entity
public class Team extends BaseEntity
{
    @OneToMany(mappedBy = "team", cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Responsibility> responsibilities = new HashSet<>();
    
    public Set<Responsibility> getResponsibilities() {
        return responsibilities;
    }
    public void add(Responsibility responsibility) {
        responsibilities.add(responsibility);
        responsibility.setTeam(this);
    }
}

You find the BaseEntity implementation in my recent Blog. The setResponsibilities() method was left out to avoid abuse by the application. JPA providers don't care if it is missing, they use the private field.

Child entity (where the backlink is):

import javax.persistence.Entity;
import javax.persistence.ManyToOne;

@Entity
public class Responsibility extends BaseEntity
{
    @ManyToOne(optional = false)
    private Team team;
    
    public Team getTeam() {
        return team;
    }
    public void setTeam(Team team) {
        this.team = team;
    }
}

Hierarchical relations are cascading, which means that deleting the team would also delete all contained responsibilities from the database. UML calls that "Composition".

API Weakness

The problem is the unsafe API.
Alice did this:

        ....
        team.add(responsibility);
        // right, add() sets the backlink correctly
        ....

But Bob did this:

        ....
        team.getResponsibilities().add(responsibility);
        // wrong, because the backlink is not set
        ....

What makes the developer use the Team.add() method? Nothing. It is a weak API. By the way, the add-method was coded by hand.
Can we do better?

Collection Wrapper Solution

The idea is to return a collection wrapper from getResponsibilities() that sets the backlink whenever an entity gets added, and clears it when an entity gets removed. There would be no hand-coded explicit add() method any more; instead, wrapper classes for java.util.Set and java.util.List relations are needed. A List keeps its elements in a defined order and can contain duplicates, a Set contains no duplicates and guarantees no order; both extend Collection.

Mind that such a solution is possible only when you have the JPA annotations on the fields, not on the methods, so that the JPA provider will not use getResponsibilities()!

The application source code would look like the following:

import java.util.*;
import javax.persistence.*;
import fri.jpa.util.BacklinkSettingCollection;

@Entity
public class Team extends BaseEntity
{
    @OneToMany(mappedBy = "team", cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Responsibility> responsibilities = new HashSet<>();

    ....
    
    public Collection<Responsibility> getResponsibilities() {
        return new BacklinkSettingCollection<Responsibility,Team>(
                responsibilities,
                this,
                (element, owner) -> element.setTeam(owner));
    }

    ....
}

There are no more hand-coded add() methods. The getter now returns a wrapper instead of the original collection maintained by JPA. Here the class BacklinkSettingCollection wraps a java.util.Set, which it handles through the Collection interface. Its constructor requires the relation collection, the owner, and a function (Java 8 lambda) that sets the owner as backlink.

The following is the BacklinkSettingCollection implementation.

package fri.jpa.util;

import java.util.Collection;
import java.util.Iterator;
import java.util.function.BiConsumer;

public class BacklinkSettingCollection<ELEMENT,OWNER> implements Collection<ELEMENT>
{
    private final Collection<ELEMENT> relations;
    private final OWNER owner;
    private final BiConsumer<ELEMENT,OWNER> backLinkSetter;
    
    /**
     * @param relations the collection to maintain.
     * @param owner the holder object of the relations that
     *      must be set as owner to added elements.
     * @param backLinkSetter the function to use for setting the
     *      backlink, either to owner on add, or to null on remove,
     *      e.g. <code>(member,team) -> member.setTeam(team)</code>.
     */
    public BacklinkSettingCollection(
            Collection<ELEMENT> relations, 
            OWNER owner,
            BiConsumer<ELEMENT,OWNER> backLinkSetter)
    {
        assert relations != null && owner != null && backLinkSetter != null;
        
        this.relations = relations;
        this.owner = owner;
        this.backLinkSetter = backLinkSetter;
    }
    
    @Override
    public int size() {
        return relations.size();
    }

    @Override
    public boolean isEmpty() {
        return relations.isEmpty();
    }

    @Override
    public boolean contains(Object o) {
        return relations.contains(o);
    }

    @Override
    public Iterator<ELEMENT> iterator() {
        return new Iterator<ELEMENT>()
        {
            private Iterator<ELEMENT> delegate = relations.iterator();
            private ELEMENT current;
            
            @Override
            public boolean hasNext() {
                return delegate.hasNext();
            }
            @Override
            public ELEMENT next() {
                return current = delegate.next();
            }
            @Override
            public void remove() {
                delegate.remove();
                castAndSetNull(current);
            }
        };
    }

    @Override
    public Object[] toArray() {
        return relations.toArray();
    }

    @Override
    public <T> T[] toArray(T[] a) {
        return relations.toArray(a);
    }

    @Override
    public boolean add(ELEMENT e) {
        setOwner(e);
        return relations.add(e);
    }

    @Override
    public boolean remove(Object o) {
        final boolean removed = relations.remove(o);
        if (removed)    // unlink only what really was contained
            castAndSetNull(o);
        return removed;
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return relations.containsAll(c);
    }

    @Override
    public boolean addAll(Collection<? extends ELEMENT> c) {
        for (ELEMENT e : c)
            setOwner(e);
        return relations.addAll(c);
    }

    @Override
    public boolean removeAll(Collection<?> c) {
        for (Object o : c)
            if (relations.contains(o))    // unlink only real members
                castAndSetNull(o);
        return relations.removeAll(c);
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        for (ELEMENT e : relations)
            if (c.contains(e) == false)    // this member is going to be removed
                castAndSetNull(e);
        return relations.retainAll(c);
    }
    
    @Override
    public void clear() {
        removeAll(relations);
    }
    

    protected final void setOwner(ELEMENT e) {
        backLinkSetter.accept(e, owner);
    }

    protected final void castAndSetNull(Object o) {
        @SuppressWarnings("unchecked")
        ELEMENT e = (ELEMENT) o;
        backLinkSetter.accept(e, null);
    }
}

Yes, the Java Collection interface has become big.

You can see that this is a fast wrapper, because most calls simply delegate to the original collection. It also holds no state of its own, thus it can be constructed newly on every getResponsibilities() call. The only thing this class does is set and unset the backlink whenever a child gets added or removed.

The BiConsumer is a ready-made Java @FunctionalInterface that covers what I need here: a function that takes two parameters and returns nothing. Its accept() method would call the backlink setter.
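
To illustrate the two ways the wrapper calls that function, here is a small stand-alone sketch. The class BiConsumerDemo and its nested Team and Member classes are made up for this example; only BiConsumer and its accept() method come from the JDK.

```java
import java.util.function.BiConsumer;

class BiConsumerDemo
{
    static class Team
    {
    }

    static class Member
    {
        private Team team;

        void setTeam(Team team) {
            this.team = team;
        }
        Team getTeam() {
            return team;
        }
    }

    /** Element comes first, owner second, as the wrapper expects it. */
    static final BiConsumer<Member,Team> BACKLINK_SETTER =
            (member, team) -> member.setTeam(team);

    public static void main(String[] args) {
        final Team team = new Team();
        final Member member = new Member();

        BACKLINK_SETTER.accept(member, team);    // what the wrapper does on add
        System.out.println("linked: " + (member.getTeam() == team));

        BACKLINK_SETTER.accept(member, null);    // what the wrapper does on remove
        System.out.println("unlinked: " + (member.getTeam() == null));
    }
}
```

The lambda could also be written as the method reference Member::setTeam, which has the same shape: the first argument is the object the setter gets called on, the second is the value passed to it.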

Here is the corresponding List implementation for ordered relation collections that can contain duplicates:

package fri.jpa.util;

import java.util.Collection;
import java.util.List;
import java.util.ListIterator;
import java.util.function.BiConsumer;

public class BacklinkSettingList<ELEMENT,OWNER> 
    extends BacklinkSettingCollection<ELEMENT,OWNER> 
    implements List<ELEMENT>
{
    private final List<ELEMENT> relations;
    
    /** Takes the same parameters as the BacklinkSettingCollection constructor. */
    public BacklinkSettingList(
            List<ELEMENT> relations,
            OWNER owner,
            BiConsumer<ELEMENT, OWNER> backLinkSetter)
    {
        super(relations, owner, backLinkSetter);
        this.relations = relations;
    }
    
    @Override
    public boolean addAll(int index, Collection<? extends ELEMENT> c) {
        for (ELEMENT e : c)
            setOwner(e);
        return relations.addAll(index, c);
    }

    @Override
    public ELEMENT get(int index) {
        return relations.get(index);
    }

    @Override
    public ELEMENT set(int index, ELEMENT element) {
        castAndSetNull(get(index));
        setOwner(element);
        return relations.set(index, element);
    }

    @Override
    public void add(int index, ELEMENT element) {
        setOwner(element);
        relations.add(index, element);
    }

    @Override
    public ELEMENT remove(int index) {
        castAndSetNull(get(index));
        return relations.remove(index);
    }

    @Override
    public int indexOf(Object o) {
        return relations.indexOf(o);
    }

    @Override
    public int lastIndexOf(Object o) {
        return relations.lastIndexOf(o);
    }

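    @Override
    public ListIterator<ELEMENT> listIterator() {
        // mind: add(), set() and remove() on this iterator bypass the backlink handling
        return relations.listIterator();
    }

    @Override
    public ListIterator<ELEMENT> listIterator(int index) {
        // mind: same bypass as in listIterator()
        return relations.listIterator(index);
    }

    @Override
    public List<ELEMENT> subList(int fromIndex, int toIndex) {
        // mind: changes made through the returned view bypass the backlink handling
        return relations.subList(fromIndex, toIndex);
    }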
}

Not so big any more, because it extends BacklinkSettingCollection. Mind that both classes have their own relations field, but as both fields are final and get assigned from the same constructor argument, they can not end up working on different collections.

Unit Tests

The JpaTest that I introduced in my recent Blog still succeeded after I introduced these implementations. That doesn't prove that they are safe, so here are tests that focus on BacklinkSettingXXX and don't use a database.

Collection test:

package fri.jpa.util;

import static org.junit.Assert.*;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import org.junit.Test;

public class BacklinkSettingCollectionTest
{
    protected static class Member
    {
        private Team team;
        
        public void setTeam(Team team) {
            this.team = team;
        }
        public Team getTeam() {
            return team;
        }
    }
    
    protected static class Team
    {
        protected final Collection<Member> members = new ArrayList<>();
        
        public Collection<Member> getMembers()  {
            return new BacklinkSettingCollection<Member,Team>(
                    members,
                    this,
                    (m, t) -> m.setTeam(t));
        }
    }
    
    @Test
    public void shouldSetBackLinkWhenAddingAndRemoving() {
        final Team team = new Team();
        
        // test add()
        final Member member1 = new Member();
        team.getMembers().add(member1);
        final Member member2 = new Member();
        team.getMembers().add(member2);
        
        assertNotEquals(member1, member2);
        assertEquals(2, team.getMembers().size());
        
        final Iterator<Member> iterator = team.getMembers().iterator();
        assertTrue(member1 == iterator.next()); // is an ordered List
        assertTrue(member2 == iterator.next());
        // make sure the backlink has been set
        assertTrue(team == member1.getTeam());
        assertTrue(team == member2.getTeam());
        
        // test remove()
        final Member toRemove = member1;
        final Member toKeep = member2;
        team.getMembers().remove(toRemove);
        
        assertEquals(1, team.getMembers().size());
        assertFalse(team.getMembers().contains(toRemove));
        assertTrue(team.getMembers().contains(toKeep));
        assertTrue(null == toRemove.getTeam());
        assertTrue(team == toKeep.getTeam());
        
        // test clear() is removeAll()
        team.getMembers().clear();
        assertEquals(0, team.getMembers().size());
        assertTrue(null == toKeep.getTeam());
        
        // test removeIf()
        team.getMembers().add(member1);
        assertTrue(team == member1.getTeam());
        team.getMembers().removeIf(element -> true);
        assertTrue(null == member1.getTeam());
        
        // test iterator().remove()
        team.getMembers().add(member1);
        assertTrue(team == member1.getTeam());
        final Iterator<Member> iterator2 = team.getMembers().iterator();
        iterator2.next();
        iterator2.remove();
        assertTrue(null == member1.getTeam());
        assertEquals(0, team.getMembers().size());
    }
}

List test:

package fri.jpa.util;

import static org.junit.Assert.*;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import org.junit.Test;

public class BacklinkSettingListTest extends BacklinkSettingCollectionTest
{
    protected static class Team extends BacklinkSettingCollectionTest.Team
    {
        public Collection<Member> getMembers()  {
            return new BacklinkSettingList<Member,Team>(
                    (List<Member>) members,
                    this,
                    (m, t) -> m.setTeam(t));
        }
    }
    
    @Test
    public void shouldSetBackLinkWhenUsingListApi() {
        final Team team = new Team();
        final Member member0 = new Member();
        members(team).add(0, member0);
        final Member member1 = new Member();
        members(team).add(1, member1);
        assertEquals(2, team.getMembers().size());
        assertTrue(team == member0.getTeam());
        assertTrue(team == member1.getTeam());
        
        final Member member1Added = new Member();
        final Member member2Added = new Member();
        members(team).addAll(1, Arrays.asList(member1Added, member2Added));
        assertTrue(member1Added == members(team).get(1));
        assertTrue(member2Added == members(team).get(2));
        assertTrue(team == member1Added.getTeam());
        assertTrue(team == member2Added.getTeam());
        
        final Member member2Replacer = new Member();
        members(team).set(2, member2Replacer);
        assertTrue(null == member2Added.getTeam());
        assertTrue(team == member2Replacer.getTeam());
        
        members(team).remove(0);
        assertTrue(null == member0.getTeam());
    }
    
    private List<Member> members(Team team) {
        return (List<Member>) team.getMembers();
    }
}

Protecting the Backlink Setter

The other side of the add-method is the backlink setter method responsibility.setTeam(). Calling it without also updating the collection the responsibility currently is in, and the one it should go to, is also illegal. At first glance this does not look easy to solve, but what about making the setter package-visible, so that it can be called only from classes that are in the same package as the entity?

@Entity
public class Responsibility extends BaseEntity
{
    @ManyToOne(optional = false)
    private Team team;
    
    public Team getTeam() {
        return team;
    }
    void setTeam(Team team) {
        this.team = team;
    }
}

Here, the setTeam() method is package-visible (no access modifier), so it is accessible only to classes within the same Java package. The restriction this solution demands is that all persistence classes that relate to each other must be in the same package.

Conclusion

I will use this solution in my further experiments with JPA entity classes. There may be other solutions for the add() problem, but currently I stick to this one. The JPA providers ignore the property-access methods when annotations are on the fields (see AccessType.FIELD), so why not implement entities in a bean-unlike way to make the API safer against developer mistakes?