
Sunday, December 29, 2019

JPA Entity Id Handling

Recently I presented a JPA BaseEntity class that implements the hashCode/equals contract once and only once for all sub-classes. The JPA specification does not require entity classes to override hashCode/equals, but for consistent handling of entities in a big project it may be advisable.

In the context of distributed systems, entities stored in hash containers, and ids occurring in URLs, various questions arise about the nature and behaviour of an entity's id. Chapter 2.4 of the JPA specification states, for primary keys:

The value of its primary key uniquely identifies an entity instance within a persistence context and to EntityManager operations as described in Chapter 3, "Entity Operations". The application must not change the value of the primary key. The behavior is undefined if this occurs.

This post is a review of the BaseEntity implementation. The big background question is: "Which scope does the id have to cover?" Unique per table? Per database? Globally unique?

Nature

Many web pages discuss the nature of ids. Search the web for "JPA primary key UUID versus number". Here is my personal summary.

UUID Generator versus SEQUENCE Number

(1)
A UUID (Universally Unique Identifier) is a 128-bit value. Its canonical text form is 36 characters long, consisting of 32 hex digits and 4 dashes. Such UUIDs are, for all practical purposes, globally unique in space and time, even if they were generated on different computers. The application, not the database, would generate them.

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    private UUID id = UUID.randomUUID();
    ....
}
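
To get a feeling for the shape of such an id, here is a small sketch in plain Java. The class name UuidFacts and its helper methods are made up for this illustration; only java.util.UUID is real API. It prints a random UUID, the length of its text form, and the number of bits it carries in binary form.

```java
import java.util.UUID;

/** Illustrative only: inspects the size and text form of a random UUID. */
final class UuidFacts
{
    /** @return the number of bits held in the two long halves of a UUID. */
    static int bitLength() {
        return 2 * Long.SIZE;   // most significant + least significant bits
    }

    /** @return the length of the canonical text form (hex digits plus dashes). */
    static int textLength(UUID uuid) {
        return uuid.toString().length();
    }

    /** @return the number of hex digits in the text form, dashes removed. */
    static int hexDigits(UUID uuid) {
        return uuid.toString().replace("-", "").length();
    }

    public static void main(String[] args) {
        final UUID uuid = UUID.randomUUID();   // a random (version 4) UUID
        System.out.println(uuid + " has " + textLength(uuid) + " characters");
        System.out.println("hex digits: " + hexDigits(uuid));
        System.out.println("bits in binary form: " + bitLength());
        System.out.println("version: " + uuid.version());
    }
}
```

Whether the database stores the binary or the text form depends on the JPA provider and the column type, so the size on disk can differ from what this sketch prints.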

(2)
Drawing numeric ids from database sequences is a tradition, although surprisingly not all database products support sequences. Most old databases have numeric primary keys with running sequences, so setting up an object-relational model for an existing database may leave no other choice than using numeric keys.
Mind that a sequence number is unique only within the sequence that issued it, so such numbers cannot be globally unique. The scope of numeric ids is therefore always the table or the database instance.

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "DatabaseGlobalSequence")
    @SequenceGenerator(name = "DatabaseGlobalSequence", initialValue = 1, allocationSize = 10)
    private Long id;
    ....
}
→ If you want to (or must) use one sequence per table, you cannot keep the id in the BaseEntity class. In that case you must put it into every sub-class and write a different sequence name into each of the related annotations. Still, with a public abstract getId() method you can keep the code for equals() and hashCode() in BaseEntity.
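
A minimal sketch of that variant, written as plain Java so it runs without a JPA provider. The JPA annotations appear only as comments, and the sub-classes Person and Address, the generator names "PersonSequence" and "AddressSequence", and the demo constructors are assumptions for illustration.

```java
/** Sketch: equals()/hashCode() stay in the base class, the id field moves to sub-classes. */
final class PerTableSequenceSketch
{
    /** In a real project this class would carry @MappedSuperclass. */
    static abstract class BaseEntity
    {
        /** Each sub-class owns its id field and its own sequence annotations. */
        public abstract Long getId();

        @Override
        public final boolean equals(Object o) {
            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;
            final Long id = getId();
            final Long otherId = ((BaseEntity) o).getId();
            if (id == null || otherId == null)   // can't use id
                return super.equals(o);
            return id.equals(otherId);
        }

        @Override
        public final int hashCode() {
            final Long id = getId();
            return (id != null) ? id.hashCode() : super.hashCode();
        }
    }

    /** Would carry @Entity in a real project. */
    static class Person extends BaseEntity
    {
        // @Id
        // @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "PersonSequence")
        // @SequenceGenerator(name = "PersonSequence", allocationSize = 10)
        private Long id;

        Person(Long id) { this.id = id; }   // demo only, JPA would assign the id

        @Override
        public Long getId() { return id; }
    }

    /** Would carry @Entity in a real project. */
    static class Address extends BaseEntity
    {
        // @Id
        // @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "AddressSequence")
        // @SequenceGenerator(name = "AddressSequence", allocationSize = 10)
        private Long id;

        Address(Long id) { this.id = id; }   // demo only, JPA would assign the id

        @Override
        public Long getId() { return id; }
    }

    public static void main(String[] args) {
        // same class, same id
        System.out.println(new Person(1L).equals(new Person(1L)));
        // different classes drawing the same number from different sequences
        System.out.println(new Person(1L).equals(new Address(1L)));
    }
}
```

The class check in equals() matters more here than with one global sequence, because two tables with separate sequences will sooner or later hand out the same number.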

So, if you have the freedom to choose your primary key class ...

Pro UUID:

  • Unique in every database instance, so ids need not be converted to local sequences on data migration, and multi-threaded batch inserts do not have to queue and wait for id values

  • hashCode/equals durability can be provided by assigning the id at construction time (safe for putting into hash containers, see example below)

  • Safer than a number when used in a URL: attackers cannot derive the next id from a random UUID like they could from a sequence number

  • Saving hierarchies of entities is easier, because the child entity doesn't need to wait for the parent's id generation

Contra UUID:

  • A UUID is more expensive in storage (16 bytes in binary form, which is 4 times bigger than Integer and 2 times bigger than Long, and even more when stored as text); this problem escalates when big indexes need to store them and lots of foreign keys exist

  • Generating a UUID can be slower than drawing a new sequence or identity number; moreover, the argument that Long numbers will run out of values does not hold in practice

  • You cannot tell whether an entity was already persisted by testing the id for null, because the id is generated at construction time and thus is always present (persist() versus merge() problem); to find out you need an additional database query

  • A sequence number as id reflects the order in which records were inserted and can serve as a fast default sort criterion (but: who in fact needs such a sort order?)
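
The storage argument can be checked with a few lines of plain Java. This is only a sketch: the class IdStorageSizes and its methods are invented for illustration, and the comparison with 16-byte binary and 36-character text columns assumes the two most obvious ways a database could hold a UUID.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Illustrative only: compares the raw sizes of the id types discussed above. */
final class IdStorageSizes
{
    /** Packs both long halves of a UUID into a byte array (binary form). */
    static byte[] toBinary(UUID uuid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    /** Encodes the canonical text form, which contains only ASCII characters. */
    static byte[] toText(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        final UUID uuid = UUID.randomUUID();
        System.out.println("Integer bytes:     " + Integer.BYTES);
        System.out.println("Long bytes:        " + Long.BYTES);
        System.out.println("UUID binary bytes: " + toBinary(uuid).length);
        System.out.println("UUID text bytes:   " + toText(uuid).length);
    }
}
```

Every foreign key and every index entry repeats the key, so the difference multiplies with the number of relations.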

So What?

The web articles I found that argue against UUIDs do so for performance and storage reasons. They were dealing with big data and poor responsiveness. So a common piece of advice may be:

  • SEQUENCE numbers for a production database with lots of data that serves all customers, where migrating data from production into a test instance is not an issue

  • UUIDs when selling many different databases to customers that do not have much data, but where migrations can happen frequently

Don't expect silver bullets to work:-)

Behavior

The following is about what it means for a hashCode/equals implementation to be durable.

Unit Test

Here is a unit test that targets the behavior of the primary key, concerning the "undefined behavior" in case the id changes during the lifetime of the entity.
For the complete implementation, and how to turn this into a concrete test with a certain JPA provider, please see my recent blog post.

 1  public abstract class JpaTest
 2  {
 3      ....
 4
 5      /** Setting the id changes hashCode() and equals(). */
 6      @Test
 7      public void shouldNotDisappearFromHashContainerWhenPersisting() {
 8          final BaseEntity person = newPerson("John Doe");
 9
10          final Set<Object> set = new HashSet<>();
11          set.add(person);
12          assertTrue(set.contains(person));
13
14          transactional(em::persist, person);
15
16          try {
17              assertTrue(set.contains(person));    // will NOT work when id changed on persist()
18          }
19          finally {
20              transactional(em::remove, person);
21          }
22      }
23
24      ....
25  }

Look at line 11, where the application puts a transient entity into a hash container with set.add(person). Now look at line 17: after persist(), it checks with the contains() method whether the persisted entity is still in the hash container. We know that, depending on how hashCode/equals was implemented, this assertion could break.

Sequence Number Primary Key

First let's try that unit test with the following BaseEntity:

import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;
import javax.persistence.SequenceGenerator;

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "DatabaseGlobalSequence")
    @SequenceGenerator(name = "DatabaseGlobalSequence", initialValue = 1, allocationSize = 10)
    private Long id;

    /** @return the primary key of this entity. */
    public final Long getId() {
        return id;
    }
    
    /** Overridden to delegate to class-equality and id (when not null). */
    @Override
    public final boolean equals(Object o) {
        if (this == o)    // performance optimization
            return true;
        
        if (o == null || getClass() != o.getClass()) // exclude aliens
            return false;   // and one-to-one entities with same id
        
        final BaseEntity other = (BaseEntity) o;
        if (id == null || other.id == null) // can't use id
            return super.equals(o);
        
        return id.equals(other.id); // delegate equality to id
    }
    
    /** Overridden to delegate to id when not null, else to super. */
    @Override
    public final int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

This uses a database-generated sequence called "DatabaseGlobalSequence". The @SequenceGenerator annotation is referenced by the preceding @GeneratedValue(generator=...) annotation.

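We see that the test fails in line 17, the second assertTrue(set.contains(person)). Although the entity is still inside the set, the contains() method returned false. This is because the hash code changed on the fly: while the id was null, hashCode() delegated to Object.hashCode(), and after persist() it is derived from the sequence value assigned by the JPA provider. The set stored the entity under the old hash code but now looks it up under the new one.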
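
The effect can be reproduced without any JPA provider. What follows is a minimal sketch, not the real test: the class SequenceIdEntity copies the equals()/hashCode() logic from above, and the hypothetical method assignId() stands in for what the provider does on persist(). The id is deliberately chosen so that its hash code differs from the identity hash code, which makes the outcome deterministic.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: an entity whose hash code changes while it sits in a HashSet. */
final class ChangingHashDemo
{
    /** Same equals()/hashCode() logic as the sequence-based BaseEntity, without JPA. */
    static class SequenceIdEntity
    {
        private Long id;   // null until "persisted"

        /** Stands in for the id assignment done by the JPA provider on persist(). */
        void assignId(long newId) {
            this.id = newId;
        }

        @Override
        public final boolean equals(Object o) {
            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;
            final SequenceIdEntity other = (SequenceIdEntity) o;
            if (id == null || other.id == null)
                return super.equals(o);
            return id.equals(other.id);
        }

        @Override
        public final int hashCode() {
            return (id != null) ? id.hashCode() : super.hashCode();
        }
    }

    /** @return true when the entity is still stored but can no longer be found. */
    static boolean disappearsAfterIdAssignment() {
        final SequenceIdEntity entity = new SequenceIdEntity();
        final Set<Object> set = new HashSet<>();
        set.add(entity);   // stored under the identity hash code

        // pick an id whose hash code differs from the identity hash code
        entity.assignId((long) entity.hashCode() + 1);

        return set.size() == 1 && !set.contains(entity);
    }

    public static void main(String[] args) {
        System.out.println("lost in set: " + disappearsAfterIdAssignment());
    }
}
```

Note that the set still holds the object, so iterating over it would find the entity. Only the hash-based lookup fails.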

UUID Primary Key

Now let's try the same with a UUID in the BaseEntity implementation:

import java.util.UUID;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

@MappedSuperclass
public abstract class BaseEntity
{
    @Id
    private UUID id = UUID.randomUUID();

    /** @return the primary key of this entity. */
    public final UUID getId() {
        return id;
    }
    
    /** Overridden to delegate to class-equality and id. */
    @Override
    public final boolean equals(Object o) {
        if (this == o)    // performance optimization
            return true;
        
        if (o == null || getClass() != o.getClass()) // exclude aliens
            return false;   // and one-to-one entities with same id
        
        final BaseEntity other = (BaseEntity) o;
        return id.equals(other.id); // delegate equality to id
    }
    
    /** Overridden to delegate to id. */
    @Override
    public final int hashCode() {
        return id.hashCode();
    }
}

We see that this implementation is shorter than the previous one. It doesn't have the @GeneratedValue annotation because the id value is set by the UUID.randomUUID() generator at the entity's construction time. Thus the hashCode/equals implementation is durable now: it relies on an id that is never null and never changes.

This time the test succeeds.

Conclusion

Databases are production assets. Every company tries to keep its persistence structures unchanged for as long as possible, and every modification is seen as critical. It absolutely matters how contents are identified and related. The best way may be to keep the type of the primary key flexible and isolated. Some even advise using both a SEQUENCE number and a UUID together.
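
How such a combination could look is sketched below in plain Java. This is an assumption about what that advice means, not a tested JPA mapping: the class DualKeyEntity, its uuid field and the assignId() method are invented for illustration, and the annotations are given only as comments. The numeric id would remain the compact primary key, while equals()/hashCode() rely on a UUID that exists from construction on.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Sketch only: numeric primary key for the database, UUID for object identity. */
final class DualKeySketch
{
    /** In a real project this class would carry @MappedSuperclass. */
    static class DualKeyEntity
    {
        // would carry @Id and @GeneratedValue(strategy = GenerationType.SEQUENCE)
        private Long id;   // null until persisted

        // would carry @Column(unique = true, nullable = false, updatable = false)
        private UUID uuid = UUID.randomUUID();   // present from construction on

        /** Stands in for the id assignment done by the JPA provider on persist(). */
        void assignId(long newId) {
            this.id = newId;
        }

        public final Long getId() {
            return id;
        }

        public final UUID getUuid() {
            return uuid;
        }

        @Override
        public final boolean equals(Object o) {
            if (this == o)
                return true;
            if (o == null || getClass() != o.getClass())
                return false;
            return uuid.equals(((DualKeyEntity) o).uuid);
        }

        @Override
        public final int hashCode() {
            return uuid.hashCode();   // independent of the numeric id
        }
    }

    /** @return true when the entity is still found after its numeric id was assigned. */
    static boolean staysInSetAfterIdAssignment() {
        final DualKeyEntity entity = new DualKeyEntity();
        final Set<Object> set = new HashSet<>();
        set.add(entity);
        entity.assignId(42L);
        return set.contains(entity);
    }

    public static void main(String[] args) {
        System.out.println("still in set: " + staysInSetAfterIdAssignment());
    }
}
```

The price is an extra column with a unique index, so the storage argument against UUIDs applies here too, although foreign keys can stay numeric.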



