Blog-Archiv

Sonntag, 28. November 2021

Refactoring the Code Duplication Example

This article refers to the code duplication example I presented in my preceding Blog.
I would propose two steps for getting rid of the code duplications in the example:

  1. Move the if-conditions away from AuthorExtractor into the caller by using the Java 8 functional interface Predicate

  2. Wrap the three different document-classes into just one class that provides a unified interface to them

Here are the methods in AuthorExtractor from which code duplications are to be removed:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
    public List<String> withNamepartFromDocumentsYoungerThan(
            String namePart,
            Date date,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            // START code duplication 1
            if (authorFullName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFullName);
            }
            // END code duplication 1
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            // START code duplication 1
            if (author.contains(namePart) && creationDate.after(date))    {
                authors.add(author);
            }
            // END code duplication 1
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            // START code duplication 1
            if (authorFirstAndLastName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFirstAndLastName);
            }
            // END code duplication 1
        }
        
        return authors;
    }
    
    // START code duplication 2
    public List<String> withNamestartFromDocumentsOlderEqual(
            String nameStart,
            Date date,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            // START code duplication 3
            if (authorFullName.startsWith(nameStart) && (createdAt.equals(date) || createdAt.before(date)))    {
                authors.add(authorFullName);
            }
            // END code duplication 3
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            // START code duplication 3
            if (author.startsWith(nameStart) && (creationDate.equals(date) || creationDate.before(date)))    {
                authors.add(author);
            }
            // END code duplication 3
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            // START code duplication 3
            if (authorFirstAndLastName.startsWith(nameStart) && (createdAt.equals(date) || createdAt.before(date)))    {
                authors.add(authorFirstAndLastName);
            }
            // END code duplication 3
        }
        
        return authors;
    }
    // END code duplication 2

Move Search Condition to Caller

When comparing the withNamepartFromDocumentsYoungerThan() method with withNamestartFromDocumentsOlderEqual() it is obvious that the if-conditions (line 14, 24, 36 - 59, 69, 81) are the only difference between them. When this can be externalized, we should end up with just one method.

This refactored method can't have the condition in its name any more, because the caller will determine the condition, so we call it findAll() now:

import java.util.function.Predicate;

public class AuthorExtractor
{
    /** Predicates passed to find() will receive objects of this class. */
    public static class ConditionParameter
    {
        public final String author;
        public final Date creationDate;
        
        public ConditionParameter(String author, Date creationDate)    {
            this.author = author;
            this.creationDate = creationDate;
        }
    }
    
    public List<String> findAll(
            Predicate<ConditionParameter> condition,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)    {

        .....
    }
}

Externalizing the condition means that we need to provide a parameter type to the Predicate class. This has been implemented by the immutable class ConditionParameter. Mind that the public members do not violate encapsulation as they are final.

Here is what the caller in CodeDuplication.main() must do now:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
public class CodeDuplication
{
    public static void main(String[] args) throws Exception {
        final List<CsvDocument> csvDocuments = new ArrayList<>();
        csvDocuments.add(new CsvDocument());
        final List<ExcelDocument> excelDocuments = new ArrayList<>();
        excelDocuments.add(new ExcelDocument());
        final List<PdfDocument> pdfDocuments = new ArrayList<>();
        pdfDocuments.add(new PdfDocument());
        
        final Date date1 = Date.from(Instant.now().minus(Period.of(0, 0, 2)));    // 2 days ago
        final List<String> authors1 = new AuthorExtractor().findAll(
                (p) -> p.author.contains("a") && p.creationDate.after(date1),
                csvDocuments,
                excelDocuments,
                pdfDocuments);
        System.out.println(authors1);
        
        final Date date2 = Date.from(Instant.now());
        final List<String> authors2 = new AuthorExtractor().findAll(
                (p) -> p.author.contains("A") && (p.creationDate.equals(date2) || p.creationDate.before(date2)),
                csvDocuments,
                excelDocuments,
                pdfDocuments);
        System.out.println(authors2);
    }
}

In line 13 and 21 are the lambdas that implement the Predicate conditions needed for the new findAll() method. And there is room for many more conditions, although restricted to dealing with author and creationDate.

Now that the condition in AuthorExtractor doesn't require code duplications any more, we can implement the new findAll() method:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
    public List<String> findAll(
            Predicate<ConditionParameter> condition,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            if (condition.test(new ConditionParameter(authorFullName, createdAt)))    {
                authors.add(authorFullName);
            }
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            if (condition.test(new ConditionParameter(author, creationDate)))    {
                authors.add(author);
            }
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            if (condition.test(new ConditionParameter(authorFirstAndLastName, createdAt)))    {
                authors.add(authorFirstAndLastName);
            }
        }
        
        return authors;
    }

Due to externalizing the search-condition we could remove the copied search-method withNamestartFromDocumentsOlderEqual(). Step one is done.

Still there are three loops, should be just one. So let's continue with step two.

Wrap Different Document Classes into just One

The idea is to write one wrapper class for the different document-types that provides all search criterions (like author and creation-date) generically. The class ConditionParameter already encapsulates these criterions, so we will reuse it in the wrapper.

In this step we will not touch the calls in CodeDuplication.main() any more. We can put all document wrappers privately into AuthorExtractor.

    private static class Document
    {
        public final ConditionParameter authorAndCreationDate;
        
        Document(CsvDocument csvDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    csvDocument.getAuthorFullName(),
                    csvDocument.getCreatedAt());
        }
        Document(PdfDocument pdfDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    pdfDocument.authorFirstName()+" "+pdfDocument.authorLastName(),
                    pdfDocument.createdAt());
        }
        Document(ExcelDocument excelDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    excelDocument.getAuthor(),
                    excelDocument.getCreationDate());
        }
    }

This helper class will wrap exactly one document. For each of the three document-types there is a dedicated constructor (so if you add another type, you will need another constructor here). After construction, all necessary information for the search will have been fetched from the original document to the wrapper.

One more wrapper encpsulates all three lists. Again, if another document-list gets added, also this class needs maintenance:

    private static class Documents implements Iterable<Document>
    {
        private final Iterator<CsvDocument> csvDocuments;
        private final Iterator<ExcelDocument> excelDocuments;
        private final Iterator<PdfDocument> pdfDocuments;
        
        Documents(
                List<CsvDocument> csvDocuments,
                List<ExcelDocument> excelDocuments,
                List<PdfDocument> pdfDocuments)    {
            this.csvDocuments = csvDocuments.iterator();
            this.excelDocuments = excelDocuments.iterator();
            this.pdfDocuments = pdfDocuments.iterator();
        }
        
        @Override
        public Iterator<Document> iterator() {
            return new Iterator<AuthorExtractor3.Document>()
            {
                @Override
                public boolean hasNext() {
                    return csvDocuments.hasNext() || 
                            excelDocuments.hasNext() || 
                            pdfDocuments.hasNext();
                }
                @Override
                public Document next() {
                    return csvDocuments.hasNext() ? new Document(csvDocuments.next())
                        : excelDocuments.hasNext() ? new Document(excelDocuments.next())
                            : pdfDocuments.hasNext() ? new Document(pdfDocuments.next())
                                : null;
                }
            };
        }
    }

This implements the Java interface Iterable to be used in a for-loop. The Iterable implementation is done in an anonymous class that delegates to the three lists. It returns a wrapper Document so that the caller doesn't need to care any more for the type of the document.

Now we can implement the findAll() method in a very simple way:

    public List<String> findAll(
            Predicate<ConditionParameter> condition,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (Document document : new Documents(csvDocuments, excelDocuments, pdfDocuments))
            if (condition.test(document.authorAndCreationDate))
                authors.add(document.authorAndCreationDate.author);
        
        return authors;
    }

The loop now uses the Iterable interface of the generic class Documents. The condition inside the loop calls the given Predicate parameter, and when that condition succeeds, the document is added to the list.
That's what we need. Easy to read and understand, short and concise.

Here is the full refactored source code of AuthorExtractor:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
import java.util.function.Predicate;

public class AuthorExtractor
{
    /** Predicate lambdas passed to find() will receive objects of this class. */
    public static class ConditionParameter
    {
        public final String author;
        public final Date creationDate;
        
        public ConditionParameter(String author, Date creationDate)    {
            this.author = author;
            this.creationDate = creationDate;
        }
    }
    
    public List<String> findAll(
            Predicate<ConditionParameter> condition,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (Document document : new Documents(csvDocuments, excelDocuments, pdfDocuments))
            if (condition.test(document.authorAndCreationDate))
                authors.add(document.authorAndCreationDate.author);
        
        return authors;
    }
    
    
    private static class Document
    {
        public final ConditionParameter authorAndCreationDate;
        
        Document(CsvDocument csvDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    csvDocument.getAuthorFullName(),
                    csvDocument.getCreatedAt());
        }
        Document(PdfDocument pdfDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    pdfDocument.authorFirstName()+" "+pdfDocument.authorLastName(),
                    pdfDocument.createdAt());
        }
        Document(ExcelDocument excelDocument) {
            this.authorAndCreationDate = new ConditionParameter(
                    excelDocument.getAuthor(),
                    excelDocument.getCreationDate());
        }
    }
    
    private static class Documents implements Iterable<Document>
    {
        private final Iterator<CsvDocument> csvDocuments;
        private final Iterator<ExcelDocument> excelDocuments;
        private final Iterator<PdfDocument> pdfDocuments;
        
        Documents(
                List<CsvDocument> csvDocuments,
                List<ExcelDocument> excelDocuments,
                List<PdfDocument> pdfDocuments)    {
            this.csvDocuments = csvDocuments.iterator();
            this.excelDocuments = excelDocuments.iterator();
            this.pdfDocuments = pdfDocuments.iterator();
        }
        
        @Override
        public Iterator<Document> iterator() {
            return new Iterator<AuthorExtractor.Document>()
            {
                @Override
                public boolean hasNext() {
                    return csvDocuments.hasNext() || 
                            excelDocuments.hasNext() || 
                            pdfDocuments.hasNext();
                }
                @Override
                public Document next() {
                    return csvDocuments.hasNext() ? new Document(csvDocuments.next())
                        : excelDocuments.hasNext() ? new Document(excelDocuments.next())
                            : pdfDocuments.hasNext() ? new Document(pdfDocuments.next())
                                : null;
                }
            };
        }
    }

}

Resume

Refactoring code duplications is not an easy task. Depending on the reason for the duplication you will need to find concepts on method- and class-level. Introducing interfaces will always be a safe way, although functional expressions may be much shorter. The effort for fixing code duplications can be saved when writing professional code right from start.




Code Duplication Example

A typical code duplication situation happens when you want to combine an algorithm (e.g. a search-loop) with objects of different types that do not have a common super-class. Assume you must implement a search in documents of type CSV, PDF and EXCEL:

final List<CsvDocument> csvDocuments = ....;
final List<PdfDocument> pdfDocuments = ....;
final List<ExcelDocument> excelDocuments = ....;

final List<String> authors = new AuthorExtractor().withNamepartFromDocumentsYoungerThan(
        "a",
        Date.from(Instant.now()),
        csvDocuments,
        excelDocuments,
        pdfDocuments);
        

This is a search for authors whose name contains "a" and the documents are younger than date/time of the search execution. The AuthorExtractor class will follow shortly.

The document-classes come from an external library, you can not change their source code. Lets assume all three classes deliver, in some way,

  • author name
  • creation date

Here are some some simple mock implementations, simulating document-types for the duplication example below:

ATTENTION: Simulation of an external class that you can not change!
public class CsvDocument
{
    public String getAuthorFullName()    {
        return "Anonymous";
    }
    
    public Date getCreatedAt()    {
        return Date.from(Instant.now().minus(Period.of(0, 0, 1)));    // 1 day ago;
    }
}

ATTENTION: Simulation of an external class that you can not change!
public class PdfDocument
{
    public String authorFirstName()    {
        return "John";
    }
    
    public String authorLastName()    {
        return "Warnock";
    }
    
    public Date createdAt()    {
        return Date.from(Instant.now().minus(Period.of(0, 0, 1)));    // 1 day ago;
    }
}

ATTENTION: Simulation of an external class that you can not change!
public class ExcelDocument
{
    public String getAuthor()    {
        return "Bill Gates";
    }
    
    public Date getCreationDate()    {
        return Date.from(Instant.now().minus(Period.of(0, 0, 1)));    // 1 day ago
    }
}

As you can see, the three document-classes expose different methods to get author and date, the PDF even provides two methods for the author's name while the others have it in just one. All of these example classes have mock author names and mock creation dates (one day old), we will use just one instance of each.

Here comes fast written code that implements the withNamepartFromDocumentsYoungerThan() search algorithm announced above:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
public class AuthorExtractor
{
    public List<String> withNamepartFromDocumentsYoungerThan(
            String namePart,
            Date date,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            // START code duplication 1
            if (authorFullName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFullName);
            }
            // END code duplication 1
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            // START code duplication 1
            if (author.contains(namePart) && creationDate.after(date))    {
                authors.add(author);
            }
            // END code duplication 1
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            // START code duplication 1
            if (authorFirstAndLastName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFirstAndLastName);
            }
            // END code duplication 1
        }
        
        return authors;
    }
}

I have marked the code duplications with comments. They are there because you need three loops for each of the document types, and inside each loop you have different calls to retrieve author/date, but then the same if-condition to filter out the wanted authors. Essentially even the three loops are code duplications, there should be just one.

Here is a main class that you can use for trying out this application:

public class CodeDuplication
{
    public static void main(String[] args) throws Exception {
        final List<CsvDocument> csvDocuments = new ArrayList<>();
        csvDocuments.add(new CsvDocument());
        final List<ExcelDocument> excelDocuments = new ArrayList<>();
        excelDocuments.add(new ExcelDocument());
        final List<PdfDocument> pdfDocuments = new ArrayList<>();
        pdfDocuments.add(new PdfDocument());
        
        final List<String> authors1 = new AuthorExtractor().withNamepartFromDocumentsYoungerThan(
                "a",
                Date.from(Instant.now().minus(Period.of(0, 0, 2))),	// 2 days ago,
                csvDocuments,
                excelDocuments,
                pdfDocuments);
        System.out.println(authors1);
        
        final List<String> authors2 = new AuthorExtractor().withNamestartFromDocumentsOlderEqual(
                "A",
                Date.from(Instant.now()),
                csvDocuments,
                excelDocuments,
                pdfDocuments);
        System.out.println(authors2);
    }
}

As code duplications are like viruses, it will get worse when the next query withNamestartFromDocumentsOlderEqual() or others get implemented, see the extended AuthorExtractor below, and mind that the first method was copy-pasted with slight modifications.

public class AuthorExtractor
{
    public List<String> withNamepartFromDocumentsYoungerThan(
            String namePart,
            Date date,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            // START code duplication 1
            if (authorFullName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFullName);
            }
            // END code duplication 1
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            // START code duplication 1
            if (author.contains(namePart) && creationDate.after(date))    {
                authors.add(author);
            }
            // END code duplication 1
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            // START code duplication 1
            if (authorFirstAndLastName.contains(namePart) && createdAt.after(date))    {
                authors.add(authorFirstAndLastName);
            }
            // END code duplication 1
        }
        
        return authors;
    }
    
    // START code duplication 2
    public List<String> withNamestartFromDocumentsOlderEqual(
            String nameStart,
            Date date,
            List<CsvDocument> csvDocuments,
            List<ExcelDocument> excelDocuments,
            List<PdfDocument> pdfDocuments)
    {
        final List<String> authors = new ArrayList<String>();
        
        for (CsvDocument csvDocument : csvDocuments)    {
            final String authorFullName = csvDocument.getAuthorFullName();
            final Date createdAt = csvDocument.getCreatedAt();
            // START code duplication 3
            if (authorFullName.startsWith(nameStart) && (createdAt.equals(date) || createdAt.before(date)))    {
                authors.add(authorFullName);
            }
            // END code duplication 3
        }
        
        for (ExcelDocument excelDocument : excelDocuments)    {
            final String author = excelDocument.getAuthor();
            final Date creationDate = excelDocument.getCreationDate();
            // START code duplication 3
            if (author.startsWith(nameStart) && (creationDate.equals(date) || creationDate.before(date)))    {
                authors.add(author);
            }
            // END code duplication 3
        }
        
        for (PdfDocument pdfDocument : pdfDocuments)    {
            final String authorFirstName = pdfDocument.authorFirstName();
            final String authorLastName = pdfDocument.authorLastName();
            final String authorFirstAndLastName = authorFirstName+" "+authorLastName;
            final Date createdAt = pdfDocument.createdAt();
            // START code duplication 3
            if (authorFirstAndLastName.startsWith(nameStart) && (createdAt.equals(date) || createdAt.before(date)))    {
                authors.add(authorFirstAndLastName);
            }
            // END code duplication 3
        }
        
        return authors;
    }
    // END code duplication 2
}

Duplicate code manifests itself in Long Methods. I call it "Having Déjà vu while reading the source".

Running the CodeDuplication.main() entry point should yield this output:

[Bill Gates, John Warnock]
[Anonymous]

Before refactoring the code duplications we should write a unit-test that asserts this result. The class CodeDuplication can be easily turned into such a test.

There are several steps and different ways how AuthorExtractor can be improved. In my next Blog I will show how I would do it.