
Saturday, December 19, 2020

Java Parallel versus Sequential Stream Performance

In my article about for-loops versus Java 8 streams I presented a test showing that a for-loop can be faster than a stream when doing typical sequential work, like searching for the maximum in a collection of numbers. Parallel threads may be fast at processing their partition of the collection, but in the end they have to determine which of them found the maximum, and this coordination let the for-loop win the race.

What about another kind of work inside the loop? This article is about performing a long-lasting task for each element of a list, and how that performs with a for-loop, a stream forEach(), or a parallel stream forEach(). Mind that such work must be free of side effects when done in parallel.
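To illustrate what "free of side effects" means here, the following sketch shows two ways to get a result out of a parallel stream without writing to unsynchronized shared state. It is not part of the test class below; the class SideEffectSketch and its method names are made up for this illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical illustration class, not part of the performance test.
class SideEffectSketch
{
    /** Safe: collect() lets the stream merge the partial results of its threads. */
    static List<Long> doubledValues(List<Long> input) {
        // Unsafe alternative (do not use): forEach(l -> sharedArrayList.add(l * 2)),
        // because ArrayList is not designed for concurrent add() calls.
        return input.parallelStream()
            .map(l -> l * 2)
            .collect(Collectors.toList());
    }

    /** Safe: LongAdder is designed for updates coming from many threads. */
    static long sumOf(List<Long> input) {
        final LongAdder sum = new LongAdder();
        input.parallelStream().forEach(sum::add);
        return sum.sum();
    }

    public static void main(String[] args) {
        final List<Long> input = LongStream.rangeClosed(1, 1000)
            .boxed()
            .collect(Collectors.toList());
        System.out.println("Doubled elements: "+doubledValues(input).size());
        System.out.println("Sum: "+sumOf(input));
    }
}
```

The long-lasting task used in this article avoids the problem altogether because it neither reads nor writes anything outside its own local variables.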

Test Source

The example builds a list of large positive numbers, then loops over it and calls a method that counts from zero up to the given number from the list. That method is called longLastingTask(). The loop is performed using three different techniques:

  1. for (long limit : list)
        longLastingTask(limit)
  2. list.stream().forEach(limit -> longLastingTask(limit))
  3. list.parallelStream().forEach(limit -> longLastingTask(limit))

Here is the source code of the performance test, explained step by step. At the bottom you will find the entire class. No external libraries are used, just the plain Java runtime environment.

public class LongTaskInLoopPerformance
{
    public static void main(String[] args) {
        System.out.println("Java: "+System.getProperty("java.version"));
        System.out.println("Cores: "+Runtime.getRuntime().availableProcessors());
        
        LongTaskInLoopPerformance streamPerformance = new LongTaskInLoopPerformance(30000);
        streamPerformance.performTests(10);
    }
    
    
    private final Collection<Long> longList;
    
    public LongTaskInLoopPerformance(final int numberOfCountLimits) {
        longList = createTestList(numberOfCountLimits);
    }
    
    private Collection<Long> createTestList(int testDataLength) {
        final List<Long> longList = new ArrayList<Long>(testDataLength);
        for (int i = 0; i < testDataLength; i++)    {
            long random = (long) Math.abs(Math.random() * 100000d);
            if (random > 0)
                longList.add(Long.valueOf(random));  // positive big numbers
        }
        return longList;
    }
    
    // more source goes here ...

}

This is the class skeleton with the main method and the constructor. The main method outputs the Java version and the number of cores of the machine where the test runs. Then it constructs a test with the length of the count-limit list set to 30000. Finally it runs 10 rounds of tests, to reduce the influence of warm-up and other disturbances.
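The test class relies on repetition alone to dampen warm-up effects. As an alternative, here is a minimal sketch of a timing helper that runs some untimed warm-up rounds first and measures with System.nanoTime(), which is intended for elapsed-time measurement. The class WarmUpSketch, the method measureMillis() and its parameters are assumptions made for this illustration; they are not used by the test in this article.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the performance test.
class WarmUpSketch
{
    /**
     * Runs the task warmUpRuns times without timing it,
     * then returns the elapsed milliseconds of measuredRuns timed runs.
     */
    static long measureMillis(Runnable task, int warmUpRuns, int measuredRuns) {
        for (int i = 0; i < warmUpRuns; i++)
            task.run();  // gives the JIT compiler a chance to compile the task

        final long before = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++)
            task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - before);
    }

    public static void main(String[] args) {
        final Runnable countingTask = () -> {
            for (long i = 0; i < 100000L; i++)
                ;
        };
        final long millis = measureMillis(countingTask, 5, 10);
        System.out.println("10 measured runs needed "+millis+" millis");
    }
}
```

For serious measurements a benchmark harness would be the more reliable choice; this sketch only shows the idea of separating warm-up from measurement.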

In the constructor, LongTaskInLoopPerformance builds the list of long numbers containing the limits for the long-lasting count tasks. Such a task will count from zero to the limit it receives as parameter. Because random values of zero are skipped, the list may end up slightly shorter than the requested length. The list of limits stays unchanged during all test runs.

Next is the method that sums up the measured times. Put it in place of "// more source goes here ..." in the class above:

    public void performTests(int numberOfTests) {
        long sumForLoop = 0L, sumParallelStream = 0L, sumSequentialStream = 0L;
        int numberOfForLoops = 0, numberOfParallelStreams = 0, numberOfSequentialStreams = 0; 
        
        final int numberOfTestTypes = 3;    // for-loop, sequential stream, parallel stream
        numberOfTests *= numberOfTestTypes;
        
        for (int testNumber = 0; testNumber < numberOfTests; testNumber++)   {
            final boolean doForLoop = (testNumber % numberOfTestTypes == 1);    // every 2nd test is for-loop
            final boolean useParallelStream = (testNumber % numberOfTestTypes == 2);    // every 3rd test is parallel
            
            long millis = performTest(doForLoop, useParallelStream);
            
            if (doForLoop)  {
                sumForLoop += millis;
                numberOfForLoops++;
            }
            else if (useParallelStream) {
                sumParallelStream += millis;
                numberOfParallelStreams++;
            }
            else    {
                sumSequentialStream += millis;
                numberOfSequentialStreams++;
            }
        }
        
        System.out.println(numberOfParallelStreams+" parallel stream forEach-loops needed "+sumParallelStream+" millis");
        System.out.println(numberOfForLoops+" for-loops needed "+sumForLoop+" millis");
        System.out.println(numberOfSequentialStreams+" sequential stream forEach-loops needed "+sumSequentialStream+" millis");
    }

This method sets up a run counter and a time sum for every type of loop. Each requested test runs all three loop types, so 10 tests result in 30 runs. The runs are interleaved: every 1st run is a sequential stream, every 2nd a for-loop, every 3rd a parallel stream. Finally the method outputs the time sums for all three loop types.
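The interleaving by remainder can be made visible with a small sketch. The enum LoopType and the method loopTypeOf() are names invented for this illustration; the test class itself uses the two boolean flags shown above.

```java
// Hypothetical illustration of the "testNumber % 3" scheduling.
class TestScheduleSketch
{
    /** Declared in the order of the remainders 0, 1, 2 used by performTests(). */
    enum LoopType { SEQUENTIAL_STREAM, FOR_LOOP, PARALLEL_STREAM }

    /** Maps a run number to the loop type that performTests() would choose for it. */
    static LoopType loopTypeOf(int testNumber) {
        final LoopType[] types = LoopType.values();
        return types[testNumber % types.length];
    }

    public static void main(String[] args) {
        for (int testNumber = 0; testNumber < 6; testNumber++)
            System.out.println(testNumber+" -> "+loopTypeOf(testNumber));
    }
}
```

Interleaving the loop types, instead of running each type 10 times in a row, spreads disturbances such as JIT compilation or other processes more evenly over all three types.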

The next method runs exactly one test:

    private long performTest(final boolean doForLoop, final boolean useParallelStream)    {
        final long before = System.currentTimeMillis();
        
        if (doForLoop)
            for (Long l : longList)
                longLastingTask(l);
        else if (useParallelStream)
            performStreamTest(longList.parallelStream());
        else
            performStreamTest(longList.stream());
        
        return (System.currentTimeMillis() - before);
    }

    private void performStreamTest(Stream<Long> streamOfLong) {
        streamOfLong.forEach(l -> longLastingTask(l));
    }

    private void longLastingTask(long limit)    {
        for (long i = 0; i < limit; i++)
            ;
    }

Here the test loop is executed in three different ways, determined by the parameters to performTest().

The performStreamTest() method contains the forEach(). Both the .stream() and the .parallelStream() method produce the same data type Stream<Long>, although the resulting streams are processed differently.
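The difference between the two streams can be observed at runtime. The following sketch asks each stream whether it is parallel and records the names of the threads that execute forEach(). The class StreamKindSketch and the method threadNamesUsedBy() are assumptions for this illustration. How many threads a parallel stream actually uses depends on the machine and the list size, so no particular number should be expected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Hypothetical illustration class, not part of the performance test.
class StreamKindSketch
{
    /** Returns the names of all threads that processed elements of the given stream. */
    static Set<String> threadNamesUsedBy(Stream<Long> stream) {
        final Set<String> names = ConcurrentHashMap.newKeySet();  // safe for concurrent add()
        stream.forEach(l -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        final List<Long> list = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);

        System.out.println("stream() is parallel: "+list.stream().isParallel());
        System.out.println("parallelStream() is parallel: "+list.parallelStream().isParallel());

        System.out.println("Sequential threads: "+threadNamesUsedBy(list.stream()));
        System.out.println("Parallel threads: "+threadNamesUsedBy(list.parallelStream()));
    }
}
```

A sequential stream runs entirely in the calling thread, while a parallel stream typically distributes the elements over the calling thread and worker threads of the common ForkJoinPool.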

The longLastingTask() method finally counts from zero to the limit it receives, which is a random number between 1 and 99999 (Math.random() always returns a value below 1.0, and zeros are skipped when building the list).


Here is the entire test class for copy & paste:

import java.util.*;
import java.util.stream.Stream;

/**
 * Check whether parallel streams outperform sequential ones and for-loops
 * when performing a long-lasting task with each element of a list.
 */
public class LongTaskInLoopPerformance
{
    public static void main(String[] args) {
        System.out.println("Java: "+System.getProperty("java.version"));
        System.out.println("Cores: "+Runtime.getRuntime().availableProcessors());
        
        LongTaskInLoopPerformance streamPerformance = new LongTaskInLoopPerformance(30000);
        streamPerformance.performTests(10);
    }
    
    
    private final Collection<Long> longList;
    
    public LongTaskInLoopPerformance(final int numberOfCountLimits) {
        longList = createTestList(numberOfCountLimits);
    }
    
    private Collection<Long> createTestList(int testDataLength) {
        final List<Long> longList = new ArrayList<Long>(testDataLength);
        for (int i = 0; i < testDataLength; i++)    {
            long random = (long) Math.abs(Math.random() * 100000d);
            if (random > 0)
                longList.add(Long.valueOf(random));  // positive big numbers
        }
        return longList;
    }
    
    public void performTests(int numberOfTests) {
        long sumForLoop = 0L, sumParallelStream = 0L, sumSequentialStream = 0L;
        int numberOfForLoops = 0, numberOfParallelStreams = 0, numberOfSequentialStreams = 0; 
        
        final int numberOfTestTypes = 3;    // for-loop, sequential stream, parallel stream
        numberOfTests *= numberOfTestTypes;
        
        for (int testNumber = 0; testNumber < numberOfTests; testNumber++)   {
            final boolean doForLoop = (testNumber % numberOfTestTypes == 1);    // every 2nd test is for-loop
            final boolean useParallelStream = (testNumber % numberOfTestTypes == 2);    // every 3rd test is parallel
            
            long millis = performTest(doForLoop, useParallelStream);
            
            if (doForLoop)  {
                sumForLoop += millis;
                numberOfForLoops++;
            }
            else if (useParallelStream) {
                sumParallelStream += millis;
                numberOfParallelStreams++;
            }
            else    {
                sumSequentialStream += millis;
                numberOfSequentialStreams++;
            }
        }
        
        System.out.println(numberOfParallelStreams+" parallel stream forEach-loops needed "+sumParallelStream+" millis");
        System.out.println(numberOfForLoops+" for-loops needed "+sumForLoop+" millis");
        System.out.println(numberOfSequentialStreams+" sequential stream forEach-loops needed "+sumSequentialStream+" millis");
    }
    
    private long performTest(final boolean doForLoop, final boolean useParallelStream)    {
        final long before = System.currentTimeMillis();
        
        if (doForLoop)
            for (Long l : longList)
                longLastingTask(l);
        else if (useParallelStream)
            performStreamTest(longList.parallelStream());
        else
            performStreamTest(longList.stream());
        
        return (System.currentTimeMillis() - before);
    }

    private void performStreamTest(Stream<Long> streamOfLong) {
        streamOfLong.forEach(l -> longLastingTask(l));
    }

    private void longLastingTask(long limit)    {
        for (long i = 0; i < limit; i++)
            ;
    }

}

Results

Here is the output for Java 8:

Java: 1.8.0_121
Cores: 4
10 parallel stream forEach-loops needed 3510 millis
10 for-loops needed 6379 millis
10 sequential stream forEach-loops needed 6724 millis

And here for Java 11:

Java: 11.0.2
Cores: 4
10 parallel stream forEach-loops needed 3774 millis
10 for-loops needed 6312 millis
10 sequential stream forEach-loops needed 6393 millis
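To make the numbers easier to compare, the measured sums can be turned into a speedup factor (for-loop time divided by parallel time) and a per-core efficiency (speedup divided by the number of cores). The class SpeedupSketch and its methods are invented for this illustration; the input values are the ones printed above.

```java
import java.util.Locale;

// Hypothetical helper for interpreting the measured times.
class SpeedupSketch
{
    /** How many times faster the parallel run was than the reference run. */
    static double speedup(long referenceMillis, long parallelMillis) {
        return (double) referenceMillis / parallelMillis;
    }

    /** Share of the ideal speedup (one per core) that was actually reached. */
    static double efficiency(double speedup, int cores) {
        return speedup / cores;
    }

    private static void report(String java, long forLoopMillis, long parallelMillis, int cores) {
        final double speedup = speedup(forLoopMillis, parallelMillis);
        System.out.println(String.format(Locale.ROOT,
            "%s: speedup %.2f, efficiency %.2f on %d cores",
            java, speedup, efficiency(speedup, cores), cores));
    }

    public static void main(String[] args) {
        report("Java 8", 6379, 3510, 4);   // for-loop versus parallel stream
        report("Java 11", 6312, 3774, 4);
    }
}
```

With these inputs the speedup comes out at roughly 1.8 for Java 8 and roughly 1.7 for Java 11, which is well below the ideal factor of 4 for a 4-core machine.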

Conclusion

For long-lasting tasks without side effects, done for each element of a list, turning the list into a parallel stream absolutely makes sense. On the 4-core test machine the parallel stream needed about 55 to 60 percent of the time of the other two variants, with the for-loop being a little faster than the sequential stream.



