Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Friday 06 December 2013 The Forest Road Reader, No 115

Last week, I spent some time banging on about iterators and ranges and what-not, before stumbling over their Java 8 incarnation as "streams". A mildly sarcastic commenter suggests that this isn't anything new, as it's been in Groovy for years. This is true, but also false.

Groovy (once the shiny new future of development in Java and now perhaps suffering a mild existential crisis) has a cunning (and/or dangerous, depending on your outlook) extension (and/or hacking, depending on your outlook) mechanism for grafting new methods into existing classes. The prime use for these extensions is the Groovy JDK, which adds any number of methods into the JDK classes to make them, well, make them more groovy.

Among the methods added to Iterator are various sort, takes, and sums, but nothing equivalent to where/select (aka filter/map)[1]. This is really what sets Java 8 apart from Groovy - after all you could always write your own extensions to add those methods in yourself. The thing that sets Java 8 Streams apart is the drop dead simple support for parallelism.

Java has had support for parallelism since it was introduced, and it has gradually improved to the point where it's actually quite usable. Streams, to my untutored eye, look like a leap forward and really quite remarkably easy. My friend Russel, who knows far more about high performance numerical computing than I ever will, often uses quadrature calculation of π as an example when demonstrating scaling and parallelism. Here's a simple serial version to start us off:

public class Serial {
  private static void pi() {
    final long startTimeNanos = System.nanoTime();

    final int n = 1000000000;
    final double delta = 1.0 / n;
    final double pi = 4.0 * delta * multiplier(n, delta);

    final double elapseTime = (System.nanoTime() - startTimeNanos) / 1e9;
    System.out.println(String.format("%f %d %f", pi, n, elapseTime));
  }

  static private double multiplier(int count, double delta) {
    final int start = 1;
    final int end = count;

    double sum = 0.0;

    for (int i = start; i <= end; ++i) {
      final double x = (i - 0.5) * delta;
      sum += 1.0 / (1.0 + x * x);
    }

    return sum;
  }

  public static void main(final String[] args) {
    pi();
  }
}
On my machine this outputs
3.141593 1000000000 12.560689
That's a fair approximation of π, the number of iterations used in the quadrature calculation, and the run time in seconds.

Quadrature calculation of πis a problem Russel describes as "embarrassingly parallel". What he means by that it that you can divide the calculation into as many slices as you like, evaluate the slices in any order, gather them up, and the result is the same. Let's chunk it up.

public class SerialChunk {
  private static void pi(int chunks) {
    final long startTimeNanos = System.nanoTime();

    final int n = 1000000000;
    final int chunkSize = n / chunks;
    final double delta = 1.0 / n;

    double multiplier = 0.0;
    for (int i = 0; i < chunks; ++i)
      multiplier += multiplierChunk(i, chunkSize, delta);

    final double pi = 4.0 * delta * multiplier;

    final double elapseTime = (System.nanoTime() - startTimeNanos) / 1e9;
    System.out.println(String.format("%f %d %f %d", pi, n, elapseTime, chunks));
  }

  static private double multiplierChunk(int chunk, int chunkSize, double delta) {
    final int start = 1 + (chunk * chunkSize);
    final int end = (chunk + 1) * chunkSize;

    double sum = 0.0;

    for (int i = start; i <= end; ++i) {
      final double x = (i - 0.5) * delta;
      sum += 1.0 / (1.0 + x * x);
    }

    return sum;
  }

  public static void main(final String[] args) {
    pi(1);
    pi(2);
    pi(8);
    pi(16);
  }
}
My machine gives
3.141593 1000000000 12.379540 1
3.141593 1000000000 11.487891 2
3.141593 1000000000 9.969373 8
3.141593 1000000000 10.028523 16
Same result, more or less the same run time. That for loop is a bit last century, given my current obsession with ranges. Let's get head into the future.
import java.util.stream.IntStream;

public class SerialStream {
  private static void pi(int chunks) {
    final long startTimeNanos = System.nanoTime();

    final int n = 1000000000;
    final int chunkSize = n / chunks;
    final double delta = 1.0 / n;

    double multiplier = IntStream.range(0, chunks).
                  mapToDouble(chunk -> multiplierChunk(chunk, chunkSize, delta)).
                  sum();

    final double pi = 4.0 * delta * multiplier;

    final double elapseTime = (System.nanoTime() - startTimeNanos) / 1e9;
    System.out.println(String.format("%f %d %f %d", pi, n, elapseTime, chunks));
  }

  static private double multiplierChunk(int chunk, int chunkSize, double delta) {
    final int start = 1 + (chunk * chunkSize);
    final int end = (chunk + 1) * chunkSize;

    double sum = 0.0;

    for (int i = start; i <= end; ++i) {
      final double x = (i - 0.5) * delta;
      sum += 1.0 / (1.0 + x * x);
    }

    return sum;
  }

  public static void main(final String[] args) {
    pi(1);
    pi(2);
    pi(8);
    pi(16);
  }
}
Swapped the loop, for an integer range and a nice functionally map and sum. Same results again -
3.141593 1000000000 11.982472 1
3.141593 1000000000 11.369035 2
3.141593 1000000000 10.302495 8
3.141593 1000000000 10.302260 16

Now let's go parallel. Look carefully or you'll miss it -

import java.util.stream.IntStream;

public class ParallelStream {
  private static void pi(int chunks) {
    final long startTimeNanos = System.nanoTime();

    final int n = 1000000000;
    final int chunkSize = n / chunks;
    final double delta = 1.0 / n;

    double multiplier = IntStream.range(0, chunks).
                  parallel().
                  mapToDouble(chunk -> multiplierChunk(chunk, chunkSize, delta)).
                  sum();

    final double pi = 4.0 * delta * multiplier;

    final double elapseTime = (System.nanoTime() - startTimeNanos) / 1e9;
    System.out.println(String.format("%f %d %f %d", pi, n, elapseTime, chunks));
  }

  static private double multiplierChunk(int chunk, int chunkSize, double delta) {
    final int start = 1 + (chunk * chunkSize);
    final int end = (chunk + 1) * chunkSize;

    double sum = 0.0;

    for (int i = start; i <= end; ++i) {
      final double x = (i - 0.5) * delta;
      sum += 1.0 / (1.0 + x * x);
    }

    return sum;
  }

  public static void main(final String[] args) {
    pi(1);
    pi(2);
    pi(8);
    pi(16);
    pi(32);
  }
}
And look at that run-time fall[2] ...
3.141593 1000000000 12.535362 1
3.141593 1000000000 6.488914 2
3.141593 1000000000 4.017939 8
3.141593 1000000000 1.519504 16
3.141593 1000000000 1.522862 32
To move the calculation from a serial mode to a parallel mode took one line of code. That's pretty lovely.

Russel has a vast number of examples in any number of languages in his Github repository, from where I adapted the code above. I can't claim to have read them all but the Java 8 version is one of the easiest to read and comprehend, ahead even of the Clojure, D, and Go examples. That really is remarkable.

1. There is a grep method added to Collection, but that returns another Collection. ^

2. My machine has an i7 with 8 cores and hyperthreading is on, so 16 thread of execution saturates and you'd expect to see a small degradation in speed as you push the number of chunks beyond that.^


Tagged groovy, and java


Jez Higgins

Freelance software generalist
software created
extended or repaired

Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About