Saturday, May 17, 2008

Debian Language Benchmarks - SumFile

Since someone brought it up the other day on IRC, I figured I'd take a look at some of the Java vs C# benchmarks where Java was claimed to be faster than C#.

Scrolling through the list, there were 3 tests where Java was supposedly more than 2x faster than C# on Mono.

The SumFile test looked fairly simple and I figured the bottleneck there might be related to something silly being done with StandardInput (Miguel replaced the Int.Parse() routines with my more optimal implementations a while back, so I figured that was probably ok).

The first thing I tried to do was reproduce their findings, so I grabbed the Java and C# sources from the Debian site and compiled them.

Both programs read from StandardInput an integer value per line, so I wrote a simple program to generate some input for the test:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    unsigned int max, i;
    int sum = 0;

    max = atoi (argv[1]);

    for (i = 0; i <= max; i++) {
        printf ("%u\n", i);
        sum += i;
    }

    fprintf (stderr, "master value: %d\n", sum);

    return 0;
}

As you can see above, the program outputs the real sum of all the digits it is outputting to stderr so that we can compare the results of the C# and Java programs to make sure they are correct for 32bit signed integers.

Then I compiled the Java and C# versions using the same compile options mentioned on the Debian Language Shootout pages for each language:

javac -classpath '.' sumcol.java

and

gmcs sumcol.cs

I then ran each program using the following commands (using the same java and mono command-line options):

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
2147483647

real    0m4.606s
user    0m5.264s
sys     0m0.276s
[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 1647668640
1647668640

real    0m1.415s
user    0m2.136s
sys     0m0.228s

As you can see above, there's no way that Java is 2.3x faster than Mono... so I'm not sure where they are getting these results from. Not only that, but Java is getting the wrong answer!

My best guess is that Java does not allow integer wrapping, so I suppose that that is okay... but it's a bit odd.

For comparison's sake, here's what I get with the c implementation of sumfile:

[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 1647668640
1647668640

real    0m1.043s
user    0m1.604s
sys     0m0.188s

Thinking that Java might be taking a performance hit from the bounds checking, I changed my original number-generating program to always print '1' on each line:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    unsigned int i, max;
    int sum = 0;

    max = atoi (argv[1]);

    for (i = 0; i < max; i++) {
        printf ("1\n");
        sum++;
    }

    fprintf (stderr, "master value: %d\n", sum);

    return 0;
}

Running the C, Mono, and Java versions again, I get the following results:

[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 5000000
5000000

real    0m0.601s
user    0m0.828s
sys     0m0.032s
[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 5000000
5000000

real    0m0.774s
user    0m0.876s
sys     0m0.064s
[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.751s
user    0m0.824s
sys     0m0.096s

Here we can see that Java is slightly faster than Mono when comparing the 'real' and 'user' times.

I figured that in order to get a more accurate reading, I should re-run using a larger value, so...

[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 21474836
21474836

real    0m2.625s
user    0m3.236s
sys     0m0.304s
[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe 
master value: 21474836
21474836

real    0m3.225s
user    0m3.712s
sys     0m0.272s

Again, Java is faster according to 'real' and 'user' times.

What have we learned?

The Java program appears to be slightly faster if and only if we do not overflow the signed integer, otherwise Mono takes the prize.

If we revert back to using the original program I wrote to generate input and re-run using our most recent 'max' value, we find:

[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 370655678
2147483647

real    0m19.157s
user    0m22.757s
sys     0m0.568s
[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe
master value: 370655678
370655678

real    0m6.004s
user    0m9.513s
sys     0m0.940s

Here we see that Mono clearly outperforms Java, no matter how you slice it.

Conclusion?

You can't compare the performance of the Java and C# programs without first understanding the implications of the input provided to them.

Had the Debian Language Shootout used a different input, the performance tests might have instead shown Java getting raped by Mono for this particular test. Basically, Java just got lucky that the inputs were in their favor.

Update: Oops, I forgot to give specifics of my Mono/Java versions and the specs of my laptop:

Mono: JIT compiler version 1.9 (/trunk/ r103228)

java -version gives the following output:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

The hardware used was my Lenovo T61 Core2Duo @ 2.4GHz

This was running on OpenSuSE 10.3 with the latest updates.

Update 2: I was pointed to http://shootout.alioth.debian.org/download/sumcol-input.txt to use as input for this SumFile test. Below I've pasted the results:

[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt 
500

real    0m0.137s
user    0m0.052s
sys     0m0.028s
[fejj@moonlight benchmarks]$ time mono sumcol.exe < ~/sumcol-input.txt 
500

real    0m0.078s
user    0m0.060s
sys     0m0.012s

Sorry guys, but even with this input, Java is not 2x faster than Mono.

Update 3: Since the above comparisons were all done using the StreamTokenizer version of the Java implementation, I figured I should go back and get the run times of the original Java implementation.

Using my original implementation of output.c, each line having a value 1 larger than the previous line, we get:

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
1647668640

real    0m1.492s
user    0m2.272s
sys     0m0.168s

Interesting note: this Java implementation does not get the wrong answer...

The results for "1" on ever line gives the following results:

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.924s
user    0m0.984s
sys     0m0.176s

Repeating the trial run that totally killed Java's performance with the Tokenizer implementation, we get:

[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol
master value: 370655678
370655678

real    0m6.564s
user    0m9.893s
sys     0m0.688s

And finally, the results for the sumcol-input.txt, we get:

[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt 
500

real    0m0.107s
user    0m0.056s
sys     0m0.032s

Conclusion:

The original Java implementation gets much better results than the StreamTokenizer implementation which was supposedly faster according to the Benchmarks table, however, it's still not 2x faster than the Mono implementation (in fact, this version of the Java implementation is roughly on par with the C# implementation).

Update 4: One of the comments to this blog post requested I try the Java6 SumCol #3 program which includes its own buffering/tokenizer implementation.

Since this program "cheats" a bit, I figured I'd pit it against my own implementation of equal craftyness ;-)

You can find my C# implementation here.

The results are as follows:

[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol2.exe
master value: 370655678
370655678

real    0m4.996s
user    0m4.904s
sys     0m0.676s
[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol3
master value: 370655678
370655678

real    0m5.117s
user    0m5.748s
sys     0m1.088s

Am I a crafty bugger or what? ;-)

Running these implementations against sumcol-input.txt, I get the following results:

[fejj@moonlight benchmarks]$ time mono sumcol2.exe < ~/sumcol-input.txt 
500

real    0m0.076s
user    0m0.064s
sys     0m0.016s
[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol3 < ~/sumcol-input.txt 
500

real    0m0.108s
user    0m0.064s
sys     0m0.040s

Damn I'm good! ;-)

Update 5: It's been suggested that the -server and -Xbatch args might make Java slower, so I've redone the tests (this time using sumcol-input.txt repeated 100,000 times)

Here are the results:

[fejj@moonlight benchmarks]$ time java -server sumcol < sumcol-input100000.txt 
50000000

real    0m19.025s
user    0m18.657s
sys     0m0.412s
[fejj@moonlight benchmarks]$ time java sumcol < sumcol-input100000.txt 50000000

real    0m20.643s
user    0m20.269s
sys     0m0.364s
[fejj@moonlight benchmarks]$ time mono sumcol.exe < sumcol-input100000.txt 
50000000

real    0m18.884s
user    0m18.245s
sys     0m0.604s

In the case of Java, it looks like using the -server flag performs a little bit better on this particular test than without.

Overall, the Java and Mono performance for this test are remarkably close.

In the end, I think I can feel pretty confidant that Mono's performance on this test is acceptable and it's not worth wasting any more time on it for now.

Update 6: Okay, one last update to show that a fully managed C# implementation can outperform the fastest C implementation submitted to the Debian Language Benchmarks:

[fejj@moonlight benchmarks]$ time mono sumcol2.exe < sumcol-input100000.txt 
50000000

real    0m2.906s
user    0m2.572s
sys     0m0.308s
[fejj@moonlight benchmarks]$ time ./sumcol < sumcol-input100000.txt 
50000000

real    0m13.967s
user    0m13.277s
sys     0m0.668s

Yes, ladies and gentlemen, that is C# wiping the floor with C.

Post a Comment

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.