What IronRuby Running Rails *REALLY* Means and Why Miguel de Icaza Deserves The Credit
Grats to Miguel ;-)
I've been continuing to make improvements to the API over the past few days and have just made a 2.3.1 release.
The main focus of this release has been a redesign of the header API, which has been nagging at me for quite a while now. The fix centers on a new iterator interface for walking a GMimeHeaderList.
The GMimeHeaderIter API looks like this:
GMimeHeaderIter *g_mime_header_iter_copy (GMimeHeaderIter *iter);
void g_mime_header_iter_free (GMimeHeaderIter *iter);

bool g_mime_header_iter_equal (GMimeHeaderIter *iter1, GMimeHeaderIter *iter2);
bool g_mime_header_iter_is_valid (GMimeHeaderIter *iter);

bool g_mime_header_iter_first (GMimeHeaderIter *iter);
bool g_mime_header_iter_last (GMimeHeaderIter *iter);
bool g_mime_header_iter_next (GMimeHeaderIter *iter);
bool g_mime_header_iter_prev (GMimeHeaderIter *iter);

gint64 g_mime_header_iter_get_offset (GMimeHeaderIter *iter);
const char *g_mime_header_iter_get_name (GMimeHeaderIter *iter);
bool g_mime_header_iter_set_value (GMimeHeaderIter *iter, const char *value);
const char *g_mime_header_iter_get_value (GMimeHeaderIter *iter);

/* if the iter is valid, removes the current header and advances to the next
 * - returns TRUE on success or FALSE otherwise */
bool g_mime_header_iter_remove (GMimeHeaderIter *iter);
Currently, the way to get a GMimeHeaderIter is to call g_mime_header_list_get_iter(), which returns a newly allocated iter. I'm not sure if this is the best API or not, though... some other thoughts I've had are:
GMimeHeaderIter *g_mime_header_iter_new (void);

bool g_mime_header_list_get_iter (GMimeHeaderList *list, GMimeHeaderIter *iter);
The second function would initialize iter and return TRUE, or, if list was empty, it could return FALSE.
Another option would be to just have:
GMimeHeaderIter *g_mime_header_iter_new (GMimeHeaderList *list);
Then, if list is empty you'd just get back an invalidated iter. If, later, headers were added to list, then perhaps the iter could auto-magically become valid again. This would more-or-less work the same as the current code - except that the way to instantiate a GMimeHeaderIter is different.
To be honest, now that I've written these 2 ideas down, I think I prefer the first idea.
So... iterators. Since you can have multiple iterators for the same GMimeHeaderList, it is important to invalidate them whenever the list changes out from under them, such as when a header is removed.
You'll be pleased to note that only iters that reference a header that got removed are invalidated. This fact seems to be an argument in favor of my first idea above, as it would allow re-use of iters once they get invalidated.
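To give a rough idea of how this is meant to be used, here's a minimal sketch of walking a header list with the current API. This assumes g_mime_header_list_get_iter() returns a newly allocated iter positioned at the first header (if any); the details may well change as I settle on one of the ideas above.

static void
print_headers (GMimeHeaderList *headers)
{
	GMimeHeaderIter *iter;

	/* assumption: the returned iter starts out on the first header */
	iter = g_mime_header_list_get_iter (headers);

	while (g_mime_header_iter_is_valid (iter)) {
		printf ("%s: %s\n", g_mime_header_iter_get_name (iter),
			g_mime_header_iter_get_value (iter));

		if (!g_mime_header_iter_next (iter))
			break;
	}

	g_mime_header_iter_free (iter);
}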
GMime 2.3.0 has been released, marking the first development release of GMime since 2002. A new age has begun.
This new development release gets rid of off_t in all public interfaces, replacing it with gint64 so that there is no API/ABI breakage when building with Large File support. But more importantly (to me), it also fixes a number of the API inconsistencies and ugliness that I discovered while looking over the GMime-Sharp bindings.
The GMime-Sharp bindings still have a long way to go to be beautiful, but they are becoming more of a priority as I am more interested in using GMime-Sharp myself. If you are a user of the GMime-Sharp bindings, I'd love for you to try out GMime 2.3.0 and give me some feedback on how to improve the API even more. I, myself, will continue to scrutinize the API and make adjustments as I go; in the end, I hope to have the most beautiful MIME .NET API ever created.
At first, I was only going to fix GMime's public interfaces to use gint64 instead of off_t and save the rest of my changes for a "3.0" or at least a "2.6", but I decided that since I was breaking the ABI anyway, I might as well do some cleanup. I think developers worldwide will thank me once they get past the initial pain of porting their software to the new API (for which I hope to write a script to ease the transition).
Largely, these changes are renamings of functions, sometimes shuffling them to become methods on a base class, and so on.
I got rid of GMimePartEncodingType and created a GMimeContentEncoding enum instead, complete with a g_mime_content_encoding_to_string() and g_mime_content_encoding_from_string(). I also created a new state object called GMimeEncoding (may rename to GMimeEncoder?) which simplifies the base64/QuotedPrintable/UU encoder/decoder APIs a little in that it handles initializing the state variables for you as you encode or decode to any of those encodings.
As I was updating code to use these new enums/structs, I was able to simplify GMimeFilterBasic so that it no longer needs its own GMimeFilterBasicType; instead, you pass the encoding and a boolean (encode vs. decode) to the new g_mime_filter_basic_new() constructor. This has helped simplify a lot of code inside GMime, and I'm sure it will simplify everyone else's code as well, because there is no longer a need to map GMIME_PART_ENCODING_BASE64 to GMIME_FILTER_BASIC_TYPE_BASE64_ENC or GMIME_FILTER_BASIC_TYPE_BASE64_DEC.
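To give a feel for how the pieces fit together, here's a minimal sketch that maps a Content-Transfer-Encoding string onto the new enum and builds a decoding filter from it. The exact signatures may differ slightly from what ships, and the "base64" literal and stream wiring are just placeholders.

/* map the header value onto the new enum */
GMimeContentEncoding encoding = g_mime_content_encoding_from_string ("base64");

/* the new constructor takes the encoding plus a boolean: TRUE to encode, FALSE to decode */
GMimeFilter *filter = g_mime_filter_basic_new (encoding, FALSE);

/* ... attach the filter to a filtered stream and push the data through it ... */

g_object_unref (filter);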
I've also fixed GMimeContentType and GMimeContentDisposition to notify their parent GMimeObject when they are changed so that the following C#ism becomes possible:
part.ContentType.SetParameter ("name", "fubar.txt");
Previously, you'd have to use the GMimeObject's Content-Type helper methods instead of using the Content-Type's methods directly, like so:
part.SetContentTypeParameter ("name", "fubar.txt");
There are quite a few other improvements like this that I've added to the GMime-Sharp API as well. Sure, they may be minor, but they make the API much more polished.
And here it is, your Moment of Zen:
Those who assume the worst in other people are, by their very nature, among the worst people.
Those who assume the best in other people are, by their very nature, among the best people.
-- Jeffrey Stedfast
Since someone brought it up the other day on IRC, I figured I'd take a look at some of the Java vs C# benchmarks where Java was claimed to be faster than C#.
Scrolling through the list, there were 3 tests where Java was supposedly more than 2x faster than C# on Mono.
The SumFile test looked fairly simple and I figured the bottleneck there might be related to something silly being done with StandardInput (Miguel replaced the Int.Parse() routines with my more optimal implementations a while back, so I figured that was probably ok).
The first thing I tried to do was reproduce their findings, so I grabbed the Java and C# sources from the Debian site and compiled them.
Both programs read an integer value per line from StandardInput, so I wrote a simple program to generate some input for the test:
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
	unsigned int max, i;
	int sum = 0;
	
	max = atoi (argv[1]);
	
	for (i = 0; i <= max; i++) {
		printf ("%u\n", i);
		sum += i;
	}
	
	fprintf (stderr, "master value: %d\n", sum);
	
	return 0;
}
As you can see above, the program writes the true sum of all the values it outputs to stderr, so that we can compare the results of the C# and Java programs and make sure they are correct for 32-bit signed integers.
Then I compiled the Java and C# versions using the same compile options mentioned on the Debian Language Shootout pages for each language:
javac -classpath '.' sumcol.java
and
gmcs sumcol.cs
I then ran each program using the following commands (using the same java and mono command-line options):
[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
2147483647

real    0m4.606s
user    0m5.264s
sys     0m0.276s

[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 1647668640
1647668640

real    0m1.415s
user    0m2.136s
sys     0m0.228s
As you can see above, there's no way that Java is 2.3x faster than Mono... so I'm not sure where they are getting these results from. Not only that, but Java is getting the wrong answer!
My best guess is that Java does not allow integer wrapping; the value it prints is exactly 2147483647 (Int32.MaxValue), so I suppose that's okay... but it's a bit odd.
For comparison's sake, here's what I get with the C implementation of sumfile:
[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 1647668640
1647668640

real    0m1.043s
user    0m1.604s
sys     0m0.188s
Thinking that Java might be taking a performance hit from the bounds checking, I changed my original number-generating program to always print '1' on each line:
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
	unsigned int i, max;
	int sum = 0;
	
	max = atoi (argv[1]);
	
	for (i = 0; i < max; i++) {
		printf ("1\n");
		sum++;
	}
	
	fprintf (stderr, "master value: %d\n", sum);
	
	return 0;
}
Running the C, Mono, and Java versions again, I get the following results:
[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 5000000
5000000

real    0m0.601s
user    0m0.828s
sys     0m0.032s

[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 5000000
5000000

real    0m0.774s
user    0m0.876s
sys     0m0.064s

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.751s
user    0m0.824s
sys     0m0.096s
Here we can see that Java is slightly faster than Mono when comparing the 'real' and 'user' times.
I figured that in order to get a more accurate reading, I should re-run using a larger value, so...
[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 21474836
21474836

real    0m2.625s
user    0m3.236s
sys     0m0.304s

[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe
master value: 21474836
21474836

real    0m3.225s
user    0m3.712s
sys     0m0.272s
Again, Java is faster according to 'real' and 'user' times.
What have we learned?
The Java program appears to be slightly faster if and only if we do not overflow the signed integer, otherwise Mono takes the prize.
If we revert back to using the original program I wrote to generate input and re-run using our most recent 'max' value, we find:
[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 370655678
2147483647

real    0m19.157s
user    0m22.757s
sys     0m0.568s

[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe
master value: 370655678
370655678

real    0m6.004s
user    0m9.513s
sys     0m0.940s
Here we see that Mono clearly outperforms Java, no matter how you slice it.
Conclusion?
You can't compare the performance of the Java and C# programs without first understanding the implications of the input provided to them.
Had the Debian Language Shootout used a different input, the performance tests might instead have shown Java getting crushed by Mono on this particular test. Basically, Java just got lucky that the inputs were in its favor.
Update: Oops, I forgot to give specifics of my Mono/Java versions and the specs of my laptop:
Mono: JIT compiler version 1.9 (/trunk/ r103228)
java -version gives the following output:
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)
The hardware used was my Lenovo T61 Core2Duo @ 2.4GHz
This was running on OpenSuSE 10.3 with the latest updates.
Update 2: I was pointed to http://shootout.alioth.debian.org/download/sumcol-input.txt to use as input for this SumFile test. Below I've pasted the results:
[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt
500

real    0m0.137s
user    0m0.052s
sys     0m0.028s

[fejj@moonlight benchmarks]$ time mono sumcol.exe < ~/sumcol-input.txt
500

real    0m0.078s
user    0m0.060s
sys     0m0.012s
Sorry guys, but even with this input, Java is not 2x faster than Mono.
Update 3: Since the above comparisons were all done using the StreamTokenizer version of the Java implementation, I figured I should go back and get the run times of the original Java implementation.
Using my original implementation of output.c, where each line's value is 1 larger than the previous line's, we get:
[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
1647668640

real    0m1.492s
user    0m2.272s
sys     0m0.168s
Interesting note: this Java implementation does not get the wrong answer...
The results with "1" on every line are as follows:
[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.924s
user    0m0.984s
sys     0m0.176s
Repeating the trial run that totally killed Java's performance with the Tokenizer implementation, we get:
[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol
master value: 370655678
370655678

real    0m6.564s
user    0m9.893s
sys     0m0.688s
And finally, for sumcol-input.txt, we get:
[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt
500

real    0m0.107s
user    0m0.056s
sys     0m0.032s
Conclusion:
The original Java implementation gets much better results than the StreamTokenizer implementation that was supposedly faster according to the Benchmarks table; however, it's still not 2x faster than the Mono implementation. In fact, this version of the Java implementation is roughly on par with the C# implementation.
Update 4: One of the comments to this blog post requested I try the Java6 SumCol #3 program which includes its own buffering/tokenizer implementation.
Since this program "cheats" a bit, I figured I'd pit it against my own implementation of equal craftiness ;-)
You can find my C# implementation here.
The results are as follows:
[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol2.exe
master value: 370655678
370655678

real    0m4.996s
user    0m4.904s
sys     0m0.676s

[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol3
master value: 370655678
370655678

real    0m5.117s
user    0m5.748s
sys     0m1.088s
Am I a crafty bugger or what? ;-)
Running these implementations against sumcol-input.txt, I get the following results:
[fejj@moonlight benchmarks]$ time mono sumcol2.exe < ~/sumcol-input.txt
500

real    0m0.076s
user    0m0.064s
sys     0m0.016s

[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol3 < ~/sumcol-input.txt
500

real    0m0.108s
user    0m0.064s
sys     0m0.040s
Damn I'm good! ;-)
Update 5: It's been suggested that the -server and -Xbatch args might make Java slower, so I've redone the tests (this time using sumcol-input.txt repeated 100,000 times).
Here are the results:
[fejj@moonlight benchmarks]$ time java -server sumcol < sumcol-input100000.txt
50000000

real    0m19.025s
user    0m18.657s
sys     0m0.412s

[fejj@moonlight benchmarks]$ time java sumcol < sumcol-input100000.txt
50000000

real    0m20.643s
user    0m20.269s
sys     0m0.364s

[fejj@moonlight benchmarks]$ time mono sumcol.exe < sumcol-input100000.txt
50000000

real    0m18.884s
user    0m18.245s
sys     0m0.604s
In the case of Java, it looks like the -server flag performs a little bit better on this particular test than running without it.
Overall, the Java and Mono performance for this test are remarkably close.
In the end, I think I can feel pretty confident that Mono's performance on this test is acceptable, and it's not worth wasting any more time on it for now.
Update 6: Okay, one last update to show that a fully managed C# implementation can outperform the fastest C implementation submitted to the Debian Language Benchmarks:
[fejj@moonlight benchmarks]$ time mono sumcol2.exe < sumcol-input100000.txt
50000000

real    0m2.906s
user    0m2.572s
sys     0m0.308s

[fejj@moonlight benchmarks]$ time ./sumcol < sumcol-input100000.txt
50000000

real    0m13.967s
user    0m13.277s
sys     0m0.668s
Yes, ladies and gentlemen, that is C# wiping the floor with C.
Over the past week, I've been spending some time hacking on Evolution again because of my frustration with the current IMAP backend. This got me to wondering... why hasn't anyone stepped up to the plate and rewritten Evolution's IMAP code yet?
I think the reason can be summed up with the following 2 problems:
1. IMAP is hard
2. Coding something complicated like a multithreaded multiplexed IMAP client library in C is harder.
That got me and Michael Hutchinson wondering... wouldn't it be nice if we could write Camel provider plugins in C# or in any other managed language that we might prefer?
I think Camel, like Evolution itself, should allow developers to implement plugins in C# as well. I really think this might help lessen the burden of implementing new mail protocol backends for Camel/Evolution.
On that note, I've created a new svn module called camel-imap4 which can be built against your installed evolution-data-server devel packages.
Currently, however, it'll probably only work with e-d-s >= 2.23.x because some things (and assumptions) in Camel have changed recently.
One problem I'm having is that the symbol camel_folder_info_new() did not exist in older versions of e-d-s; it was added recently and makes use of g_slice_alloc0(). The problem is that providers used to allocate CamelFolderInfo structures themselves using g_new0(). Why does this pose a problem? As far as I'm aware, there's no guarantee that you can mix and match g_malloc/g_slice_free or g_slice_alloc/g_free, as the sketch below illustrates.
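In other words, the two allocation paths look roughly like this (a simplified illustration, not actual e-d-s code), and each must be paired with its own free function:

/* what older providers did themselves: */
CamelFolderInfo *fi = g_new0 (CamelFolderInfo, 1);
/* ... */
g_free (fi);                                     /* g_new0 pairs with g_free */

/* what the new camel_folder_info_new() does internally: */
CamelFolderInfo *fi2 = g_slice_alloc0 (sizeof (CamelFolderInfo));
/* ... */
g_slice_free1 (sizeof (CamelFolderInfo), fi2);   /* slices pair with g_slice_free */

/* mixing the pairs (g_new0 + g_slice_free1, or g_slice_alloc0 + g_free) is not
 * guaranteed to be safe, which is exactly the version-skew problem described above */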
This makes it difficult for me to implement a plugin that builds and works with my installed version of Evolution (2.12) and also work with Evolution svn (2.23). This is quite unfortunate :(
While I'm at it, allow me to also propose some changes to the GChecksum API. Please, please, please make it so that we need not allocate and free a GChecksum each time we need to checksum something.
I propose the ability to do the following:
GChecksum checksum;
g_checksum_init (&checksum, G_CHECKSUM_MD5);
g_checksum_update (&checksum, data, len);
g_checksum_get_digest (&checksum, digest, &len);
Then I'd like to be able to either call g_checksum_init() on the checksum again, or have another function to clear its state (g_checksum_clear(), say) which would allow me to re-use the same checksum variable for calculating the md5 of some other chunk of data.
Camel generates md5sums for a lot of data, sometimes in loops. Having to alloc/free every iteration is inefficient and tedious.
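For the sake of illustration, here's roughly what such a loop looks like with the current GChecksum API (a sketch; messages[], n_messages, and the data/len fields are hypothetical):

guint8 digest[16];     /* an md5 digest is 16 bytes */
gsize digest_len;
guint i;

for (i = 0; i < n_messages; i++) {
	/* with the current API, a GChecksum has to be allocated... */
	GChecksum *checksum = g_checksum_new (G_CHECKSUM_MD5);
	
	g_checksum_update (checksum, messages[i].data, messages[i].len);
	
	digest_len = sizeof (digest);
	g_checksum_get_digest (checksum, digest, &digest_len);
	
	/* ... do something with digest ... */
	
	/* ... and freed again on every single iteration */
	g_checksum_free (checksum);
}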
Update: It now builds and works with Evolution 2.12 (I haven't tested anything else). Better still, the new and improved IMAP back-end for Evolution is now actually working. Whoohoo!