Saturday, May 10, 2008

Wouldn't it be nice if...

Over the past week, I've been spending some time hacking on Evolution again because of my frustration with the current IMAP backend. This got me to wondering... why hasn't anyone stepped up to the plate and rewritten Evolution's IMAP code yet?

I think the reason can be summed up with the following 2 problems:

1. IMAP is hard

2. Coding something complicated like a multithreaded multiplexed IMAP client library in C is harder.

That got me and Michael Hutchinson wondering... wouldn't it be nice if we could write Camel provider plugins in C# or in any other managed language that we might prefer?

I think Camel, like Evolution itself, should allow developers to implement plugins in C# as well. I really think this might help lessen the burden of implementing new mail protocol backends for Camel/Evolution.

On that note, I've created a new svn module called camel-imap4 which can be built against your installed evolution-data-server devel packages.

Currently, however, it'll probably only work with e-d-s >= 2.23.x because some things (and assumptions) in Camel have changed recently.

One problem I'm having is that camel_folder_info_new() did not exist in older versions of e-d-s; it was added recently and allocates with g_slice_alloc0(). Before that, providers allocated CamelFolderInfo structures themselves using g_new0(). Why does this pose a problem? As far as I'm aware, there's no guarantee that you can mix and match g_malloc/g_slice_free or g_slice_alloc/g_free, so a structure has to be freed by the same allocator that created it.

This makes it difficult for me to implement a plugin that builds and works with my installed version of Evolution (2.12) and also work with Evolution svn (2.23). This is quite unfortunate :(

While I'm at it, allow me to also propose some changes to the GChecksum API. Please, please, please make it so that we need not allocate/free a GChecksum variable each time we need to checksum something.

I propose the ability to do the following:

GChecksum checksum;

g_checksum_init (&checksum, G_CHECKSUM_MD5);
g_checksum_update (&checksum, data, len);
g_checksum_get_digest (&checksum, digest, &len);

Then I'd like to be able to either call g_checksum_init() on checksum again or maybe have another function to clear state, maybe g_checksum_clear() which would allow me to once again use the same checksum variable for calculating the md5 of some other chunk of data.

Camel generates md5sums for a lot of data, sometimes in loops. Having to alloc/free every iteration is inefficient and tedious.
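To make the pattern concrete, here is a minimal sketch of a stack-allocated, reusable checksum context used in a loop. Everything in it is hypothetical: the toy_checksum_* names are made up for illustration and are not GLib API, and Adler-32 stands in for MD5 only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed API; none of this is GLib.
 * Adler-32 is used instead of MD5 purely to keep the sketch small. */
typedef struct {
    uint32_t a;
    uint32_t b;
} ToyChecksum;

/* Put the context into its initial state. */
static void toy_checksum_init(ToyChecksum *c)
{
    c->a = 1;
    c->b = 0;
}

/* Fold len bytes of data into the running checksum. */
static void toy_checksum_update(ToyChecksum *c, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c->a = (c->a + data[i]) % 65521u;
        c->b = (c->b + c->a) % 65521u;
    }
}

/* Return the checksum of everything fed in since the last init/clear. */
static uint32_t toy_checksum_get_digest(const ToyChecksum *c)
{
    return (c->b << 16) | c->a;
}

/* Reset the state so the same variable can be reused. */
static void toy_checksum_clear(ToyChecksum *c)
{
    toy_checksum_init(c);
}

/* Checksum n buffers with a single stack variable: no alloc/free per iteration. */
static void checksum_all(const unsigned char **bufs, const size_t *lens, size_t n, uint32_t *out)
{
    ToyChecksum c;

    toy_checksum_init(&c);
    for (size_t i = 0; i < n; i++) {
        toy_checksum_update(&c, bufs[i], lens[i]);
        out[i] = toy_checksum_get_digest(&c);
        toy_checksum_clear(&c);
    }
}
```

The catch with this design is that the struct layout has to be public so that callers can put it on the stack.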

Update: It now builds and works with Evolution 2.12 (I haven't tested anything else). But the new and improved IMAP back-end for Evolution is now actually working. Whoohoo!

43 comments:

Anonymous said...

Hm, not quite sure if I caught the gist of it but is this why my IMAP connections in Evolution are horrible / long-drawn out and even result in it crashing once in a while?

Oh well, it's free so what can I say :)

Anonymous said...

Might be. I stopped using evolution a while ago because of the crappy IMAP support (slow, crashes). Thunderbird has pretty good IMAP support and it even works with gmail tags.

JrezIN said...

IIRC, http://tinymail.org/ began from this exact question...

Anonymous said...

What about Vala? Seems it's a much better fit for GTK and GNOME.

I'd prefer to see stuff in Vala instead of C#.

Anonymous said...

Or, what about C++ !, nice, superb OO, designpattern overloaded C++ bindings, throw some boost in the mix and you are almost there... almost I tell you!

Emmanuele said...

jeff, the stack allocation was considered and discarded when I first wrote GChecksum - see bug 443648. main rationale is: you don't want the structure to be public, especially not when we are closing down every public structure in glib and gtk+.

Jeffrey Stedfast said...

emmanuele: okay, fair enough. It'd still be nice to re-use the same instance tho.

Anonymous said...

I like Evolution...mail filters are easy to set up on it compared to Thunderbird etc...

BUT...IMAP...jeez...it forces me to use Thunderbird as their IMAP is just fast.

However, I now use Evolution full time with IMAP and it is blazing fast.

My solution? I cheat ;-) I use a software called offlineimap which runs in terminal and use Evolution to access Maildir structure that offlineimap creates.

Much easier on my sanity.

Since Thunderbird is also OSS - why can't one take a peek at Thunderbird's IMAP code and see how it could be adapted for Evolution?

Would it be a licencing reason or different language reason or something else?

Cheers

Jeffrey Stedfast said...

One could certainly try to adapt Thunderbird's IMAP code to Evolution, but it'd probably not be worth the trouble - their architecture is significantly different from Evolution's.

Plus they wrote theirs in C++.

As far as Vala goes... why not use a real language like C#? Rather than a meta-language. Plus, if the Camel plugins can be written in any language supported by Mono, it opens up a lot of language options like Python, Ruby, C#, Visual Basic, Boo, Java, Nemerle, F#, etc.

That really lowers the barrier to entry. Adding support for Vala doesn't open it up for anyone new. If they know Vala, they have to know C also. So why even bother?

C++ is ok, but has almost as many problems as C has (more depending on how you count). I've been working a lot with C++ lately and it's really not all it's cracked up to be. C# is so much better designed as a language.

Philip said...

If you write plugins for Evolution in C# or other .NET languages, you're going to have the overhead of the Mono runtime throughout your desktop session.
As Zucchi was saying (http://blogs.gnome.org/zucchi/2008/05/10/linux-is-bloated/), is this really something we want?

Anonymous said...

Vala is a real language and you don't need to know C to write Vala code. The fact that the current implementation uses the C compiler to generate native code is really an implementation detail and independent of the language design.

Anonymous said...

It would be nice if ...

Jeffrey would continue his libspruce IMAP library.

Perhaps (re)design it as a single-threaded state-machine rather than a multi-threaded mutex lock-machine.

Add IDLE / NOTIFY, CONDSTORE/QRESYNC, part fetching, ENVELOPE + BODYSTRUCTURE for the summary - fetching, ...

And then write a blog titled: it's nice that ...

:-)

Anonymous said...

philip: Mono has a lot less overhead than Python. Currently the biggest problem with Mono is a non-compacting GC, but once that gets finished, it'll be the best runtime for writing software for the Linux Desktop without question.

Philip said...

Anonymous: I know Mono has a lower overhead than Python, but that overhead's still there. I've got Tomboy running on my desktop, and it's sitting there taking up 22MiB of writable memory, while other panel applets are only taking ~4MiB.

Jeffrey Stedfast said...

Philip: You are comparing apples and oranges. Tomboy isn't an applet, it is an application that happens to have an icon in the panel. It's more like Pidgin than any other applet.

Comparing Tomboy and Pidgin on my system, Pidgin uses more memory than Tomboy.

Anonymous said...

I think it's cool if some really cool plugin is able to be written really fast and easily because it's in C#... but for the love of God please don't make it a plugin that Evolution is worthless without. I don't want to run a Mono VM just because I want to check my mail without my client sucking.

Linux used to be great for machines with not so much RAM. C# is great, but can we keep the Gnome stack small, and let 3rd party apps use C#?

pirast said...

regarding imap what about tinymail?

Anonymous said...

tinymail uses Camel which is the Evolution backend.

Philip said...

Jeffrey: Ack, so I am. My apologies.

Anonymous said...

I don't know why people complain that often about the overhead of Mono.

For me Python is a much bigger problem.

For example:
Tomboy uses on my system 16,2MB
gajim (my Jabber client) in offline mode: 28,1MB + 8,2MB (Python) = 36,3MB

For me programs written in python are a much bigger "problem" than C#-apps. I can't understand why so many people hype python as _the_ high level programming language for desktop apps. I'm always happy if i can avoid python apps both for general speed issues and because of the memory consumption. If i have a choice between a python app and a c# app i would always go for the c# one.

Unknown said...

have you considered, as a more mid-to-long-term strategy, using akonadi within evolution?

it would be a great way to get more people who actually care about things like getting a good imap implementation, not to mention all the other pieces.

as you can see in Kevin Krammer's blog entry here the work on generalizing (or, if you will, de-KDE-ifying) akonadi is progressing really nicely.

at LCA this year there was even a presentation on akonadi by one of the gnome developers, so hopefully that interest continues.

it's a great area to work together on since things like imap are not sexy for either developers or users; the user interface is where it's all at, really. the only time the imap implementation becomes interesting at all is when it sucks, so getting everyone working together on these things makes all the sense in the world.

it also helps that there are a number of people paid to work on akonadi so the unsexy nature of the work doesn't interfere completely; also, these people seem to actually enjoy working on it. takes all types, i suppose ;)

but yeah, definitely check out akonadi. integration and improved user experience ftw! ;)

Anonymous said...

Just do a Camel API wrapper for TinyMail.

Anonymous said...

anonymous: tinymail is already just a wrapper around camel. Why would he write a camel wrapper around a wrapper around camel? Makes no sense.

Tinymail uses the same imap code as Evolution.

Anonymous said...

I figured this would happen. Even though right now mono is just an optional part of Gnome, slowly it will creep more and more into core parts of the platform. Slowly people are realizing that C is inadequate for modern desktop development and unfortunately the progression seems to be to C#. Bye bye lightweight. Not for me thank you.

Jeffrey Stedfast said...

Anonymous: You are free to implement a better IMAP provider for Evolution in C if you'd rather avoid C#.

I, myself, am only suggesting the idea because if it comes down to me having to rewrite Evolution's IMAP support, I'd rather do it in C# than in C.

But if you were willing to implement an IMAP backend in C, then I'd gladly step back and let you do it :)

If you just want to complain, well, then you have no say :)

Emmanuele said...

@jeff: mmh, perhaps a g_checksum_clear() that just reset the internal state and gave you a newly usable GChecksum. would that be enough?

Jeffrey Stedfast said...

emmanuele: Yea, altho in the feature request I submitted to bugzilla - I called it g_checksum_reset() - I think that might be a better name?

I can certainly live with having to malloc/free a GChecksum instead of having it be on the stack, but a g_checksum_reset() would be awesome ;-)

Jeffrey Stedfast said...

Emmanuele: See http://bugzilla.gnome.org/show_bug.cgi?id=532552

I've already written a patch :)

Anonymous said...

It would really be a shame to make evolution depend on bloatware like Mono after the efforts that have been spent in making it consume less memory.

As for a Camel provider written in C#, you'd be unable to use it in any serious environment. Code able to manage thousands of emails in hundreds of folders should be written in C, full stop.

As for those comparing the memory usage of python and mono: please also compare startup times. Tomboy takes up to 10 seconds to startup on my laptop, during which gnome-panel is unusable. I have yet to see a program in another language (except maybe Java) being so slow to start.

Jeffrey Stedfast said...

np237: you are vastly mistaken about Mono being bloatware.

C# can outperform native C depending on what you are doing.

Also, IMAP is I/O bound, so you're not going to notice any slowness of something written in C# vs C (if you were worth your weight as a programmer, you'd recognize this fact).

Instead of broadcasting your ignorance to the world, it might be wise for you to do some actual learning.

Jeffrey Stedfast said...

As an example of Mono being able to outperform native C, Mono's new RegularExpressions implementation crushes the C, C++, Java, Perl, Python, etc (all except Tcl) implementations in the Debian Language Shootout.

Oh, and when I say it outperforms C... it outperforms both libc regex and PCRE.

Philip said...

http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=csharp&lang2=gcc

I'm missing something, aren't I?

Jeffrey Stedfast said...

Philip: the tests there haven't been updated with the new Regex stuff (which I don't think is even enabled by default in Mono yet because it isn't complete)

Jeffrey Stedfast said...

The new Regex engine was written during Novell HackWeek which Miguel posted a summary about here:

http://tirania.org/blog/archive/2008/Feb-23.html

Leo S said...

@Jeffrey
The performance argument comes up again and again. Yes JIT compiled languages can be faster than C in some cases. But in the general case, they are a lot slower. You know this, so why bring it up?

Also claiming that C# regexp is blazingly fast, but only in an incomplete version that isn't even in mainline mono yet is silly. Perhaps it's fast because it's incomplete. You can't really make any judgment until it's released.

We've heard the "it'll be faster than C" argument since 95 when Java was released. It was pie in the sky back then, and after 13 years of development it still is. Those languages have plenty of advantages, but performance is not one of them.

Jeffrey Stedfast said...

I'll wager that a C# backend, if properly designed, could outperform the current IMAP backend that is written in C and have more functionality.

Philip said...

"I'll wager that a C# backend, if properly designed, could outperform the current IMAP backend that is written in C and have more functionality."

Emphasis mine. The issue here is with the design, not the programming language. With a better design, I'd say the C backend could outperform a better-designed C# backend.

Jeffrey Stedfast said...

If it outperforms the current C backend, then it's still a win - no matter what language it's written in ;)

Anonymous said...

...And a properly designed assembly implementation could outperform a properly designed C implementation.

What's your point?

Unless the performance is noticeable to the person using it, it's not an issue.

If writing it in C# means it gets implemented faster, more robustly, and is more easily maintainable than a C version, then why not a C# implementation?

People have been getting along fine for 7+ years with the current IMAP implementation, so if a C# one comes along that is just as fast and yet more featureful, implements IDLE, etc then why not?

Anonymous said...

Jeffrey: insulting people is not going to make your point.

Tomboy takes 10 seconds to start up and needs ~30 MB to load a score of notes. Evolution takes 2 seconds to startup and needs ~130 MB to load about 20000 emails.

I have looked closely at Tomboy’s code. I know it’s not badly designed. But this is what happens with high-level languages. They are not bad per se, but they are slow and take memory. This is also true for python, and even knowing that I use python a lot. But I would NEVER use it for something that needs to be memory-efficient and close to the network protocol.

Mono plugins for Evolution? I'm cool with the idea. Making the core of Evolution use Mono for its most intensive tasks? Come on.

I know it can be hard to understand that your favorite operating system/programming language/text editor/desktop environment is not the best at everything, but it is a lesson that everyone needs to learn at some point.

Jeffrey Stedfast said...

np237:

I never said that an IMAP implementation in C# wouldn't use more memory than one in C.

Please reread what I said and then read your own response to what I said and notice that your response doesn't at all rebut what I had said.

However...

- Tomboy's startup time is irrelevant to the discussion.

- IMAP is not something that needs to be super memory efficient, using an extra meg or two isn't going to make a damn bit of difference on the desktop. It's worth the extra meg or two in order to have code that is far more maintainable and easier to write in the first place.

Should you later discover a performance critical section of the managed code, you could always go back and replace it with a P/Invoke into some native code.

I don't see this being an issue with IMAP, though, nothing is particularly time critical where shaving off a few microseconds somewhere is really going to make a difference.

Allow me to let you in on a little secret involving the current IMAP backend in Evolution.

The big performance suckage in the current IMAP backend has little to do with it not having optimized routines and everything to do with the design being poor... YET most people are happy with the performance so long as they don't have hundreds of folders with thousands of messages in each.

- The current IMAP code makes very expensive IMAP queries more than it needs to.

- There is a different parser for each command, all of which load the entire server response into memory before parsing.

A proper implementation (in C or C#) would be designed to avoid these issues, and, whether it being in C or C#, would likely use less memory than the current implementation and at the same time most assuredly be faster.
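As a rough illustration of the second point, here is a minimal sketch of a scanner that is fed network chunks as they arrive and keeps only a few counters, instead of buffering the whole server response before parsing. The LineScanner type and its functions are hypothetical names invented for this example (they are not Camel code), and the sketch deliberately ignores IMAP literals ({n} followed by raw bytes), which a real parser would have to handle.

```c
#include <stddef.h>

/* Hypothetical incremental scanner, not Camel code. It never stores the
 * response; memory use is constant no matter how large the response is. */
typedef struct {
    size_t lines;       /* complete CRLF-terminated lines seen so far */
    size_t untagged;    /* lines whose first byte was '*' */
    int at_line_start;  /* next byte begins a new line */
    int saw_cr;         /* previous byte was '\r' */
} LineScanner;

static void line_scanner_init(LineScanner *s)
{
    s->lines = 0;
    s->untagged = 0;
    s->at_line_start = 1;
    s->saw_cr = 0;
}

/* Consume one chunk of any size; lines may be split across chunks. */
static void line_scanner_feed(LineScanner *s, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        char ch = chunk[i];

        if (s->at_line_start) {
            if (ch == '*')
                s->untagged++;
            s->at_line_start = 0;
        }

        if (ch == '\n' && s->saw_cr) {
            s->lines++;
            s->at_line_start = 1;
        }

        s->saw_cr = (ch == '\r');
    }
}
```

Because the state survives between calls, a line split across two reads is handled the same way as one that arrives whole, which is what lets the parser keep up with the socket without holding the full response in memory.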

Jeffrey Stedfast said...

...not to mention more scalable.

Jeffrey Stedfast said...

fwiw, it might be interesting for those claiming C# can't match C performance for parsing and I/O to read my Debian Language Shootout post on SumFile, which has a C# implementation that even a highly optimized C implementation can barely outperform (and which could probably match it if I bothered to use pointers in C#).

An IMAP backend for Evolution would be 90% parsing input and writing output to a socket, so if I/O performance can match C and if the programmer used byte arrays instead of converting input buffers to char arrays (which would likely also incur charset conversion overhead, or at the very least an O(n) byte-to-char conversion), then there's no reason a C# backend couldn't have comparable performance to one in C.

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.