Sunday, February 3, 2008

Worse is Better in the form of Autosave

There's recently been some talk about how GLib is poorly designed software because g_malloc() abort()s when the underlying malloc implementation returns NULL (suggesting an OOM condition), and that it should therefore never be used to write real-world applications because the calling code doesn't get a chance to do proper error checking (although it was brought up that you can actually use g_try_malloc() and/or plug your own malloc implementation in underneath g_malloc(), which could trivially notify the application that an OOM condition was met before returning to g_malloc()).
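To make the distinction concrete, here is a rough sketch of both approaches: g_try_malloc() for callers that want to check for NULL themselves, and a custom allocator installed with g_mem_set_vtable() that notifies the application before g_malloc() gets the NULL back and aborts. The notify_app_of_oom() hook is purely hypothetical; a real application would supply its own.

    #include <glib.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical hook supplied by the application, called when the
     * underlying malloc() fails.  It avoids GLib calls on purpose so it
     * cannot recurse back into the allocator. */
    static void
    notify_app_of_oom (gsize n_bytes)
    {
      fprintf (stderr, "allocation of %lu bytes failed\n",
               (unsigned long) n_bytes);
    }

    static gpointer
    oom_notifying_malloc (gsize n_bytes)
    {
      gpointer mem = malloc (n_bytes);

      if (mem == NULL)
        notify_app_of_oom (n_bytes);  /* tell the app before g_malloc() aborts */

      return mem;
    }

    static gpointer
    plain_realloc (gpointer mem, gsize n_bytes)
    {
      return realloc (mem, n_bytes);
    }

    static void
    plain_free (gpointer mem)
    {
      free (mem);
    }

    static GMemVTable vtable = {
      oom_notifying_malloc,
      plain_realloc,
      plain_free,
      NULL, NULL, NULL  /* calloc, try_malloc, try_realloc: use the defaults */
    };

    int
    main (int argc, char **argv)
    {
      gpointer big, small;

      /* Must be the very first GLib call made by the program. */
      g_mem_set_vtable (&vtable);

      /* g_try_malloc() returns NULL on failure so the caller can check. */
      big = g_try_malloc ((gsize) 1024 * 1024 * 1024);
      if (big == NULL)
        fprintf (stderr, "big allocation failed; carrying on\n");
      g_free (big);  /* g_free (NULL) is a safe no-op */

      /* g_malloc() never returns NULL; if the allocation fails, it aborts. */
      small = g_malloc (1024);
      g_free (small);

      return 0;
    }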

While this argument seems correct at first, and you begin to think "oh my god, the sky is falling", it's important to actually stop and think about the issue a bit first.

GLib was originally a utility library written as part of the Gtk+ widget toolkit in order to make the toolkit developers' lives easier. When designing a widget toolkit (Gtk+ in this case) for real-world programmers to use, simplicity is key. If your widget toolkit is hard to use because it offers a way to notify the application developer of every conceivable error condition, then no one will use it because it is "too hard".

What good is a library that is so hard to use that nobody uses it? It's no good, that's what.

The problem is that the idealists complaining about GLib's g_malloc() have only considered being able to check g_strdup()'s return value for NULL, or maybe gone as far as gtk_foo_new(). They have not considered that the rendering pipeline may need to allocate memory as it renders a widget, with no ideal way to pass an OOM error back up to the application, because the top of the call stack may in fact be gtk_main() and not some callback function implemented in the application's own code.
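As a rough sketch of what that call stack looks like in practice (GTK+ 2 style; the drawing details here are made up), consider an expose handler that needs a scratch buffer while painting:

    #include <gtk/gtk.h>

    /* By the time this runs, the stack is gtk_main() -> toolkit internals
     * -> this callback; there is no application-level caller to hand an
     * OOM error back to. */
    static gboolean
    on_expose (GtkWidget *widget, GdkEventExpose *event, gpointer user_data)
    {
      gchar *scratch = g_try_malloc (64 * 1024);

      if (scratch == NULL)
        {
          /* Who do we report this to?  gtk_main() is above us on the stack
           * and knows nothing about the application's unsaved document. */
          return FALSE;
        }

      /* ... draw the widget using the scratch buffer ... */

      g_free (scratch);
      return FALSE;
    }

    int
    main (int argc, char **argv)
    {
      GtkWidget *window, *area;

      gtk_init (&argc, &argv);

      window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
      area = gtk_drawing_area_new ();

      gtk_container_add (GTK_CONTAINER (window), area);
      g_signal_connect (area, "expose-event", G_CALLBACK (on_expose), NULL);
      g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);

      gtk_widget_show_all (window);
      gtk_main ();  /* the top of the application's call stack from here on */

      return 0;
    }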

The idealists argue that without the ability to check every malloc() for a NULL return and chain the error back up to a high enough level in the call stack to handle it properly, users could lose their unsaved documents if the application is, for example, a word processor. They argue that a properly designed application will always handle every error condition and pass errors back up the stack, where an emergency buffer can be used to show the user an "Out of memory" error dialog and/or save all of the user's unsaved work.
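For reference, that "emergency buffer" idea usually amounts to something like the sketch below; the function names are made up:

    #include <glib.h>

    /* A reserve allocated up front and released when OOM strikes, so that
     * the error path (dialog, emergency save) has some memory to work with. */
    static gpointer emergency_reserve = NULL;

    void
    reserve_emergency_memory (void)
    {
      emergency_reserve = g_try_malloc (256 * 1024);
    }

    void
    release_emergency_memory (void)
    {
      /* Returning this to the heap lets the error path allocate again. */
      g_free (emergency_reserve);
      emergency_reserve = NULL;
    }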

The problem with this school of thought is that the simple act of rendering your error dialog may require memory that you do not have available (assuming we are truly OOM, as opposed to simply being unable to satisfy the ~4GB allocation that the application requested due to poor arithmetic).
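That second case deserves spelling out; here is a contrived sketch with a hypothetical helper, where the allocation failure has nothing to do with the system actually being out of memory:

    #include <glib.h>
    #include <string.h>

    /* If n_items is 0, the unsigned subtraction wraps around and we end up
     * asking for roughly 4GB on a 32-bit system.  The failure looks like
     * OOM, but it is really a bug in the caller's arithmetic. */
    gchar *
    copy_all_but_last_byte (const gchar *data, gsize n_items)
    {
      gchar *copy = g_try_malloc (n_items - 1);

      if (copy == NULL)
        return NULL;

      memcpy (copy, data, n_items - 1);
      return copy;
    }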

There is one thing that they are correct on, however, and that is that losing a user's document is a Bad Thing(tm).

What they haven't considered, however, is that it's possible to prevent data loss without the need to implement their really complex OOM handling code:

I dub thee, auto-save.

Yes, I will assert that auto-save is our savior and that it is, in fact, the only feasible solution in what I affectionately refer to as: The Real World.
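A minimal sketch of what that might look like using GLib's main loop follows; Document and get_document_text() are hypothetical stand-ins for whatever your application keeps in memory, and the save path is arbitrary:

    #include <glib.h>

    typedef struct _Document Document;

    /* Hypothetical: serializes whatever the user is currently editing. */
    gchar *get_document_text (Document *doc);

    static gboolean
    autosave_cb (gpointer user_data)
    {
      Document *doc = user_data;
      GError *error = NULL;
      gchar *text = get_document_text (doc);

      if (!g_file_set_contents ("/tmp/document.autosave", text, -1, &error))
        {
          g_warning ("auto-save failed: %s", error->message);
          g_error_free (error);
        }

      g_free (text);
      return TRUE;  /* keep the timeout installed */
    }

    void
    enable_autosave (Document *doc)
    {
      /* Save the document every 60 seconds for as long as the main loop runs. */
      g_timeout_add (60 * 1000, autosave_cb, doc);
    }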

In the Real World, applications are built on top of other people's code, code that you do not control and do not have the time nor the luxury to audit; you simply have to trust that it works as advertised.

Let's imagine, for a minute, that you write a word processor application using some toolkit (other than Gtk+, obviously) that upholds your idealist design principles, in that it is able to notify your word processor application about OOM conditions that it experienced way down in the deep dark places of the rendering pipeline. And let's, for argument's sake, assume that your application is flawlessly written, because you are an idealist and thus your code is perfectly implemented in all respects, obviously.

Now imagine that a user is using your word processor application, and the version of the widget toolkit (s)he's using has a bug in some error-handling path that is unexpectedly hit: the error doesn't get passed up the call stack properly, and instead the toolkit's bug corrupts some memory and crashes the application.

Oops.

All the hard work you did, making sure that every possible error condition in your code is handled properly, never gets a chance to matter, because the application crashed in a library you trusted to be implemented flawlessly, and the user loses the document (s)he was writing.

Your effort was all for naught in this particular case.

What's the solution? Auto-save.

What have we learned from this? Auto-save is needed even if your toolkit is sufficiently designed to pass all errors (including OOM errors) back up the stack.

Once you've implemented auto-save, though, what are all those custom OOM-checks for each and every malloc() call in your application really worth?

Zilch.

So why not use something like g_malloc() at this point?

Once your system is OOM, the only reasonable thing you can do is save any state you don't already have saved and then abort the application (not necessarily using the abort() call). But if you already have all your important state pre-saved, then all you have left to do is shut down the application (because you don't have enough memory resources to continue running).
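In other words, the only OOM "handling" left amounts to something like the sketch below, where save_unsaved_state() is a hypothetical stand-in for the auto-save code you already have:

    #include <unistd.h>

    /* Hypothetical: flushes anything the periodic auto-save hasn't written yet. */
    void save_unsaved_state (void);

    void
    handle_oom_and_exit (void)
    {
      save_unsaved_state ();  /* best effort; under true OOM even this may fail */
      _exit (1);              /* shut down rather than limp along without memory */
    }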

Where does Worse is Better come in, you ask?

Well, arguably, the auto-save approach isn't as ideal as implementing proper fallback code for every possible error condition.

Auto-save is, however, Better because it works in the Real World and is Good Enough: it achieves the goal of preventing the loss of your users' documents, and it is far easier to implement, with a lot fewer points of failure.

Fewer points of failure means that it is a lot more likely to work properly. By using the auto-save approach, you can focus on making that one piece of code robust against every conceivable error condition with far less developer time and fewer resources, which keeps both cost and data loss down and makes everyone happy.

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.