Programming and Garbage Collection

I’ve programmed in two managed languages with garbage collectors: Java and C# (.Net). In my basic understanding, the Java garbage collector runs periodically, works out which objects are still reachable from the running program, and frees the memory of the objects which are not (it traces references rather than keeping a reference count). The .Net garbage collector is a bit more sophisticated (or bloated, depending upon your point of view). I am much more familiar with the .Net garbage collector, primarily because that’s what I’ve been working with for the last three years. I have to say that I like it.

The .Net garbage collector has a Gen 0, Gen 1, Gen 2 and a large object heap. The idea is that most objects are small and short lived, and large objects are expensive to move around in memory. New objects are allocated in Gen 0, and when Gen 0 fills up the collector runs and cleans up all of the Gen 0 objects that are no longer reachable. (There is also a dedicated finalizer thread, which runs the finalizers of dead objects that have them.) If an object can’t be cleaned up because it is still in use, that object gets bumped into Gen 1, and if it survives a Gen 1 collection it gets bumped into Gen 2. Gen 1 collections happen less frequently than Gen 0, Gen 2 collections happen less frequently than Gen 1, and the large object heap is only collected along with Gen 2. This system was designed under the idea that objects can be created quickly and then thrown away. This is awesome. There’s less of a worry that short lived objects will actually be taking up memory for significantly more time than they should.
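To see the generations for yourself, here is a minimal sketch that allocates a pile of small throwaway objects and then asks the runtime how many collections each generation has seen. The class and method names (GenerationCounts, AllocateAndCount) are made up for illustration; GC.CollectionCount and GC.MaxGeneration are the real .Net calls. The exact numbers depend on the runtime and machine, so I won’t promise any, but Gen 0 should show the most collections.

```csharp
using System;

public static class GenerationCounts
{
    // Storing each allocation in a static field makes sure it really
    // lands on the garbage collected heap.
    private static byte[] _sink = Array.Empty<byte>();

    // Allocates many short lived small objects, then reports how many
    // collections each generation has had in this process so far.
    public static int[] AllocateAndCount(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            // The previous array becomes garbage as soon as it is overwritten.
            _sink = new byte[128];
        }

        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }

    public static void Main()
    {
        int[] counts = AllocateAndCount(5000000);
        for (int gen = 0; gen < counts.Length; gen++)
        {
            Console.WriteLine("Gen " + gen + " collections: " + counts[gen]);
        }
    }
}
```

As far as I know, a collection of an older generation also collects the younger ones, which is why the Gen 0 count is never lower than the Gen 1 count.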

So while the idea behind the .Net garbage collector is for objects to be created and thrown away quickly, it still doesn’t mean that a good programmer will allocate objects with wild abandon. For example, say you write a method which creates a lot of small objects, calls some methods and then exits. The method even recreates identical objects because that was easier than creating and managing a collection of them. You test and profile your method and everything is happy. All of the objects got cleaned up in Gen 0, and nothing made it to Gen 1. Now let’s say that months go by and one day your profiling tests notice that there are a lot more objects in the Gen 1 or Gen 2 heap. What happened? Nothing changed in your method. Perhaps another method you were calling was changed and started to do I/O, or perhaps it started creating a lot of small, throwaway objects itself. Because the objects allocated in your method were still alive during the call to the sub method (and the sub method now triggers a garbage collection), your objects survive that collection, get treated as longer lived objects and are moved into the older generations. So while there’s forgiveness for one layer of code being wasteful, there’s no forgiveness for multiple layers being wasteful.
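The promotion described above can be sketched in a few lines. This is an illustration under assumptions, not a benchmark: CalleeThatTriggersCollection is a made-up stand-in for the sub method that changed, and it forces a Gen 0 collection with GC.Collect(0) so the effect is visible every time. GC.GetGeneration is the real call for asking which generation an object currently lives in.

```csharp
using System;

public static class PromotionDemo
{
    // Hypothetical stand-in for a callee that now allocates enough to
    // trigger a collection. Forcing one keeps the demo repeatable.
    private static void CalleeThatTriggersCollection()
    {
        GC.Collect(0);
    }

    // Returns the generation of a fresh object before and after the call.
    public static int[] Measure()
    {
        var shortLived = new object();
        int before = GC.GetGeneration(shortLived);

        CalleeThatTriggersCollection();

        int after = GC.GetGeneration(shortLived);
        // Keep the object reachable until here so it has to survive the collection.
        GC.KeepAlive(shortLived);
        return new[] { before, after };
    }

    public static void Main()
    {
        int[] generations = Measure();
        Console.WriteLine("Before the call: Gen " + generations[0]);
        Console.WriteLine("After the call:  Gen " + generations[1]);
    }
}
```

On the desktop runtime I would expect the object to start in Gen 0 and come back in Gen 1, even though nothing in Measure itself changed. That is runtime behavior rather than a documented guarantee, so treat the numbers as an observation.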

Managing memory is still something programmers should do in managed code. They just have to think about it differently than they would in native code. While the system was designed for throwaway objects, it doesn’t mean that you should program for that. It’s also important to note that while .Net made allocating objects cheap, it’s still not free. The three common ways programmers get caught not managing their objects are string allocation, anonymous delegates and registered events.

Creating string objects in .Net is really easy. Concatenating strings in .Net is really easy. It’s a temptation for some to create strings by concatenating lots of strings together, and every concatenation allocates a brand new string. I knew one programmer who would do it for strings which never even got used. (In a code review I was told I was being penny wise but pound foolish. The problem, though, was that while I was being penny wise, nothing in the code review was pound foolish.) The two ways to avoid this are to use the StringBuilder object or the String.Format() method. The one to use depends upon the situation. The other thing is that you should never create a string that’s never going to be used. This seems obvious, but it tends to happen a lot with strings and not other objects. A case of this would be trace statements. Don’t create a message for a trace statement when traces aren’t enabled.
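Here is a small sketch of both points. StringBuilder and string.Format are the real .Net APIs; everything else (StringDemo, TraceEnabled, MessagesBuilt, TraceOrder) is a made-up name, and TraceEnabled in particular stands in for whatever switch your tracing framework really exposes.

```csharp
using System;
using System.Text;

public static class StringDemo
{
    // Hypothetical switch standing in for a real tracing configuration.
    public static bool TraceEnabled = false;

    // Counts how many trace messages were actually built.
    public static int MessagesBuilt = 0;

    // Each += allocates a new, longer string and throws the old one away.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        foreach (string part in parts)
        {
            result += part + ";";
        }
        return result;
    }

    // StringBuilder appends into a growable buffer and builds one final string.
    public static string JoinWithBuilder(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part).Append(';');
        }
        return builder.ToString();
    }

    // The message is only formatted when tracing is switched on.
    public static void TraceOrder(int orderId, int itemCount)
    {
        if (TraceEnabled)
        {
            MessagesBuilt++;
            Console.WriteLine(string.Format("Order {0} has {1} items", orderId, itemCount));
        }
    }

    public static void Main()
    {
        string[] parts = { "a", "b", "c" };
        Console.WriteLine(JoinWithConcat(parts));
        Console.WriteLine(JoinWithBuilder(parts));
        TraceOrder(7, 3); // builds nothing while TraceEnabled is false
    }
}
```

Both join methods produce the same text. The difference is how many intermediate strings get created along the way, which only starts to matter when the loop is long or the method is called often.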

Anonymous delegates are really cool and really powerful and can make for great eye candy (if you’re a programmer). They are absolutely vital for Linq. One of the things that’s nice about them is that they do a lot of the tedious work for you when it comes to using delegates. The problem is that sometimes you don’t know how much the compiler had to do to get done what you wanted done anonymously. Under certain circumstances, such as when the delegate uses a local variable from the surrounding method, the compiler will generate a new class and create a new instance of that class each time. There’s just no other way for it to do what you asked. But there are things that you can do in your code, like creating named delegates on existing classes, which will take up less memory and use fewer objects.
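A rough sketch of the difference, with made-up names (ClosureDemo, ThresholdFilter). The first method hands back an anonymous delegate that uses a local variable, so a hidden class instance and a delegate get allocated on every call. The second approach puts a named method on an existing class and creates the delegate once.

```csharp
using System;

public static class ClosureDemo
{
    // The lambda captures 'threshold', so every call allocates a hidden
    // compiler generated object to hold it, plus a new delegate.
    public static Func<int, bool> MakeCapturing(int threshold)
    {
        return value => value > threshold;
    }
}

public sealed class ThresholdFilter
{
    private readonly int _threshold;
    private readonly Func<int, bool> _isAbove;

    public ThresholdFilter(int threshold)
    {
        _threshold = threshold;
        // Named delegate on an existing class, created once and reused.
        _isAbove = IsAbove;
    }

    private bool IsAbove(int value)
    {
        return value > _threshold;
    }

    // Reuses the cached delegate no matter how often it is called.
    public int CountAbove(int[] values)
    {
        int count = 0;
        foreach (int value in values)
        {
            if (_isAbove(value)) count++;
        }
        return count;
    }
}

public static class ClosureProgram
{
    public static void Main()
    {
        Func<int, bool> first = ClosureDemo.MakeCapturing(10);
        Func<int, bool> second = ClosureDemo.MakeCapturing(10);
        Console.WriteLine(ReferenceEquals(first, second)); // two separate delegate objects

        var filter = new ThresholdFilter(10);
        Console.WriteLine(filter.CountAbove(new[] { 5, 15, 25 }));
    }
}
```

The named version is more typing, which is exactly the trade off: in a rarely executed path the anonymous delegate is fine, in a hot path the cached delegate saves allocations.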

Strings and anonymous delegates are issues where a heuristic should be used to decide how much you worry about them. If you take the easy way here or there in a code path that’s rarely executed, it might even be preferable, because there will never be a measurable impact on the system and it was quicker to code. But if the code ends up in a tight loop or a frequently called code path, lazy programming can kill you. And by kill you, I mean users of your program will call the program slow, bloated and unresponsive. I prefer to program conscious of what I’m doing, so that I get into good habits, because my code might end up at the bottom of somebody else’s stack one day.

On the other hand, registered events can kill you if you do it wrong once. This trap hasn’t really affected me because I’m a big fan of the Asynchronous Programming Model instead of the Event-based Asynchronous Pattern. The only thing that requires the event driven style is the GUI thread. Windows programs which have window frames have a dedicated GUI thread. Some consider this a feature, most consider it a mistake. I am in the latter camp. What happens is that most classes which register for another object’s events never unregister from them. It’s possible to, it’s built into the language, but programmers forget about it all of the time. So what ends up happening is that an object gets created, registers for events, stops getting used and would otherwise be garbage collected, but it is still referenced by the object whose events it registered for. As a result many Windows programs will take up more memory the more window frames get opened, because programmers didn’t unregister the events.
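The leak is easy to reproduce in miniature. In this sketch Publisher, Subscriber and EventLeakDemo are all made-up names; the publisher plays the part of a long lived object (say, a main window) and the subscriber plays the part of a window frame that was opened and closed. A WeakReference lets us check whether the subscriber was really collected without keeping it alive ourselves.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class Publisher
{
    public event EventHandler Changed;

    public void Raise()
    {
        Changed?.Invoke(this, EventArgs.Empty);
    }
}

public sealed class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        // From here on the publisher holds a reference to this object.
        _publisher.Changed += OnChanged;
    }

    private void OnChanged(object sender, EventArgs e)
    {
    }

    // Unregistering drops the publisher's reference to this object.
    public void Dispose()
    {
        _publisher.Changed -= OnChanged;
    }
}

public static class EventLeakDemo
{
    // Kept in its own non-inlined method so no local variable in the
    // caller keeps the subscriber alive by accident.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Subscribe(Publisher publisher, bool unsubscribe)
    {
        var subscriber = new Subscriber(publisher);
        if (unsubscribe) subscriber.Dispose();
        return new WeakReference(subscriber);
    }

    public static bool IsCollectedAfterGc(bool unsubscribe)
    {
        var publisher = new Publisher();
        WeakReference weak = Subscribe(publisher, unsubscribe);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        bool collected = !weak.IsAlive;
        // The publisher outlives the subscriber, as a main window would.
        GC.KeepAlive(publisher);
        return collected;
    }

    public static void Main()
    {
        Console.WriteLine("Collected without unregistering: " + IsCollectedAfterGc(false));
        Console.WriteLine("Collected after unregistering:   " + IsCollectedAfterGc(true));
    }
}
```

Without the unregister call the subscriber cannot be collected for as long as the publisher lives. With it, I would expect the subscriber to be gone after the forced collection, though exactly when the collector reclaims an object is up to the runtime.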

In all of this remember Amdahl’s law and do what’s best for your program. Code what you mean and mean what you code.

Categories: Uncategorized
  1. Yiru
    November 21, 2009 at 8:09 pm

    I like this one "Code what you mean and mean what you code."

  2. Jared
    November 21, 2009 at 9:51 pm

    I thought I was quoting http://blogs.msdn.com/oldnewthing/archive/2006/08/04/688527.aspx when I wrote that, but apparently the original quote is "Write what you mean and mean what you write." Is it plagiarism if I misremembered?
