Why We Dispose Things

Pop quiz: Why do we use IDisposable?

If you said something like “To allow us to clean up unmanaged resources”, I have good news: most other people make the same mistake.

The correct answer is “To make our code faster and more predictable.”

Don’t believe me? Let me try to convince you.

Consider the following:

void Main()
{
     var thing = new Thing();
     GC.Collect();                  // It doesn't matter what you do
     GC.WaitForFullGCComplete();    // Or how long you wait
     GC.WaitForPendingFinalizers(); // Thing will never release its resource
}

public class Thing : IDisposable {
     public object FakeResource = new object();
     public void Dispose() {
           // Do not implement the Disposable pattern this way!
           FakeResource = null;
           "Resource released!".Dump();     // This never happens
     }
}

It doesn’t matter how thoroughly you implement IDisposable. If somebody using your code fails to call the Dispose() method (or wrap your object in a using block), your resources will never be released. If you’re looking to ensure your resources are released, you should implement a finalizer:

void Main()
{
     var thing = new Thing();
}

public class Thing {
     public object FakeResource = new object();
     ~Thing() {
           FakeResource = null;
           "Resource released!".Dump();
     }
}

This guarantees that our resource will be released – however, it doesn’t guarantee when it will be released. In fact, when I ran that code, the first two runs didn’t print anything, and the third run printed the message twice. The fourth run printed the message twice again. (LINQPad doesn’t unload the app domain between runs, so we see the finalizers from earlier runs completing during later runs.)

What you should see from this is that IDisposable isn’t for disposing resources. One of the uses of IDisposable is, however, to provide some control over when those resources are released. A basic pattern you might use is this one:

public class Thing : IDisposable {
     public object FakeResource = new object();
     ~Thing() {
           releaseResources();
     }
     public void Dispose() {
           releaseResources();    // This still isn't the full pattern you should be using
     }
     private void releaseResources() {
           if (FakeResource != null) {
                FakeResource = null;
                "Resource released!".Dump();
           }
     }
}

Now, if a Thing is wrapped in a using block, or Dispose() is called, the resource will be released immediately. If the caller fails to ensure Dispose() is called, the resource will still be released by the finalizer.

Hopefully you can see that a finalizer is what we should be using to ensure resources are released, and IDisposable gives us a way to control when that happens. This is what I meant about predictability, and it also improves our stability: if resources are cleaned up in a timely fashion, our system is less likely to run out of limited resources under heavy load. If we rely on the finalizer, we guarantee that the resource will be released, but it’s possible for large numbers of objects to be waiting to be finalized, while hanging onto resources which won’t be used again.
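The comments in the code above point out that it still isn't the full pattern. As a hedged sketch only, the conventional shape for a class which owns a resource directly looks something like this. ResourceHolder, ReleaseCount and the _resource field are illustrative names which don't appear in the original code, and the counter exists purely so the behaviour can be observed.

```csharp
using System;
using System.Threading;

// Illustrative sketch of the conventional Dispose(bool) pattern.
// ResourceHolder and its members are hypothetical names.
public class ResourceHolder : IDisposable
{
    private object _resource = new object(); // stands in for a real resource
    private bool _disposed;

    public static int ReleaseCount; // only here so the behaviour can be observed

    public void Dispose()
    {
        Dispose(true);
        // Cleanup has already happened, so the finalizer has nothing left to do.
        GC.SuppressFinalize(this);
    }

    ~ResourceHolder()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return; // safe to call more than once

        // When disposing is true, other managed IDisposable fields could be
        // disposed here. When it is false we are on the finalizer thread and
        // should leave other managed objects alone.
        _resource = null;
        Interlocked.Increment(ref ReleaseCount);
        _disposed = true;
    }
}
```

Calling Dispose() twice on the same ResourceHolder releases the resource only once, and a caller who forgets Dispose() altogether still gets the finalizer as a safety net.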

Performance

I promised that IDisposable can also make code run faster, and to do that we need to understand a little bit about the garbage collector.

In the CLR, our heap has three different generations, numbered 0, 1, and 2. Objects are initially allocated on the gen 0 heap, and are moved up to the gen 1 and 2 heaps as they last longer.

The garbage collector needs to make a fast decision about every object, and so every time it encounters an object during a collection, it does one of two things: collect the object, or promote it to the next generation. This means that if your object survives a single gen 0 garbage collection, it will be moved onto the gen 1 heap by copying the memory and updating all references to the object. If it survives a gen 1 garbage collection, it is again moved – it is copied to the gen 2 heap, and all references are updated again.

The other thing you need to understand is how finalizers get called. When the garbage collector encounters an object which needs to be finalized, it has to put it on a queue and leave it uncollected until the finalizer has run – but remember that the garbage collector can only do two things: collect or promote. This means that the object has to be promoted to the next generation, just to give it time for the finalizer to be run.
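To make the promotion behaviour concrete, here is a small sketch which watches a single object survive two forced collections. GenerationDemo and Observe are made-up names for illustration. The generations reported are a runtime detail: typically the object starts in gen 0 and moves up one generation per collection it survives, but nothing in the sketch depends on that.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation of one object when first allocated, and again
    // after each of two forced collections which it survives.
    public static int[] Observe()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor); // keep the reference live until this point
        return seen;
    }
}
```

Printing the result with something like Console.WriteLine(string.Join(", ", GenerationDemo.Observe())) shows the object's progress through the heap.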

Let’s look at some numbers again. The following code has a simple finalizer which just adds to some counts: the total number of objects finalized, and the number which reached the later generation heaps.

void Main()
{
     Thing.Gen1Count = 0;
     Thing.Gen2Count = 0;
     Thing.FinaliseCount = 0;
     for (int repeatCycles = 0; repeatCycles < 1000000; repeatCycles++) {
           var n = new Thing();
     }
     GC.Collect();
     GC.WaitForPendingFinalizers();
     ("Total finalizers run: " + Thing.FinaliseCount).Dump();
     ("Objects which were finalized in gen1: " + Thing.Gen1Count).Dump();
     ("Objects which were finalized in gen2: " + Thing.Gen2Count).Dump();
}

public class Thing {
     public static int FinaliseCount;
     public static int Gen1Count;
     public static int Gen2Count;
     ~Thing() {
           finalize();
     }
     private void finalize() {
           FinaliseCount += 1;
           var gen = GC.GetGeneration(this);
           if (gen == 1) Gen1Count++;
           if (gen == 2) Gen2Count++;
     }
}

After running this a few times, it’s quite clear that the performance is all over the place. I got run-times ranging from 0.5 seconds up to 1.1 seconds. A typical output looks like this:

Total finalizers run: 999999
Objects which were finalized in gen1: 118362
Objects which were finalized in gen2: 881637

As you can see, most objects go through two promotions before they are collected, incurring a significant overhead.

With a few changes, we can significantly improve this situation.

void Main()
{
     Thing.Gen1Count = 0;
     Thing.Gen2Count = 0;
     Thing.FinaliseCount = 0;
     for (int repeatCycles = 0; repeatCycles < 1000000; repeatCycles++) {
           var n = new Thing();
           n.Dispose(); // This is new - we could also have used a using block
     }
     GC.Collect();
     GC.WaitForPendingFinalizers();
     ("Total finalizers run: " + Thing.FinaliseCount).Dump();
     ("Objects which were finalized in gen1: " + Thing.Gen1Count).Dump();
     ("Objects which were finalized in gen2: " + Thing.Gen2Count).Dump();
}

public class Thing : IDisposable {
     public static int FinaliseCount;
     public static int Gen1Count;
     public static int Gen2Count;
     public void Dispose() {
           finalise();
           GC.SuppressFinalize(this); // If we can perform finalization now, we can tell the GC not to bother
     }
     ~Thing() {
           finalise();
     }
     private void finalise() {
           FinaliseCount += 1;
           var gen = GC.GetGeneration(this);
           if (gen == 1) Gen1Count++;
           if (gen == 2) Gen2Count++;
     }
}

The changes I have made are to make Thing implement IDisposable, make the Dispose() method call GC.SuppressFinalize(this), and make the main loop call Dispose(). That tells the garbage collector that the object has already finished disposing of any resources it uses, and it can be collected immediately (instead of being promoted and placed on the finalizer queue).

The code now runs in a very consistent 0.2 seconds – less than half the original – and the output looks like this:

Total finalizers run: 1000000
Objects which were finalized in gen1: 0
Objects which were finalized in gen2: 0

As you can see, the cleanup now always runs while the object is still in gen 0: Dispose() does the work, and the suppressed finalizers never run at all (the “finalizers run” figure is really counting calls to finalise()). Measuring using the Windows Performance Monitor tells a similar story: in the version which uses only the finalizer, the monitor records numerous promotions and an increase in both gen 1 and 2 heap sizes. We don’t see that happening when we use the Dispose() method to suppress the finalizer.
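The comment in the loop mentions that a using block would have done the same job as the explicit Dispose() call. A minimal sketch of that variant follows. CountedThing and UsingSketch are hypothetical stand-ins rather than the classes measured above, and the simulated exception is only there to show that Dispose() still runs when the body of the block throws.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the Thing class, trimmed to what the sketch needs.
public class CountedThing : IDisposable
{
    public static int DisposeCount;

    public void Dispose()
    {
        Interlocked.Increment(ref DisposeCount);
        GC.SuppressFinalize(this); // cleanup is done, so skip finalization
    }

    ~CountedThing()
    {
        // Only reached if the caller never called Dispose().
    }
}

public static class UsingSketch
{
    // Creates and disposes 'iterations' objects, throwing from inside the
    // using block on the last one to show that Dispose() still runs.
    public static void Run(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            try
            {
                using (var thing = new CountedThing())
                {
                    if (i == iterations - 1)
                        throw new InvalidOperationException("simulated failure");
                } // thing.Dispose() is called here, exception or not
            }
            catch (InvalidOperationException)
            {
                // swallowed for the sake of the demonstration
            }
        }
    }
}
```

After UsingSketch.Run(5), CountedThing.DisposeCount has gone up by five, including the iteration which threw.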

So there you have it. Finalizers are for guaranteeing your resources get released. IDisposable is for making your code faster and more predictable.

LINQ and time complexity and data structures, oh my!

LINQ is a wonderful thing. It significantly enhances the expressiveness of C#, and provides lots of other benefits as well. Unfortunately, it comes with a cost.

It can hide things from you.

One of the real dangers is out-of-control time complexity. How many times will the following code perform the .Distinct() operation?

var rnd = new Random();
var values = Enumerable.Range(1, 1000).Select(r => rnd.Next(10));

var uniqueValues = values.Announce("Performing Distinct() operation...").Distinct();
if (uniqueValues.Count() > 2) uniqueValues.First().Dump();

If you answered ‘two’, congratulations! My output looks like this:

Performing Distinct() operation...
Performing Distinct() operation...
3
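Announce() isn't part of LINQ; it's a helper whose source isn't shown here. One plausible way to write such a helper, offered as an assumption rather than as the actual implementation, is an iterator method: the body of an iterator doesn't run until something starts enumerating it, so the message appears once per enumeration. The Announcements counter is an extra added for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnnounceSketch
{
    public static int Announcements; // counts how many times enumeration started

    // Hypothetical stand-in for the Announce() helper used above.
    public static IEnumerable<T> Announce<T>(this IEnumerable<T> source, string message)
    {
        // Iterator body: nothing here runs until someone starts enumerating.
        Console.WriteLine(message);
        Announcements++;
        foreach (var item in source)
            yield return item;
    }
}
```

Building the query prints nothing. The message only appears when Count() enumerates the sequence, and then again when First() enumerates it a second time.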

Of course, ReSharper warns you about this: “Possible multiple enumeration of IEnumerable”. ReSharper won’t catch all sins, though. I ran into something similar to this recently:

// Example data
var rnd = new Random();
var values = Enumerable.Range(1, 10000).Select(r => rnd.Next(10)).ToList();
var otherData = Enumerable.Range(1, 10000).Select(r => rnd.Next(20)).ToList();
var counter = new object();

// Problem code
var uniqueValues = values.CountCalls(counter).Distinct();
var otherDataWhichMatchesValues = otherData.Where(od => uniqueValues.Contains(od));
otherDataWhichMatchesValues.Count().Dump();
MyExtensions.GetCallCount(counter).Dump();

That took 19 seconds to run, and made 10,000 calls to Distinct()! Can you see what’s going on? The Where() operation is testing each entry in otherData against uniqueValues – enumerating uniqueValues once for every entry in otherData – and ReSharper 8 doesn’t warn you about it! If you’re used to seeing that warning whenever you try to enumerate an IEnumerable more than once, you might be tempted to think that it won’t happen. You would, of course, be wrong.

Distinct() itself runs in O(n) time, but because uniqueValues is a lazy query rather than a collection, every call to Contains() has to re-run it against the underlying List. Where() makes that call once for every entry in otherData, so the overall operation is O(n × m), which is quadratic when the two lists are of similar size. That’s a real performance killer on any non-trivial data set.

Adding a .ToList() after the .Distinct() reduces our run time to 3 thousandths of a second, and reduces our Distinct() operations to one. Much better! (Well, at least, it seems that way for now.)
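The difference is easy to reproduce on a smaller scale. The sketch below is self-contained and uses a private Counted() iterator as a hypothetical stand-in for the CountCalls() helper, whose source isn't shown; the list sizes are arbitrary and much smaller than in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCountDemo
{
    // Wraps a sequence and bumps counter[0] each time enumeration starts.
    // A hypothetical stand-in for the CountCalls() helper used above.
    private static IEnumerable<int> Counted(IEnumerable<int> source, int[] counter)
    {
        counter[0]++;
        foreach (var item in source)
            yield return item;
    }

    // Returns how many times 'values' was enumerated while filtering 'otherData'.
    public static int CountEnumerations(bool materialize)
    {
        var values = Enumerable.Range(0, 200).Select(i => i % 10).ToList();
        var otherData = Enumerable.Range(0, 100).Select(i => i % 20).ToList();
        var counter = new int[1];

        IEnumerable<int> uniqueValues = Counted(values, counter).Distinct();
        if (materialize)
            uniqueValues = uniqueValues.ToList(); // run Distinct() once, keep the result

        // Contains() is called once per entry in otherData.
        otherData.Where(od => uniqueValues.Contains(od)).Count();
        return counter[0];
    }
}
```

CountEnumerations(false) returns 100, one enumeration of values for each of the 100 entries in otherData, while CountEnumerations(true) returns 1.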

I have a habit which ReSharper doesn’t like much. I usually avoid methods like Count(predicate), and prefer to chain a Count() on the end of a Where(predicate). One of the reasons I do this is that I think it makes it clearer which calls are LINQ queries, subject to multiple evaluations, and which calls will cause evaluations. Of course, that doesn’t help if you don’t spot that .Distinct() is in the LINQ namespace in the first place!

It’s easy to forget about things like time and space complexity, but there’s a reason you learned that stuff: It’s important! Whenever you write a loop or make a LINQ call, something in the back of your mind should be thinking about how nested the current operation is, and how much time and space complexity you’re layering on top. That’s not to say that you should necessarily optimise everything, but it may help you to spot problems before they get out of control.

There are two bigger-picture lessons to learn from this.

Algorithms and Data Structures

The real root of the problem, in this case, came from not thinking through the algorithm or selecting appropriate data structures. Rather than trying to decide on an outcome first, the programmer has worked through a series of operations against the available data, until it has ended up in the right shape. You can picture the thought process:
– I need to get the entries in otherData which match entries in the values list.
– That will be slow, so I’ll call distinct on values.

The fix – adding a call to ToList() after calling distinct – has unfortunately introduced a more subtle performance bug. It works well for the test data set, but it won’t perform as well if values is sparse: if Distinct() removes a lot of duplicates, then we’ll see a performance improvement, but if there are few duplicates, the original problem the programmer was trying to fix will remain. Let’s measure.

// Example data
var rnd = new Random();
var dataRange = 10;
var values = Enumerable.Range(1, 10000).Select(r => rnd.Next(dataRange)).ToList();
var otherData = Enumerable.Range(1, 10000).Select(r => rnd.Next(dataRange)).ToList();
var counter = new object();

// Operation
for (int i = 1; i < 100; i++) {
     var uniqueValues = values.CountCalls(counter).Distinct().ToList();
     var otherDataWhichMatchesValues = otherData.Where(od => uniqueValues.Contains(od));
     otherDataWhichMatchesValues.Count().Dump();
}

We’ve now introduced a variable – dataRange – which will roughly control how many duplicates we’ll get. This code roughly parallels our original, with the ToList() fix (run through numerous iterations to exaggerate the timings). As is, it completes in 0.6s, but if we change dataRange to 1000, the run-time increases to 5.4s.

Consider the operations we want to do. We’re looking to build the ‘values’ dataset in the first place, and then we’re going to make many calls to .Contains() against it. While appending to a list is (amortised) O(1), Contains() on a list is O(n). What we really want is a data structure which is O(1), on average, for both insert and contains operations – and that’s a hash set. So the fix we should really make is to change the values dataset to a HashSet, and drop the distinct operation altogether:

var rnd = new Random();
var dataRange = 10;
var values = new HashSet<int>(Enumerable.Range(1, 10000).Select(r => rnd.Next(dataRange)));
var otherData = Enumerable.Range(1, 10000).Select(r => rnd.Next(dataRange)).ToList();

for (int i = 1; i < 100; i++) {
     var otherDataWhichMatchesValues = otherData.Where(od => values.Contains(od));
     otherDataWhichMatchesValues.Count().Dump();
}

Now the run time is around 0.1s, regardless of the value of dataRange. As we can see, it’s not only faster in the sparse case, but it’s faster even than the ideal case with many duplicates – which we timed at 0.6s using the same loop.

I see a lot of developers who have a favourite data structure – usually a list or an array – and rarely or never make use of other data structures. If a piece of code has emerged as a target for optimisation, you may as well optimise it properly the first time around. Throwing a .ToList() onto the end of a LINQ expression is kind of a code smell: it indicates that you may not have selected the right data structure in the first place. Unless, of course, you’ve looked at the time complexities of the operations you need, and a list is a good fit: in which case, by all means ToList() it!

Over-reliance on tools

As a final thought, I want to caution people against over-reliance on tools. The problem is not that the developer relied on ReSharper to spot multiple enumerations; even if they’re used to spotting them without it, this is an unusual operation (which is why the ReSharper rule didn’t catch it – and yes, there is an open issue to address exactly this situation). The problem emerges when the developer not only starts to rely on ReSharper to pick up rules like this one, but starts to assume that code without ReSharper warnings is good code. That goes hand-in-hand with the fact that ReSharper isn’t always able to fix the warnings appropriately: in this case, even if ReSharper had caught the problem, its solution – to enumerate the IEnumerable to a list – wouldn’t have been the appropriate (or at least, the best) solution.

Shared session factories in NHibernate

NHibernate really is a fantastic ORM… unless you use it badly. Or unless you use it kinda OK. Or unless you use it almost-but-not-quite-perfectly. Then it can be a right pain in the neck. There are a lot of things you can get wrong, and those things can cause you a world of pain.

The particular pain I dealt with recently was memory pressure.

One of the most crippling things you can do to a server is to fill up its RAM. Once your RAM is full, your OS has to start copying things out of RAM onto disk, and then back into RAM when you need to use them again. Disks are slow – much, much slower than RAM – and this is going to hurt your performance badly.

Picture this. You’ve done all the right things. Your software is built using a loosely-coupled Service-Oriented Architecture. You have a website, and it hands all sorts of tasks off to a separate service layer. You have a second service handling various data import tasks. As your load increases, it’s going to be very easy to scale horizontally: you can move your services off to separate servers, and the only thing you need to do is update a few network addresses. Once you expand beyond what you can handle with four servers (those three functions plus a separate database server), you can load-balance each function individually.

You’ve also decided to handle multiple tenants with multiple databases. This one isn’t the right decision in every situation, but there are lots of times when it makes sense, particularly if you’re holding a lot of data for each client. It makes it trivial to archive individual clients off. It makes it easy to offer different tiers of backup. It stops your row-counts from getting too high, and it isolates small clients from the performance headaches of the massive data sets maintained for larger clients.

NHibernate is going to kick you in the teeth for doing this.

We’ve been watching the problem approach for a while now. The base memory overhead for each process soared past a gigabyte some time ago. As our client list headed towards a hundred, our memory overhead headed towards two gigabytes per process. I didn’t need to run a memory profiler to know where the problem was (although I did use one to confirm my suspicions). The culprit was the NHibernate session factories. A single session factory can run towards 20 MB. With fifty clients, that means you have a full gigabyte of RAM filled with nothing but session factories. I didn’t want to have to start scaling horizontally early just because of this, and after all, this gigabyte of memory consisted of lots and lots of copies of 20 MB structures which were identical except for a single string: the database connection string. That’s horribly wasteful. (Actually, there were other differences, but we’ll get to those.) I also couldn’t start disposing of session factories once they hadn’t been used for a little while: these things take a while to construct, and we can’t let our users sit around for several seconds when they log in for the first time in a while. I needed to start re-using our session factories.

There are at least two approaches you can take here. The one I chose has two caveats: firstly, that you have the NHibernate.Cfg.Environment.ReleaseConnections property set to "on_close", and secondly that you’re not using stateless sessions at all. We’ve been moving towards ditching stateless sessions for some time anyway, because stateless sessions don’t support listeners, so the second requirement wasn’t a problem for us. The first setting is a bit more troubling, because it’s legacy behaviour: rather than letting NHibernate manage connections using one of its newer strategies, it forces NHibernate to provide a connection when a session is first opened, and use that connection for the duration of the session. This was acceptable because we were already using the legacy setting, for reasons undocumented in either code comments or our source control history. I haven’t looked into the costs and benefits of this legacy mode compared to the other strategies.
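For reference, the first caveat can be expressed as a configuration property. This is a fragment rather than a complete program: it assumes cfg is the same NHibernate Configuration object used in the rest of this post, and that the NHibernate assembly is referenced.

```csharp
// Assumption: 'cfg' is an existing NHibernate.Cfg.Configuration instance.
// Environment.ReleaseConnections is the property key; "on_close" is the
// legacy release mode described above.
cfg.SetProperty(NHibernate.Cfg.Environment.ReleaseConnections, "on_close");
```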

So, let’s dive into some code. First of all, you’re going to need to set your connection provider:

       cfg.SetProperty(Environment.ConnectionProvider,
                       typeof(SharedCompanyConnectionProvider).AssemblyQualifiedName);

Then, seeing as there’s no such thing as a SharedCompanyConnectionProvider, you’ll need to implement it!

        public class SharedCompanyConnectionProvider : DriverConnectionProvider
        {
            protected override string ConnectionString
            {
                get { return NHibernateSessionManager.Instance.DatabaseSettings.GetCurrentDatabaseConnectionString(); }
            }
        }

If that looks a bit scary, good. If not, let me explain. Your connection provider is no longer thread-safe! It’s relying on a singleton which serves up a connection string. This is dangerous code, and you need to be careful how you use it. (Don’t even think of using this without putting some tests around it – see later in this post.)

Now, on to wherever it is you build your sessions. Mine looks something like this:

            private static readonly object CompanySessionFactoryLockObject = new object();
            ...
            lock (CompanySessionFactoryLockObject)
            {
                var sessionFactory = NHibernateSessionManager.Instance.GetSessionFactory();
                NHibernateSessionManager.Instance.DatabaseSettings.SetCurrentDatabaseConnectionString(databaseGUID);
                ISession session = sessionFactory.OpenSession();
            }

I’ve removed a lot of the detail, but that should give you the gist of what’s going on. The key component here is the lock() line. Now that our connection provider isn’t thread-safe, we have to ensure no other threads interrupt between setting the connection string on the singleton, and creating the actual session (at which time the connection provider will provide a connection using the current connection string).
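The shape of that critical section can be shown without NHibernate at all. In the sketch below every name is a hypothetical stand-in: CurrentDatabase plays the part of the singleton which holds the connection string, and reading the string back plays the part of sessionFactory.OpenSession() asking the connection provider for a connection.

```csharp
using System;

// Hypothetical stand-ins; none of these are NHibernate types.
public static class CurrentDatabase
{
    // Shared mutable state, like the singleton's current connection string.
    public static string ConnectionString;
}

public static class SessionOpener
{
    private static readonly object Gate = new object();

    // Sets the shared connection string and consumes it under a single lock,
    // so no other thread can change it between the two steps.
    public static string OpenFor(string connectionString)
    {
        lock (Gate)
        {
            CurrentDatabase.ConnectionString = connectionString;
            // Stands in for OpenSession(): whatever is read here is the
            // database the "session" ends up connected to.
            return CurrentDatabase.ConnectionString;
        }
    }
}
```

With the lock in place, every caller gets back the string it asked for, however many threads call OpenFor() at once. Without it, another thread could overwrite the shared string between the write and the read, which is exactly the wrong-database failure we need to rule out.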

The final step in the process is to make sure you have some thorough testing around what you’re doing. The risk of getting it wrong is that your session factory hands you a connection to the wrong database, and that could be very bad. I’m not going to run through the entire test setup, but it’s certainly not a unit test – this thing runs in a test suite which uses a real database instance and creates (in the case of this test) five complete databases which we’ll be accessing from various threads.

        private static volatile string _assertFailed;
        private const int NumThreadsPerDb = 2;

        [Test]
        public void HammerMultipleDatabasesSimultaneously_BehavesWell()
        {
            List<Thread> runningThreads = new List<Thread>();
            foreach (var coGuid in companyGuids)
            {
                for (int i = 0; i < NumThreadsPerDb; i++)
                {
                    var thread = new Thread(StartHammering);
                    runningThreads.Add(thread);
                    thread.Start(coGuid);
                }
            }
            while (runningThreads.Any(thread => thread.IsAlive))
            {
                if (_assertFailed != null)
                    runningThreads.ForEach(thread => thread.Abort());
                else
                    Thread.Sleep(1000);
            }
            if (_assertFailed != null) Assert.Fail(_assertFailed);
        }

        public void StartHammering(object companyGUIDObj)
        {
            // nb don't assert on a thread. We're set up to set a message into _assertFailed instead.
            var CompanyGUID = (Guid)companyGUIDObj;
            string expectedDbName = CoDatabaseNames[companyGuids.IndexOf(CompanyGUID)];
            try
            {
                Entity entity;
                using (var session = NHibernateSessionManager.Instance.GetNewSession(CompanyGUID))
                {
                    // Set up the entity with some unique data
                    session.Save(entity);
                }
                for (int i = 0; i < NumTests; i++)
                {
                    using (var session = NHibernateSessionManager.Instance.GetNewSession(CompanyGUID))
                    {
                        if (!session.Connection.ConnectionString.Contains(expectedDbName))
                            throw new Exception("Got a connection for the wrong database!");
                        var ent = session.Get<Entity>(entity.GUID);
                        // Check some unique thing about the entity. Change it to something else for the next iteration.
                    }
                }
            }
            catch (ThreadAbortException) { }
            catch (Exception ex)
            {
                if (!ex.ToString().Contains("ThreadAbortException"))
                    _assertFailed = ex.ToString();
            }
        }

There’s a lot going on there. The key theme is that we’re creating a bunch of threads, and each thread is assigned to a particular database. New sessions are continuously created, and then queried to ensure they contain the expected object. If the object is not found, or the session has the wrong connection string, then something has gone wrong, and the whole system isn’t behaving in a thread-safe fashion.

Note that in a multi-threaded test situation, you cannot just throw an exception if something goes wrong – you need to pass information about the failure to your primary thread.
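As an illustration of that hand-off, here is a small self-contained sketch. It is not the approach used in the test above: instead of a volatile string and Thread.Abort(), it collects exceptions in a ConcurrentQueue and waits for every worker to finish before reporting. WorkerFailures and RunAll are names invented for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerFailures
{
    // Runs each action on its own thread and returns every exception thrown,
    // so the calling (test) thread can decide whether to fail.
    public static List<Exception> RunAll(IEnumerable<Action> actions)
    {
        var failures = new ConcurrentQueue<Exception>();
        var threads = new List<Thread>();

        foreach (var action in actions)
        {
            var work = action; // local copy for the closure
            var thread = new Thread(() =>
            {
                try { work(); }
                catch (Exception ex) { failures.Enqueue(ex); } // report, don't rethrow
            });
            threads.Add(thread);
            thread.Start();
        }

        foreach (var thread in threads)
            thread.Join(); // wait for every worker to finish

        return new List<Exception>(failures);
    }
}
```

The test method can then call Assert.Fail() on the primary thread if the returned list isn't empty. The trade-off is that a failure on one thread no longer stops the others early.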

One final (and important) step is to ensure the test does fail appropriately if the system doesn’t behave as expected. Remove the lock statement around your session creation code and run the test; you should see it fail. Adding the lock back in should fix it.

In Defense of Open-Plan Offices

I recently ran across an article on Quartz about why open-plan offices are bad, and I felt I had to respond.

We went open-plan about a year and a half ago, and our results have been overwhelmingly positive. However, the Quartz article cites some real studies which back up their position; we must be doing something right which other open-plan offices haven’t worked out yet.

The first thing we did right we did by accident. Our original office layout called for medium-to-small corner desks for everyone. Thanks to an ordering snafu, what showed up were largish corner desks. This mistake cost us about six work-stations, but the benefits have been fantastic. We do have our minimalists, who hide their PC under the desk and refuse to allow anything beyond a keyboard, mouse, screen, and phone to take up even short-term residence on their desk, but we also have our clutter-bugs (I’m one of those) who end up with a million things strewn about. I have enough space to keep my clutter without impacting my neighbour, which is great for both our stress levels. There is no bumping of chairs. I can squeeze three people into my corner without annoying my neighbour. Space reduces inter-personal friction, and our desk purchasing snafu saved us from too little space and too much friction.

Those of you who are used to cubicles will be wondering what I mean by “impacting my neighbour” – surely we at least have a cubicle to ourselves? Nope. Our office layout looks something like this:

[Figure: Office Layout]

That’s sixteen desks, arranged in groups of four. This runs counter to the traditional wisdom that your cubicles should be set up with an “officey” feel. In fact, our layout is unusual enough that I had to draw it myself – Google Image Search turned up lots of layout diagrams for private cubicles, and a few with a twin- or triple-share design, but nothing which put four desks together like this. The thick lines are partitions between cubicles: they are a little over eye-height, but the top third is glass, so you can see your colleagues in other cubicles without having to stand up. The top and left sides of the diagram are the sides of the building, with plenty of big windows and sun-shades which can be pulled down. Below the bottom of this diagram are more desks, and to the right is an open space (further right again is the kitchen). We generally put teams together: my team has the bottom-right cubicle in the diagram.

Before you object that it may well be fine for my industry, but some people need periods of uninterrupted focus to get things done, let me tell you what we do: we’re an engineering firm. We’re not all software engineers (we have environmental engineers, chemical engineers, and other sorts), but my team is all software. We found, actually, that it was the communicators who did less well in the open-plan environment, and quickly migrated to one of the few private offices we kept. If you spend a lot of time on the phone, background noise is annoying. It’s annoying for me, too, but there’s an easy solution open to me: headphones. Some of us don’t seem bothered by general office noise, but the rest of us use varying degrees of noise-cancellation, from simple earbuds to high-end Bose active-noise-cancelling headphones. I used big over-the-ear ‘phones with lots of foam for a while, but I recently switched to Sony active-noise-cancelling on-ear ‘phones (which cost about sixty bucks), and now I can’t even hear my desk phone when it rings. My usual choice of music is Enya: it’s calming and has no distracting lyrics.

So far, I’ve just talked about why our open-plan office isn’t bad, which ignores one very big thing which surprised a lot of us. Our open-plan office is good. There was plenty of opposition to the change; I myself was deeply skeptical, and spent some time wondering if “Will I get my own office?” is an appropriate question to ask when you’re interviewing for a new position. I expected open-plan to be something we could put up with, but our experience has been much better than that.

Perhaps the best thing, for me, has been the lack of interruptions. In an office, people knock on your door, or they page your phone, or they come in and just start talking: people become interruptions on their terms. In an open-plan office like ours, the headphones-on signal has become a pretty strong indicator of “I’m busy”, and people are much more likely to come back another time, or flick you an email instead. A closed door was always a negative message: “I don’t have time for my colleagues”. We used it reluctantly, and imagined people rolling their eyes as they walked past. Headphones have become a very positive signal: “I’m getting stuff done”. In fact, if I spend too long without my headphones on I start to wonder if people might think I’m slacking off.

Another great benefit has been the general office atmosphere. There are a lot more “good-mornings” as we all get in to work. There are the occasional office pow-wows, particularly on Friday afternoons when the mood is light, we’re winding down for the week, and we sit around with beer or champagne in hand chatting about our week. Team meetings have often been praised for having the advantage of putting a bunch of smart, creative people in one room – ideas will happen. We put a bunch of smart, creative people in one room, with Google and all their usual development tools in front of them and a beverage in hand, and we do it when the mood hits. It’s infrequent; don’t imagine for a moment that we while away our days, chatting and drinking. It’s not even weekly: there are times when most of us are busy, and it gets to 6 pm on a Friday and I notice that nobody brought ’round beers, and half the office has quietly slipped away as they finished their day.

Another benefit is that the open-plan layout is much more egalitarian, and much more accepting. New staff feel like they’re part of the team much more quickly. Status-conveying corner offices are gone. It’s much easier to change desks on a whim, because you won’t suffer from people going to the wrong office until they’re used to the change. Lack of an appropriate office is no longer a barrier to promotion: we used to have a small number of cubicles in the centre of the office. You were seen as a junior until a proper office opened up and you got to leave the cubes. If the offices were all full, we couldn’t hire an experienced candidate and put them in with the juniors. Going open-plan has made some problems we used to have vanish.

I said before that the best thing has been the lack of interruptions, but it hasn’t, really. It’s been the inter-personal dialog. Not counting weekends, I spend more waking hours at work than at home, and I’m a big believer in feeling just as at-home at work as you do at home (exception: the dress code!) Switching to an open-plan layout has transformed the office from a little room where I go to work, into a friendly, social atmosphere where I can shut out the world and just get things done, or listen to my friends and colleagues while I perform less-concentration-intensive tasks.

I understand the concerns of the anti-open-planners. They feel like they’ll lose their privacy. They feel like they’ll be constantly interrupted. Worst of all, they feel like they’ve been commoditised – they used to have their own office, now they’re just another plot in a cube-farm. This feeling is partly the fault of pop-culture: it has always idolised big, well-appointed, private offices, and the cube-farm has been mocked in Dilbert so often we’ve come to believe the stereotype. The simple fact is, however, that moving from private offices to an open-plan layout hasn’t commoditised us in the least; if anything, it’s made each of us see our colleagues more as individuals, and less as office doors.

There are certainly things which can go wrong in open-plan offices, but I think, in general, that they’re not short-comings of the open-plan environment; they’re short-comings of the organisation, which closed doors used to hide. People shouting across the room or taking things from your desk is not an open-plan problem; it’s a respect problem. Disagreement on the A/C temperature? Did you honestly have individual thermostats in your offices? I never have. In an open-plan office there will probably be hot-spots and cold-spots – if you don’t like your micro-climate, see if you can swap with someone who has a location that’s more comfortable for you. I love sitting directly under an A/C outlet to stay as chilly as possible.

One thing which has been important to making our open-plan transition a success has been meeting space. We have several small rooms and one large boardroom for breaking out for meetings. One-on-one meetings with managers can still happen in private. Larger meetings or teleconferences can be held without adding to the general office noise. The boardroom has a conference phone and a computer with a huge screen and wireless keyboard and mouse. The smaller meeting rooms have desk phones and power-points for laptops. We have an office wireless network which lets people find a quiet corner with a laptop if they need to.

Ultimately, I’m not trying to say that open-plan is better than private offices. They each have their advantages. My aim is to dispel the myth that you should never put knowledge workers in an open-plan office. The Dilbert stereotype is a false one. I imagine lots of people have had negative experiences – but I’m convinced that most of those negative experiences have been the result of open-plan-done-wrong.

Here’s my recipe for open-plan success:

  • Nice big desks with plenty of power and data points
  • Freedom for people to desk-swap
  • Multi-person cubicles with low partitions between them and glass tops so you can see each other without having to stand up
  • Enough small and large meeting rooms, with computers and conference phones
  • Decent-quality active-noise-cancelling headphones provided
  • A general rule that headphones-on means “I’m trying to focus and would prefer not to be interrupted”
  • Keep a handful of small, no-better-than-the-open-plan-desks offices for those who just can’t live in the open-plan environment

If you follow those steps, you are on the right track to an open-plan layout which will be good for productivity and make your employees generally happier.

Software Collaboration using Jabber

Software teams are often not in control of their own budgets. It’s not uncommon to find teams who regularly fork out thousands per developer for Visual Studio licences, yet have trouble getting budget for a $25/year DynDNS account. I run into this sort of problem myself from time to time, and so it’s nice to have free options for key team needs.

One key requirement for a software team is to be able to communicate. Too many distributed teams rely on email and ad-hoc IM solutions, and with my current team becoming increasingly distributed, I wanted a good way for us to keep in touch. Campfire by 37 Signals is the current industry go-to option for this sort of thing, but for teams with tightly controlled budgets, or self-funded open-source projects, the $144/year entry-level plan might not be an option.

Enter XMPP.

If you don’t know what XMPP is, let me explain. It’s basically IRC with federated user account management. Is that clear? I’m going to call XMPP by its old name, Jabber, from here on in, because I think it’s easier.

Think persistent chat rooms, private messaging, moderator privileges, all of that sort of thing – but you can join any room in the world using your gmail account, or your corporate email address (if you run a jabber server), or your own personal email address (if you have your own domain). The interface isn’t quite as schmick as Campfire, but there are open-source clients and libraries, so the sky’s the limit, really.

We already did a lot of our communication using GTalk, and Google Talk works over jabber – so this seemed like an ideal option.

The first thing I needed was a private XMPP server. It seems like there are two good free, open-source options out there: Openfire and ejabberd. I chose ejabberd, because the install seemed simpler – and let’s face it, there’s no point blowing hours of developer time to save $144/year. We used a DigitalOcean VPS to host it (if there’s one thing every software team needs, it’s a linux box with a public IP address). The trickiest part was realising that I needed two different host names – an authentication server and a conference server. They both run on the same virtual server, of course, but I needed a wildcard DNS entry (if there’s a second thing every software team needs, it’s a DynDNS account). My authentication server runs on (let’s pretend our company acronym is ABC) abcim.dyndns.org, and the conference server is conference.abcim.dyndns.org – and thanks to a wildcard entry, I’m only using up one of our hostname entries.

Installing and configuring ejabberd is left as an exercise for the user – but the documentation is reasonably easy to follow, and the installation is quite straight-forward. (Pro-tip: you can add any jabber-enabled account as a server administrator. I added my gmail account.)

Once you have a Jabber server up and running, you’re going to need a client. Pidgin is the hands-down winner when it comes to administering jabber group rooms (or Adium for Mac, as I understand). I signed in with my gmail account and joined a group room called Software on conference.abcim.dyndns.org, having already set my gmail account as an admin account in the ejabberd config. Now I can invite the rest of the team to the room. While Pidgin is definitely where it’s at for setting up a room, there are other clients which can cope with jabber group rooms – Jitsi seems decent, and Xabber make a sub-par but bearable Android client (the other jabber/XMPP-enabled Android apps I’ve tried have varied from non-functional to sub-par-and-not-bearable). For any team members without gmail accounts, there are other jabber account providers out there, or you can set up your own accounts @abcim.dyndns.org using /sbin/ejabberdctl.

At this stage, we had a group collaboration server which allowed us to sign in either using our existing gmail accounts or private @abcim.dyndns.org accounts. I wasn’t prepared to stop there, however.

One team member wasn’t keen on installing a native client. I can understand that – as developers, we push our machines to the limit, and we’re naturally suspicious of installing software on our machines. Down with bloat!

Luckily, there are web clients out there. I made an account for him @abcim.dyndns.org and went hunting for a web-based client. I briefly reviewed several, but the simplest install that seemed like it wasn’t a dead end was Candy, which I hosted under Apache. I chose Apache instead of the other options because I’m super-familiar with Apache hosting, and it came pre-configured for Apache anyway. It took a little tweaking to get it hosted in the folder I wanted, but it wasn’t long before I had a web client running at abcim.dyndns.org which you could point a browser to, sign in with your @abcim.dyndns.org account, and get auto-joined to our Software group room.

Even this wasn’t enough. I didn’t just want a place where we could sit and chat. I wanted a dynamic room which reflected not only what we were saying, but what we were doing – and for that, I needed to write some software.

We had recently signed up for a paid GitHub account, and migrated our old-school SVN repo into a sparkly new GitHub private repo. The benefits of that are a topic for another blog entry, but I promise you it’s been more than worth the seven bucks a month. GitHub have a great API, and part of it allows you to register for http/JSON notifications of various events, so I decided we should have notifications of various GitHub events posted to our jabber server. If you go into your GitHub repo settings and click on Service Hooks, you’ll see there’s a pre-built jabber hook – but it only accepts a username. I didn’t even try it out, but I assume it just sends event notifications to a jabber user – and I wanted it to go to our group room instead. Luckily, GitHub provide a generic WebHook option which allows you to receive JSON event notifications at your own web service.

Enter GitHub-XMPP. Well, actually, that didn’t exist a few days ago – but I realised I needed something which could accept GitHub event notifications and post them to our jabber room, so I wrote one. It was quite a fun little project, actually. On the way through, I discovered that the GitHub event JSON was woefully undocumented – and I also discovered json2csharp and json formatter and validator, which were both invaluable when exploring the json API and building a C# json deserializer. I’d also been meaning to take a look at Nancy, and this was the ideal opportunity.

The long and short of it is I now have an application (which I’m running under mono on our linux server) which can receive GitHub events and give our jabber group room a running commentary on what’s going on with our GitHub private repo. Push events, wiki updates, pull requests, issues – you name it, our faithful jabber GitBot tells our Software room about it. I won’t cover the installation here – there’s a quickstart page on the project wiki.
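To give a feel for how small the core of such a bridge can be, here is a minimal sketch of turning a push notification into a one-line room message. This is not the actual GitHub-XMPP code: the class and method names are made up for illustration, it uses System.Text.Json rather than classes generated with json2csharp, and it reads only a handful of fields from the push payload (ref, pusher.name, repository.name and the commits array). Actually sending the message to the room is left out.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of GitHub-XMPP.
public static class PushEventFormatter
{
    // Turn the JSON body of a GitHub push notification into a one-line room message.
    public static string Describe(string pushEventJson)
    {
        using JsonDocument doc = JsonDocument.Parse(pushEventJson);
        JsonElement root = doc.RootElement;

        string pusher = root.GetProperty("pusher").GetProperty("name").GetString();
        string repo = root.GetProperty("repository").GetProperty("name").GetString();
        string gitRef = root.GetProperty("ref").GetString();
        int commitCount = root.GetProperty("commits").GetArrayLength();

        // "refs/heads/master" becomes "master"; any other ref (a tag, say) is shown as-is.
        const string branchPrefix = "refs/heads/";
        string target = gitRef.StartsWith(branchPrefix, StringComparison.Ordinal)
            ? gitRef.Substring(branchPrefix.Length)
            : gitRef;

        return $"{pusher} pushed {commitCount} commit(s) to {target} in {repo}";
    }
}
```

GetProperty() throws if a field is missing, which is acceptable for a sketch; a real bot would want to cope with payloads it doesn’t recognise.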

I have plenty of plans for GitHub-XMPP – I want to turn it into a fully-featured, configurable, scriptable jabber bot, capable of being both a useful tool and a creative outlet for our team. For the time being, though, it’s functional – it serves the purpose of turning our fledgling jabber server from a place which shares what we’re saying into a place which also shares what we’re doing. GitHub-XMPP currently has somewhere between 15 and 20 hours in it – which means it’s already cost a great deal more than the $144/year we’re saving by not subscribing to Campfire. I built it on my own time, though, and not only did I have fun, I released it under the GPLv3 – so hopefully that $144 will be saved over and over, by teams with tight budget controls and open source projects.

I would encourage you to think about Campfire though – I haven’t used it myself, but I’ve heard so many great things about it I can’t wait to try it out. GitHub are using it for their internal team communication, and there aren’t many companies out there with a better finger on the pulse of modern software development than GitHub.

Please tell me about your own collaboration solutions, or if I helped you – I’d love to know I’m helping to keep other teams in touch and motivated!

Creating Custom Windows Time Zones

I spend a lot of time dealing with time zones. I generally use one of two approaches:

  1. Store an offset
  2. Store a Windows Time Zone identifier

Method one is quick and straight-forward, but not very flexible: it doesn’t account for daylight saving, for a start. Method two is more complex, but generally more powerful – .NET gives you some great tools for working with time zones, and has some basic safety checks which (sometimes) prevent you from doing silly things like doubling up your time zone conversions.
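As a rough illustration of the difference, here is a sketch of both methods side by side. The names are hypothetical, and the zone is built in code with TimeZoneInfo.CreateCustomTimeZone() rather than looked up by a Windows identifier, purely so the example doesn’t depend on what is installed on a particular machine.

```csharp
using System;

public static class TimeZoneSafety
{
    // A hypothetical fixed-offset zone standing in for a real Windows time zone.
    public static readonly TimeZoneInfo DeviceZone = TimeZoneInfo.CreateCustomTimeZone(
        "Device UTC+10:30", TimeSpan.FromHours(10.5), "Device (UTC+10:30)", "Device Standard Time");

    // Method one: apply a stored offset by hand. Nothing stops you doing it twice.
    public static DateTime ToUtcByOffset(DateTime deviceLocal, TimeSpan utcOffset)
    {
        return deviceLocal - utcOffset;
    }

    // Method two: let TimeZoneInfo convert. The result is stamped with Kind = Utc.
    public static DateTime ToUtcByZone(DateTime deviceLocal, TimeZoneInfo zone)
    {
        return TimeZoneInfo.ConvertTimeToUtc(deviceLocal, zone);
    }

    // Returns true if converting an already-converted value is rejected.
    public static bool SecondConversionIsRejected(DateTime alreadyUtc, TimeZoneInfo zone)
    {
        try
        {
            TimeZoneInfo.ConvertTimeToUtc(alreadyUtc, zone);
            return false;
        }
        catch (ArgumentException)
        {
            // ConvertTimeToUtc refuses a Kind = Utc value paired with a non-UTC source zone.
            return true;
        }
    }
}
```

With a device reading of 12:00, both methods give 01:30 UTC. Feed that result back in, and method one quietly subtracts another ten and a half hours, while ConvertTimeToUtc() throws. The “sometimes” is because the check relies on DateTime.Kind: a value whose Kind is Unspecified sails straight through.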

I recently dealt with an interesting issue: a non-standard time zone.

We retrieve data from lots of remote monitoring equipment. We talk to a variety of different types of gear, most of which are designed to be simple and reliable, and operate using as little power as possible. In practice, this means the equipment uses a simple clock, with no time zone awareness. Industry practice (which I’m often stuck with) is to configure the equipment with the local time, which violates rule 1: do everything in UTC.

This particular piece of equipment had been deployed during the summer, while daylight saving was in effect, and the clock had been set to the local daylight time: UTC+10.5 (yes, there are time zones on the half hour!). When we configured the data connector at the server end, we spotted a handy time zone with the right offset (Adelaide) and set it to use that. Of course, nobody thought about daylight saving, so when it ended our Adelaide time zone reverted to UTC+9.5, and all of the incoming data started being treated as if it were UTC+9.5 instead of UTC+10.5 – which was a problem.

Of course, we’ve dealt with this sort of problem before: most of our equipment has no daylight saving support, but lots of our customers are in places which follow daylight saving. We normally just find a time zone with the correct offset but no DST: for example, gear in Sydney gets set to either +10 (and we use the Brisbane time zone) or +11 (and we use somewhere like Port Vila, which is +11 year-round). However, when we went to find somewhere which was on UTC+10.5 year-round, we ran into a problem. There is no such place.
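The hunt for a suitable zone can be expressed in a few lines. This is a sketch under my own naming (ZoneFinder and FixedOffsetZones are hypothetical); it simply filters a list of zones for the right base offset and no daylight saving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZoneFinder
{
    // Zones which sit at the given offset from UTC and never observe daylight saving.
    public static List<TimeZoneInfo> FixedOffsetZones(
        IEnumerable<TimeZoneInfo> zones, TimeSpan utcOffset)
    {
        return zones
            .Where(z => z.BaseUtcOffset == utcOffset && !z.SupportsDaylightSavingTime)
            .ToList();
    }
}
```

Passing in TimeZoneInfo.GetSystemTimeZones() and TimeSpan.FromHours(10.5) asks the question we were asking by hand. What comes back depends on the time zone data installed on the machine you run it on.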

We didn’t really want to make code changes to support this situation, but we had to do something, and it didn’t take much digging to discover that .NET pulls its time zone information from the Windows registry. After a look at the relevant keys, however, it quickly became obvious that it wasn’t going to be simple to craft a custom entry. The Adelaide key looks like this:

[Screenshot: Regedit showing cryptic binary field for time zone configuration]

The first six entries are fine. The three values with names starting with ‘MUI’ are just localisation references, and because we’re just doing this on one of our servers, we don’t care about that – and it turns out that you can ignore the whole @dll syntax and just put a string in here. Great!

That TZI value looks nasty – and it turns out that it is. Fortunately, Microsoft provides an editor, TZEdit, to modify these entries. Unfortunately, it was buried so thoroughly that we didn’t find it at the time (nor did StackOverflow!) and so we pushed on with our research. We found our answers in the MSDN TIME_ZONE_INFORMATION structure page. Despite the fact that the whole point of the registry is to provide convenient key/value pairs, Microsoft decided to go with storing a C struct in hex values. The TZI field is a hex dump of the _REG_TZI_FORMAT struct:

typedef struct _REG_TZI_FORMAT
{
    LONG Bias;
    LONG StandardBias;
    LONG DaylightBias;
    SYSTEMTIME StandardDate;
    SYSTEMTIME DaylightDate;
} REG_TZI_FORMAT;

Two things immediately stood out. c6 fd ff ff was not +9.5, nor was it +570 (if it were stored in minutes). It turns out the registry stores the negative of the time zone offset in minutes, least-significant byte first – if you fire up your programmer’s calculator, you’ll see that 0xfffffdc6 is, in fact, -570. This isn’t the only quirk – the date fields have some pretty specific requirements, the DaylightBias is in fact the difference between the daylight total offset and the Bias field, and the StandardBias is an optional offset from the Bias field to the standard (non-DST) time, which is generally (always?) set to 0.
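The Bias arithmetic is easy to check in code. The following is a minimal C# sketch, not the tool described in this post; the class and method names are hypothetical. It reads the Bias out of a TZI blob and builds a blob for a fixed-offset zone, on the assumption (as I read the TIME_ZONE_INFORMATION documentation) that leaving both SYSTEMTIME fields zeroed means “no daylight saving”.

```csharp
using System;
using System.Buffers.Binary;

public static class Tzi
{
    // sizeof(REG_TZI_FORMAT): three 4-byte LONGs plus two 16-byte SYSTEMTIMEs.
    public const int Size = 3 * 4 + 2 * 16;

    // Bias is the first LONG in the blob: little-endian, in minutes.
    public static int ReadBias(byte[] tzi)
    {
        return BinaryPrimitives.ReadInt32LittleEndian(tzi);
    }

    // The registry stores the negative of the UTC offset.
    public static TimeSpan BiasToUtcOffset(int biasMinutes)
    {
        return new TimeSpan(0, -biasMinutes, 0);
    }

    // Build a blob for a fixed offset. StandardBias, DaylightBias and both
    // dates are left as zero (assumption: zeroed dates disable daylight saving).
    public static byte[] BuildFixedOffset(TimeSpan utcOffset)
    {
        var tzi = new byte[Size];
        int bias = -(int)Math.Round(utcOffset.TotalMinutes);
        BinaryPrimitives.WriteInt32LittleEndian(tzi, bias);
        return tzi;
    }
}
```

Feeding in the Adelaide bytes c6 fd ff ff gives a Bias of -570, which is an offset of +9.5 hours. Going the other way, UTC+10.5 needs a Bias of -630, which comes out as 8a fd ff ff at the start of a 44-byte blob.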

Because I don’t like to half-do things (and I spotted a chance to dust off my rather rusty C++ skills) I built a tool to accept the required fields and churn out a file ready to be imported directly into the registry (keep in mind at this stage we hadn’t found TZEdit). It’s unlikely to ever be polished, but it’s available on GitHub.

I don’t really know what to make of this experience. I could have saved myself some work by continuing to search for a ready-made time zone editor, but the clock was ticking to get this fixed, and I found enough information to solve the immediate problem in much less time than it took me to find the editor (I didn’t build the registry file generator until after we had hand-crafted the registry entries we needed).

I think the real lesson to take out of this is that storing binary data like the TZI field destroys the usability of the registry. If Microsoft had used a few sensibly-named fields instead of dumping cryptically-formatted binary data into a single field, we could have solved the problem with much less effort.

Keep this in mind when building your own configuration systems!

Intuitive Interfaces

Agile. Kanban. Lean Startup. We have more and more ways to think about the software development process, and our industry is getting better and better at what we do. We’re wasting less time, and delivering a more customer-focused product. New businesses have great tools to build great teams. We know how to manage source code, and some people have even worked out how to manage changing data models! Issue tracking is, by and large, a solved problem, and we have tools to keep our software fully tested and behaving the way it should.

What I don’t see enough focus on these days is the code we write. Sure, we’re all doing peer reviews (right?) and we think we manage code quality through unit testing. What worries me is that, usually, this just ensures that our code works. It does nothing to really manage quality. What’s wrong with this code?

class CreditCard
{
  private Decimal _Limit;
  //...
  public bool RaiseLimit(Decimal amount)
  {
    _Limit += amount;
    //....
  }
  //...
}

This is a blatant example, but it’s a problem I see all the time. The code looks clear enough, and it’s going to work. It will build, pass your unit tests, and there’s every chance it will pass peer review – but it shouldn’t. The problem is, there is a risk that someone will do this:

Decimal oldLimit = card.Limit;
Decimal newLimit = ...;    // Code to calculate the new limit
card.RaiseLimit(newLimit);

… and that would be awful! The caller has gone to the effort of calculating the new limit and passing it to RaiseLimit(), but that’s not what RaiseLimit() expects: it expects to be passed the amount the limit should be increased by. How do we solve this? There is a simple change we can make to reduce the risk this will happen. Our improved code might look like this:

public bool RaiseLimit(Decimal increaseAmount)
{
  _Limit += increaseAmount;
  //....
}

Functionally, nothing has changed – but we’ve provided an extra clue to a developer calling our function. In these days of intelligent IDEs, coders usually get to see the method signature that they’re calling. We should endeavour to take every possible advantage of IDE features like these to streamline code creation and reduce the chance of errors.
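The call site can carry the same clue. C# named arguments let the caller spell out which value they think they’re passing, so a mismatch is visible to anyone reading the call. Here’s a minimal sketch; the constructor, the Limit property and the validation are my own assumptions, since the original class elides them.

```csharp
using System;

public class CreditCard
{
    private decimal _Limit;

    // Hypothetical constructor and property, added so the sketch is complete.
    public CreditCard(decimal initialLimit) { _Limit = initialLimit; }

    public decimal Limit { get { return _Limit; } }

    // Raises the limit BY increaseAmount (not TO it).
    public bool RaiseLimit(decimal increaseAmount)
    {
        if (increaseAmount <= 0) return false;   // assumed validation rule
        _Limit += increaseAmount;
        return true;
    }
}
```

A call then reads card.RaiseLimit(increaseAmount: 500m), and someone about to write card.RaiseLimit(increaseAmount: newLimit) gets a second chance to notice that the names don’t agree.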

This is one simple example from a whole spectrum of things you can do to optimise code-writing. I have wasted countless hours reading through classes trying to work out how to use them. Here’s a recent example I ran into:

class ConfiguredTask
{
  public string ConfigFolder;
  public string TaskExe;
  public void WriteMainConfig(DateRange dates, string configName) {...}
  public void WriteSecondaryConfig(DateRange dates, string configName) {...}
  public void RunTask(string path, DateRange dates) {...}
  public ResultSet GetResults(DateRange dates) {...}
  //....
}

To use this effectively, you need to know quite a bit about the internals of the class. For a start, you need to know that you need to write the main and secondary config files before calling RunTask() – not as obvious as you might think, as there are dozens of other public methods and properties on this class. Second, you need to know that the two functions to write config files are expecting filenames with full path information, and they need to be different, but in the same folder. Third, you need to know that RunTask() persists the results of the task into the database – something I didn’t want – and also leaves output in the folder referenced by ConfigFolder. TaskExe must contain the name of the file to run, but must not contain any path – the executable must be in the path referenced by ConfigFolder. That wasn’t even the end of it! The code to run a task used to look like this:

task.WriteMainConfig(selectedDates, @"c:\main.cfg");
task.WriteSecondaryConfig(selectedDates, @"c:\second.cfg");
task.RunTask(task.ConfigFolder, selectedDates);
ResultSet results = task.GetResults(selectedDates);

Keep in mind that, if you didn’t have an example to hand, you had to read a fair portion of the class itself to make sure you had everything right! After I was finished with it, the class looked like this:

class ConfiguredTask
{
  public string PathToExecutable;
  public string ResultsFolder;
  private void WriteMainConfig(DateRange dates) {...}
  private void WriteSecondaryConfig(DateRange dates) {...}
  public ResultSet RunTask(DateRange dates,
    bool persistResults = true, string workingFolder = null) {...}
  public ResultSet GetResults(DateRange dates) {...}
  //...
}

Just through a little re-factoring and a handful of code tweaks, the caller now knows to give a full path to the executable (which can now be anywhere), they don’t need to know about writing config files (it’s done during RunTask()), they know that results are persisted by default but they can override that behaviour, they know they can provide an optional working folder, and they don’t have to call GetResults() after every call to RunTask() when they want the results of that task.

I’ve taken a class which needed quite a bit of reading to work out how to use it, and turned it into a class you can use with no more information than what IntelliSense shows you as you code. The code to run a task and get the results now looks like this, instead:

ResultSet results = task.RunTask(selectedDates);

I hope you can see why this would save a future developer time! We should strive for all of our classes to be like this. The idea of ‘self-documenting code’ doesn’t quite capture this: the whole point is to not have to read the code at all. I prefer the term ‘Intuitive Interface’. As a developer, you should aim for all of your classes to have this kind of intuitive interface. Think of the questions a caller might need answered, and then answer them – in your method signature if possible, and if not, in a method comment (ideally, in a fashion that leverages the IDE you’re working in):

/// <summary>
/// If the instrument is not running, set the passed-in mode and start it running.
/// If the instrument is already running it will *NOT* change the mode.
/// </summary>
private void EnsureInstrumentIsRunning(InstrumentMode mode)
{
  // ...
}

Even in that example (which I took from a current project), the first line is really extraneous – it’s repeating information that’s already available in the method signature.

Monitors and Coffee

There are all sorts of things to get right and wrong when you’re building software. There’s a lot to think about when you’re kitting up your team. What OS should desktops run? What tools will we use? How much should we spend on office chairs? One of the most important decisions you’ll make is what sort of monitor your developers get, and how many.

I’m not joking about. Your developers are going to be spending a fairly large fraction of their professional lives looking at their monitors. You really want to make it a pleasant thing to do. The standard I tend to see these days is to give developers a pair of 22″ wide-screen monitors. That might sound excessive, and other staff might be resentful, but it’s really not that costly these days. A good 22″ wide-screen might cost $500, as compared to a cheap small screen which costs, say, $350. Going for twin 22″ screens means spending $1000 instead of $350. Given that a good-quality monitor will often last upwards of three years, the extra $650 works out to a little over $200 a year per developer.

So what do you get out of it? Plenty. First of all, your developers will quite likely be more productive. I find debugging code to be much easier when I can have an application running full-screen on one monitor while my debugging tools are on the other. Tuning CSS is easier when I can look at the context in a browser while I’m editing the file. Visual Studio is much easier to use when you can devote an entire screen to code or UI layout while still having all of your explorers, editors, and toolboxes visible on the other. Widescreens make comparing versions of code files much simpler.

At one previous job, one of my screens had a rotating stand, and I ended up turning it on its side and using it mostly for editing code. I could see a lot more context when I had the entire ‘width’ of a 22″ widescreen in height. This tended to play havoc with remote desktop sessions, however. Most annoying was that whenever my manager connected back to his machine from mine, it re-shuffled his entire desktop. Use with caution!

This advice doesn’t just apply to screens. If your developer machines are slow, your developers will end up waiting for things they do frequently. When they have to wait a lot, they’ll find things to do while they wait. You’d be amazed at how much productivity you lose if your developers alt-tab to Facebook every time they hit the build button! Fast computers for your developers are vital. Consider getting machines with RAID controllers, and mirroring their drives: developers tend to spend more time customizing their workstation than any other sort of employee, and having to re-build their machine costs more time than just re-doing any un-committed code changes. A software developer is likely to have all sorts of settings on their IDE; they’ll have lots of little productivity tools installed, like their favourite code editor, and perhaps a little tool to help them build regexes; macros and little time-saving scripts and other things. It will cost them time, and worst of all, it will be frustrating, to have to get everything just right again. Having mirrored drives helps to reduce the chance this will happen. Your other alternative is to have a good backup and recovery system for each developer workstation, but just having RAIDed drives is probably cheaper and easier to manage.

There are lots of little things to do to make your programmers happier where they work. Remember, they spend 40-odd hours a week in front of their PC. The more pleasant it is, the happier they’ll be, and you’ll tend to get more out of them. If the chairs are uncomfortable, they’ll dread coming in to those 8-hour days, and you’ll spend more on sick leave. If the coffee in the office is terrible, your whole programming team will start leaving together two or three times a day for ‘a quick coffee up the road’. Multiply your charge-out rate by the number of programmers in your team by 1/4 (for 15 minutes) by three, and that’s what it costs you every day to not have good coffee in the office.
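That coffee sum is simple enough to write down. This is only a sketch of the formula above, with hypothetical names; the defaults are the 15 minutes and three trips from the paragraph, and any rate or head-count you plug in is your own.

```csharp
using System;

public static class CoffeeRun
{
    // Daily cost of off-site coffee runs:
    // charge-out rate x programmers x hours per trip x trips per day.
    public static decimal DailyCost(decimal hourlyChargeOutRate, int programmers,
        decimal hoursPerTrip = 0.25m, int tripsPerDay = 3)
    {
        return hourlyChargeOutRate * programmers * hoursPerTrip * tripsPerDay;
    }
}
```

For a made-up team of five charged out at $100 an hour, CoffeeRun.DailyCost(100m, 5) comes to $375 a day.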

I’ve seen a lot of workplaces who skimp on the little luxuries because they’re seen as ‘too costly’. You know what’s really too costly? When your programmers see photos of the office of one of their former colleagues, and the double-22″ monitors make your single 19″ look like a cage. When the espresso machine in the common room trumps the tin of International Roast you keep under the sink. When the former colleague gets a huge glass whiteboard, while you give your developers butcher paper and pencils. You know what will happen? You’ll have employees who treat their job as something to get them by until they find something better. They will join, they will hate it, and fairly soon, they will move on.

It’s simply too expensive to spend $350 per machine on monitors when you should be spending $1000.

Software Development 101

Indulge me while I take a paragraph or two to get to the point.

I’ve worked as a Software Engineer at a lot of places now. I’ve stayed in some for many years, others only months; if there’s not anything interesting left for me to achieve, I usually start to feel like moving on. I’d like to think this is best for everyone: I get to take my ideas and experience and give a different company the benefit of them, and my (now ex-)employer gets to bring in a new employee with a different set of ideas and experience. This cross-pollination of ideas is a part of the life of software development, but there is a balance. Keep your pool of developers too static, and you risk stagnating. Turn them over too quickly, and you’ll be spending too much money on recruitment and training, and not keeping people around for long enough to see a return on that investment. Get big enough, and you can get these benefits just by letting people transfer within your company: the best of both worlds. That’s not what I want to talk about, though.

I want to talk about the baseline ideas and experiences I bring to a new company. The stuff that I just want to get out of the way so I can start actually innovating.


Now, not everyone needs these ideas. In fact, my ideal employer doesn’t need any of these ones. Every time I start somewhere new, these are the things that I hope are already covered, so I can get on with solving interesting problems. These are also some of the things I ask about when I’m interviewing a new employer; they’re problems that I’ve solved over and over, and I know the consequences of not solving them. If I sound a prospective employer out in an interview, find out that they haven’t solved these problems, and it sounds like there won’t be any support for me solving them when I start, I’ll seriously think about turning down the offer. Some of them are optional; some of them are even inappropriate for some teams. Some of them are so basic that you can’t expect to build decent software at all without them. Every one of them I’ve had to introduce to an established team at some stage or other.

Source Control

This is the number one must-have for software development. When I’m working on a small, personal project which nobody else will ever touch I still use source control. It’s just too simple to set up, and the benefits are too enormous.

One of the first things I learned about source control was that Microsoft Visual Source Safe is not source control. At least, not at the sort of level you want even for a small, personal project, and certainly not at the level you need to build commercial software. I’ve suffered through it with several employers now, and there are just too many things I keep wanting to do that it doesn’t let me. Git seems to be the current darling of the open-source world; it’s free, it’s quick to get started with, and tasks like branching and merging are said to be quick and straight-forward. I haven’t worked with Git yet: I always seem to end up using the well-known staple, Subversion (and earlier in my career, its spiritual predecessor, CVS). UPDATE: Git is fantastic, and switching to it has had significant productivity benefits for my team.

If you haven’t solved your source control problem yet (or if you think you have, but you’re using VSS), Subversion is a good place to start. It provides all of the basics, it’s easy to set up, and it does most of the things you’re going to need to do for a small- to medium-sized team. There are plenty of hosted solutions available (with all sorts of optional add-ons), but I really think something this important is worth hosting in-house. It’s easy to do, you’re in control, and you won’t go dark every time your ISP hiccups. Do back your repository up. If you go with a hosted solution, make sure you have some way of backing up the repository yourself in case the worst happens. UPDATE: Using a distributed version control system like Git with an externally-hosted master somewhere like BitBucket or GitHub gives you all the advantages of both worlds.

What are the benefits? Well, for starters, you can see your entire development history. If you introduce a bug but can’t quite remember exactly what you changed, you can see the entire history of a single file, a whole folder, or your entire project. If someone else introduces a bug, you know who to go to to discuss a possible fix. If you really mess things up, you can roll back to a previous version. You can merge in code from multiple developers as you go, even within a single file, and you can branch the code off while you work on significant changes, and then merge it back into the trunk later. If you don’t have a decent source control solution in place, you have all the makings of multiple on-going headaches; if you do, you have a powerful tool on hand.

There’s one thing I’d like to add here, and not everyone will agree with me. I believe in small, frequent, buildable commits. I don’t like seeing a commit which touches 35 source files, has two pages of comments (which still finish with “lots of other little fixes”), and resolves a dozen bugs and nine different feature requests. You lose a lot of the benefits. Keep your commits small and granular; try to fix only one or two things each commit, and keep your repository buildable and passing all of your tests. As a bonus, your team will spend much less time on merging, particularly if they’re updating frequently as well.

Database Version Control

Closely related to source control is database version control. Many large projects end up with some sort of database back end, and that database will grow and evolve with your code. You will end up with various versions of the database lying around, and you will need a matching build of your product to work with each one. If you don’t have a good plan in place before you start putting builds out there, you’re going to create a headache for yourself.

Let me illuminate this one a little further. Let’s say you’ve built a billing system. You offer a hosted solution, where each client gets their own separate virtual server running a full-blown database and your web-based product, and you manage it for them. You also support client installations, where you generate the database on their server and allow them to both host the web product and run their own copy of the management system. So for n clients, you have n servers, n database instances, and n builds of your system.

Now you make some changes. You need to add a table or two. Change some keys. You now have version 2, which you proudly release, along with a tool to update the database from the previous version. Not everyone updates, but you think you will be fine to support version 1 as well.

After a few bug fixes to version 1, and a few minor updates to version 2, both databases look a little different than they did when you built the tool to update a version 1 database to version 2. Some clients aren’t on the original version 1 or the latest bugfix build, but somewhere in between. Most of your version 2 clients are fully patched, but some of them are holding out because your latest release broke some functionality they relied on. A few haven’t patched since 2.0. You’re spending too much money supporting version 1 in its various builds, version 2 has a few builds out there, and migrating any given installation from 1 to 2 now costs much more than the update licence fee. Marketing is screaming at you to release version 3, you have folders full of SQL script files and a few spreadsheets to keep track of who’s applied which patches to which databases (mostly accurate), and your DBA just took six weeks’ medical leave (citing stress).

What went wrong?

You didn’t manage your database versioning. This seems to be a common problem, and (depending on your development platform) there may not be many tools out there to help you. Believe me when I tell you that a few folders full of SQL scripts and a couple of spreadsheets are not going to cut it.

What you should do about this is very much dependent on your development platform. If you’re using Hibernate and developing in Java, your solution will be very different to someone using Ruby on Rails. I do a lot of developing in C#.Net at the moment, and we have a fairly complex NHibernate-based data layer. After looking at a few of the different tools out there, I decided to roll my own. Whenever we make a change to our data layer, we write a script to update the previous version to the current one. Whenever any of our builds accesses a database, the data layer looks at a table in that database which is automatically kept up to date with the scripts which have been run. If the versions don’t match, a client tool or website will display a message, and an administration tool will provide the option of running any pending update scripts available to that build of the tool. With a single click, we can update any database we’ve produced since I introduced the system to any newer version we have a working build for. This is similar to the system RoR uses. There are other approaches that will work, but whatever you decide to do, a well-thought-out database versioning plan will save you both headaches and money.
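The version-table idea can be sketched in a few lines. This is a minimal illustration only, written in Python against SQLite for brevity rather than the C#/NHibernate system described above; the `schema_version` table, the `MIGRATIONS` list, and the function names are all assumptions made for the example.

```python
import sqlite3

# Hypothetical ordered update scripts: (version, SQL). In a real project
# these would live in source control next to the data layer they match.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE TABLE invoice (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customer(id), "
        "total_cents INTEGER NOT NULL)"),
]


def applied_versions(conn):
    """Return the set of script versions already recorded in this database."""
    # The bookkeeping table is created on first contact with a database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        "version INTEGER PRIMARY KEY, "
        "applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return {row[0] for row in conn.execute("SELECT version FROM schema_version")}


def pending_migrations(conn, migrations=MIGRATIONS):
    """List the scripts this build knows about that the database lacks."""
    done = applied_versions(conn)
    return [(v, sql) for v, sql in sorted(migrations) if v not in done]


def upgrade(conn, migrations=MIGRATIONS):
    """Run pending scripts in order, recording each one right after it runs."""
    applied = []
    for version, sql in pending_migrations(conn, migrations):
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

A client tool would call `pending_migrations` and show a message if the list is non-empty; an administration tool would call `upgrade`. Running `upgrade` twice is harmless, because the second call finds nothing pending.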

A Build Server

I’m a huge fan of Continuous Integration. I like small, regular commits, automated builds and unit tests, and a clean build environment. I’m currently using CruiseControl.NET, the .NET port of the original Java-based CruiseControl, and there are other products which accomplish the same task. Basically, every time somebody commits some code, your CI server checks out the latest changes, builds your system, and (optionally) runs some or all of your test suite. I like to run a cut-down version of our unit tests on every build, and schedule a longer set of tests to run overnight. This gives developers rapid feedback about the success or failure of their commits, while still ensuring builds are tested exhaustively on a regular basis.

I don’t think a continuous integration server is nearly as critical to a team’s success as having a good source control solution. I’ll even accept that CI might be overkill in some circumstances. But for any decent-sized project, I really think it’s worth having one.

Issue Tracking

All too often, companies don’t have a good issue tracking system. By which I really mean: All too often, companies use Excel as their issue tracking system.

Now, Excel is actually fairly good at keeping lists of things. It will even work reasonably well at issue tracking for very small projects. It doesn’t take long to outgrow it, though, and you shouldn’t cling to it once it stops working well. There are some really good systems out there, many of them with hosted solutions, some of them free. If you’re after free and open-source, Bugzilla comes immediately to mind. Wikipedia has an extensive list of issue tracking software. There are plenty of hosts out there which offer integrated issue tracking and source control, and it turns out you get some nice benefits out of combining the two into a single solution. Google Code offers free hosting to open source projects which combines source control, issue tracking, and a documentation wiki.

A Coding Standard

The problem with not having a coding standard is that some developers are lazy. Lazy can be good: a good lazy developer won’t write code three times when they could factor it out and write it once. Unfortunately, lazy can also be bad. An otherwise good developer who never writes any comments or documentation can cause all sorts of headaches for maintenance. A developer who isn’t lazy enough may write a dozen similar functions rather than spending a little thought to do the same task in a tenth as many lines.

You could end up with a religious war over the size of tab stops. I’ve seen this happen.

A good coding standard is a very useful tool. A long, wordy, and generally bloated coding standard, on the other hand, is mainly only useful when printed out and used as a door-stop. I won’t go into too many details here, but if you don’t have a standard at all, try to come up with a page or two, dealing with comments, member naming, namespaces, and the like. If it’s written down, you can hold people to it.

Team Meetings

I just can’t emphasize this enough. Developers often hate team meetings (“Can’t I just get on with writing code?”), but that’s usually because team meetings aren’t run well. You get three major benefits from having team meetings:

  1. You won’t end up with two developers doing the same thing.
    You only get this benefit, of course, if you use the meeting to talk about what you’re going to do. Meetings are not information-gathering sessions. They’re not show-off sessions. You’re here to plan what to do, and if you talk about doing things before you dive in, you can avoid duplicating work.
  2. Developers will have a basic idea of the achievements of other team members.
    There’s no point doing lots of work getting a framework up to date if the people using the framework don’t know about it. I lied before. Meetings are show-off sessions. Just keep it short and to the point. We’re not here to listen to an hour-long lecture on the intricate object architecture your lead architect is so proud of.
  3. You’ll pick up bad decisions before they hurt you, and maybe stumble across some good ones too.
    One developer might hear about something someone else is working on, and point out something you’ve already built which will help. Or just having a bunch of smart engineers in the room for ten minutes chilling out at the end of the meeting will be enough to come up with a great new idea.

I’m sure there are other benefits, too. As a team leader, you might pick up on some under-current running through the team that’s not obvious when you’re not all together. You might discover that one of your developers has heaps of experience in an area you were about to start a project in. The possibilities are endless.

Just keep them short and relevant, and always always ALWAYS head off private discussions early. Your whole team doesn’t need to listen while two developers hash out a network protocol because they don’t bother having meetings outside the general team meeting. This is the number one cause of engineers hating meetings. If you’ve got a whole-team discussion which seems to be going around in circles, appoint one or two people to investigate the issue later, and move on.

Finally, make sure someone is taking notes and distributing them. There’s no point in spending time making a decision if you’re all going to forget it. There’s no point in giving someone a task if there’s no follow-up. There’s no point in deciding what everyone will be working on for the rest of the week if everyone’s forgotten by mid-afternoon on the same day. Send out a summary email right away. Make it so short that even the laziest of developers will skim it.

Mentors

This obviously depends a little more on your team make-up, but I love having a mentor system in place where possible. Junior developers won’t develop if they don’t have someone to advise them. Don’t leave them to struggle with an architecture, trying and failing until they end up learning a bad solution: give them someone to go to for advice, so they can go straight to implementing a good solution.

I also love having periodic training seminars. Make your senior engineers run seminars. Make your junior engineers run seminars, and make your senior engineers show up to give them feedback. Provide pizza.

A Fast Deployment Process

If your product is quick and easy to deploy, people won’t screw it up. The highest-pressure deployments are often the most crucial: marketing has a huge client on the boil, they’re ready to sign a massive contract, and they were even nice enough to give you reasonable notice about the feature they simply must demo to make the sale, but the timeline is tight, you’ve barely finished the feature, and the meeting is about to start. Now is not the time to start on an intricate multi-step deployment process with a dozen things that you can get wrong. Your deployment process should include backing up the previous version, a simple build installation (preferably from your clean, always-tested build server), and a rollback process in case your deployment breaks everything. It needs to deal with database versioning on the spot (and database backup too!). There’s a lot to be done there, with potentially serious consequences if you mess it up, but also the opportunity to look really slick to your clients when you roll updates in seamlessly, with virtually zero downtime and never a mistake made.
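As a rough sketch of the backup, install, and rollback steps, something like the following would do for a product deployed as a folder of files. It is Python for illustration only; the `deploy` function, its directory arguments, and the `health_check` callback are hypothetical names, and it deliberately leaves out the database backup and versioning that a real process also needs.

```python
import shutil
from pathlib import Path


def deploy(build_dir, live_dir, backup_dir, health_check):
    """Install build_dir over live_dir; restore the old version on failure.

    health_check is a caller-supplied function (an assumption of this
    sketch) that takes the live directory and returns True if the newly
    installed build looks healthy.
    """
    build_dir, live_dir, backup_dir = Path(build_dir), Path(live_dir), Path(backup_dir)

    # Step 1: back up the previous version, replacing any older backup.
    if backup_dir.exists():
        shutil.rmtree(backup_dir)
    had_previous = live_dir.exists()
    if had_previous:
        shutil.copytree(live_dir, backup_dir)

    try:
        # Step 2: install the new build into a clean live directory.
        if live_dir.exists():
            shutil.rmtree(live_dir)
        shutil.copytree(build_dir, live_dir)

        # Step 3: verify before declaring success.
        if not health_check(live_dir):
            raise RuntimeError("health check failed")
        return True
    except Exception:
        # Step 4: roll back to the backed-up version.
        if live_dir.exists():
            shutil.rmtree(live_dir)
        if had_previous:
            shutil.copytree(backup_dir, live_dir)
        return False
```

The point of the shape is that the person deploying makes one call and gets either the new version or the old one back, never a half-installed mixture.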

And more…

These ideas, and more, are the baseline things I take to a new employer. Not every idea works for every employer, but they’re the low-hanging fruit for making life immediately better.

Once things like these are out of the way, you can get down to doing some serious innovating, knowing that the basics aren’t going to trip you up every step of the way.