Why should you be returning an IEnumerable

I’ve seen in many places where a method returns a List<T> (or IList<T>) when it appears that it may not actually really be required, or even desirable when all things are considered.

A List is mutable, you can change the state of the List. You can add things to the List, you can remove things from the List, you can change the items the List contains. This means that everything that has a reference to the List instantly sees the changes, either because an element has changed or elements have been added or removed. If you are working in a multi-threaded environment, which will be increasingly common as time goes on, you will get issues with thread safety if the List is used inside other threads and one or more threads starts changing the List.

Return values should, unless you have a specific use case in mind already, be returning an IEnumerable<T> which is not mutable. If the underlying type is still a List (or Array or any of a myriad of other types that implement IEnumerable<T>) you can still cast it. Also, some LINQ expressions will self optimise if the underlying type is one which better supports what LINQ is doing. (Remember that LINQ expressions always take an IEnumerable<T> or IQueryable<T> anyway so you can do what you like regardless of what the underlying type is).

If you ensure that your return values are IEnumerable<T> to begin with yet further down the line you realise you need to return an Array or List<T> from the method it is easy to start doing that. This is because everything accepting the return value from the method will still be expecting an IEnumerable<T> which List<T> and Array implement. If, however, you started with a List<T> and move to returning an IEnumerable<T> then because so much code will have the expectation of a List<T> without actually needing it you will have a lot of refactoring to do just to update the reference types.

Have I convinced you yet? If not, think about this. How often are you inserting items into a collection of objects after the initial creation routine? How often do you remove items from a collection after the initial creation routine? How often do you need to access a specific item by index within a collection after the initial creation routine? My guess is almost never. There are some occasions, but not actually that many.

It took me a while to get my head around always using an IEnumerable<T>, until I realised that I almost never require to do the things in the above paragraph. I almost always just need to loop over a collection of objects, or filter a collection of objects to produce a smaller set. Both of those things can be done with just an IEnumerable<T> and LINQ.

But, what if I need a count of the objects in the List<T>, that would be inefficient with an IEnumerable<T> and LINQ? Well, do you really need a count? Oftentimes I just need to know if there are any objects at all in the collection, I don’t care how many object there actually are, in which case the LINQ extension method Any() can be used. If you do need a count LINQ is clever enough to work out that the underlying type may expose a Count property and it calls that (anything that implements ICollection<T> such as arrays, lists, dictionaries, sets, and so on) so it is not iterating over all the objects counting them up each time.

Remember, there is nothing necessarily wrong with putting a ToArray() to ToList() before returning as a reference to an IEnumerable<T> something to which a LINQ expression has been applied. That removes the issues that deferred execution can bring (e.g. unexpected surprises when it suddenly evaluates during the first iteration but breaks in the process) or repeatedly applying the filter in the Where() method or the transformation in the Select() method.

Just because an object is of a specific type, doesn’t mean you have to return that specific type.

For example, consider the services you actually need on the collection that you are returning, remembering how much LINQ gives you. The following diagram shows what each of the interfaces expect to be implemented what a number of the common collection types implement themselves.

Incidentally, the reason some of the interfaces on the Array class are in a different colour is that these interfaces are added by the runtime. So if you have a string[] it will expose IEnumerable<string>.

I’d suggest that as a general rule IEnumerable<T> should be the return type when you have anything that implements it as the return type from the method, unless something from an ICollection<T> or IList<T> (or any other type of collection) as absolutely desperately in needed and not just because some existing code expects, say, an IList<T> (even although it is using no more services from it that it would had it been an IEnumerable<T>).

The mutability that implementations of ICollection<T> and IList<T> give will prove problematic in the long term. If you have a large team with members that don’t fully understand what is going on (and this is quite common given the general level developer documentation) they are likely to change the contents of the collection without understanding its implications. In some situations this may fine, in others it may be disastrous.

Finally, if you absolutely do need to return a more specific collection type then instead of returning a reference to the concrete class, return a reference to the lowest interface that you need. For example, if you have a List<T> and you need to add further items to it, but not at specific locations in the list, then ICollection<T> will be the most suitable return type.

Tip of the Day #19: Create a list of objects instead of many lists of values

I?ve been reviewing some code and I came across something that jars. What is wrong with this is many-fold. Essentially, instead of encapsulating an related data into an entity that describes the whole the developer had created silos of data values, and you’d better hope that nothing went awry with any of it.

It looked like this:

List<string> addOnNames = new List<string>();
List<string> addOnDescriptions = new List<string>();
List<string> addOnCodes = new List<string>();
List<decimal> addOnBasePrice = new List<decimal>();

Okay, so if you don’t yet find it jarring, here are some of the things that are wrong with this structure.

To iterate through and operate on a single “add on” you have to do something like this:

for(int i = 0; i < addOnNames.Count; i++)
{
    string name = addOnNames[i];
    string description = addOnDescriptions[i];
    string code = addOnCodes[i];
    decimal basePrice = addOnCode[i];
    // Do stuff that operates on an "Add On" here
}

That?s quite a bit of work just to get at the values you need in a particular loop iteration.

If you need to pass the lists on to methods, you have to pass each in its own parameter. Method signatures start to become needlessly large. In the simplified example above (yes, the real code had a lot more in it) there are three extra parameters that need to be passed around to each method call that needs to act on a collection of add-ons.

private void DoSomethingToACollectionOfAddOns(
                     List<string> addOnNames,
                      List<string> addOnDescriptions,
                      List<string> addOnCodes,
                     List<decimal> addOnBasePrice)
{
    // Do something.
}

It is also quite difficult to use LINQ to query what is effectively an AddOn entity.

Unless you are very careful, your lists can become out of sync, and when that happens all sorts of strange and difficult to trace bugs enter into the system. In the code I was reviewing there was an edge case where by one of the lists didn?t get updated when it was initially populated. Because all the lists are assumed to be the same length the first time that they were iterated over, what should have been the final entity couldn?t be retrieved on one of the lists because it didn?t exist. As this happened well after the creation of the lists and in a method called several levels deep it was quite a job working out where the original bug was coming from.

Solution

What needs to happen here is that an entity class is created for an add on. Each of these lists indicates a field or property in an entity class. The class might simply look something like this:

public class AddOn{
    public string Name { get; set; }
    public string Description { get; set; }
    public string Code { get; set; }
    public decimal BasePrice { get; set; }
}

This encapsulates everything about a single add on entity in one place. If you want a collection of these objects you can do something like this:

    List<AddOn> addOns = new List<AddOn>();

If you want to loop over them you don?t have to write lots of code to get all the elements out of a variety of lists a simple foreach will suffice (unless you need to also know the index)

    foreach(AddOn addOn in addOns){}

If you need to pass the entity around, or a collection of the entities around then you only need to pass one parameter into a method.

    private void DoSomethingToACollectionOfAddOns(List<AddOn> addOns) {}

Using LINQ becomes much easier because now you have everything encapsulated in the one place. I can?t even imagine the convoluted joins that would be needed to process the individual values in a LINQ expression otherwise.

When creating the initial collection, if any particular property is not needed then it can be simply ignored, if need be a default value can be set in the constructor. Then you never have an issue with one collection being out of sync with another. You no longer have to worry about synchronising collections, everything to do with a single instance of the entity is in one place.

Tip of the Day #18: Dealing with data rounding issues

If you have data coming in from a database, web service or other source external to your application and it contains, say for example, price information then do not round it. Don?t attempt to apply any form of formatting to it regardless of how much the client insists that the data will be in this form (e.g. all prices at go live will be in round pounds).

Yes, having whole rounded pounds, without having to display the pennies, on the front end looks nice and pretty. However, the display of whole rounded pounds on the front end is a rendering issue and should always be left to the code that is rendering the user interface. The rounding should never take place close to the point that the information is extracted from its source.

Why? Because eventually the client will want things displayed differently, to put prices with pennies in it, and you have to display those pennies. Now you don?t have to just update the rendering code to display the pounds and pence, but you also have to track down where that rounding occurs and stop it from happening in the first place.

But that?s not all. If you round data, effectively truncating it, early then subsequent calculations will be wrong. In fact, rounding isn?t just like truncating the data. Truncating is just removing precision. Rounding can change the value. That can cause even more havoc later on.

So, why round that early anyway, especially if the data is arriving in a certain way in the first place? Well, I can only guess that perhaps because back before anything went live, in the test system they weren’t. So, just to make the test system look like the eventual live system the prices were rounded the moment they arrived in the application just to be sure they were in round pounds.

The lesson, clients change their minds and business logic should always operate on the cleanest data, unsullied by rendering constraints wherever possible. This means that calculations based on that data will remain correct, even if other factors change.

Follow up on what not to develop

Back in May I wrote about a substandard website I attempted to use in an article entitled “What not to Develop”. I also sent the hotel an email at the same time telling them of the failing of their website, however, I never got a response.

When the post went live initially, I got asked on twitter to name and shame the company in question. I suppose publically decrying a company has the effect that if people start doing that then companies will be pressurised in to providing a better service or product. These days I do not to put in a blog post the name of the company in question until I’ve given them a chance to respond to any email I might have sent. I sent the email on 16 May 2009 at 17:21 (BST), I think that’s quite enough time for a response.

I’ve decided to publish some more details so that people can at least learn from the mistake and not repeat them elsewhere. Essentially, this is an extract of the email (slightly reformatted to fit this blog)

Hello,

I tried to book on your website last night and it didn’t work – it advertised a rate to me then refused to book it. I then tried to use your Contact Us page to send you a message and that also broke and said “The web site you are accessing has experienced an unexpected error. Please contact the website administrator. “

I don’t know who the web site administrator is, but I can guess it is someone employed by TIG Global given this news story: http://www.hospitalitynet.org/news/4036652.search. Personally, if that is the quality they are delivering I wouldn’t use them again as they are not very good and are at best turning away potential customers and at worst exposing you to needless risk.

In order to [help you to] track down the errors I’ve gone back and replicated the initial problem annotating the pages as I go. You will find a number of graphics files attached.

Southwark Rose Hotel Step 1

In [the above image] I show the initial details of my availability search. Check in Friday 31st July, check out Sunday 2nd Aug. 1 adult, 0 children.

Southwark Rose Hotel Step 2

In [the above image] I show the next page. This was a pop-up, so opened a new window. The details at the top are correct and match what I’d previously entered. The description of the “Weekend Advanced Purchase” sounds perfect “Valid Friday-Sunday throughout 2009″. I see that it is £150 for the “Total price of the stay”. I press the book button.

Southwark Rose Hotel Step 3

In [the above image] I show the next page. This was another pop-up, so opened a second window. I now have 3 windows open just for your hotel. (Is this really necessary?). I spot that the number of nights has increased to 3, so I go to change it back to two. I then get an unhelpfully terse error message that says “Minimum stay: 3″ [See the next image]

Southwark Rose Hotel Step 3 error

At this point I’m some what irritated by the experience so go hunting for your contact us page. I see that it is a form only without an email address. I fill in the form and when I’m ready I press the “Submit” button. At this point I get an error page back that includes the message “The following information is meant for the website developer for debugging purposes.” You might want to tell those developers that this information is also useful for attackers and they shouldn’t be displaying it to the public. If the developers were any good what they would have done is get the website to log the information internally and display a general message to the user. If they wanted to tie up a user’s experiences with what is in the log then they might also include a randomly generated (say a GUID – globally unique identifier) identifier that is put in the log and displayed so a user can refer to when explaining what problems they were having at the time.

The error message that should have never been displayed is [as follows].

Vomiting SQL for no good reason

The details in the error page also contain my original complaint. I think I now understand where the American formatting of culture specific information (e.g. dates) is coming from.The company that produced your website was American and in their arrogance just assumed everyone else was just as comfortable using MONTH/DAY/year. I suspect that same arrogance was also responsible for the other failings I’ve pointed out here.

Regards,

Colin.

So, there you are. The hotel is the Southwark Rose Hotel, and their website was produced by TIG Global. (I’ve recently noticed it actually says that at the bottom of the web pages and I need not have searched for relevant press releases!). Incidentally, you can click on any of the graphics to be taken to my Flickr account to see the full sized version.

Tip of the Day #12 (Hard coded values)

Don’t hard code VAT or other values that can change, even if they don’t change all that often. Additionally, if you really must hard coded values in a program then make it a named constant rather than a literal value so that it can be tracked down by name. That will make it easier on the person having to maintain the program.

What not to develop

I was recently looking to book a hotel in Southwark in London. I thought I’d found the perfect hotel, it was inexpensive (by London standards) and close to where I would be visiting. They also had availability on an offer for £75 per night, so long as you checked in and out on specific days, which I happened to be doing. It looked perfect.

But then things started to go wrong.

I selected the rate from the availability page and clicked the “Book” button. The next page popped up (it opened a new window) and the details were pre-populated. However, it had changed the number of nights from 2 to 3. I didn’t want 3 nights, so I changed it back to 2 and I got a rather terse message saying “Minimum Stay: 3”. I’m happy to accept that style of message from a compiler, but not from a public facing website.

I went back and repeated the process wondering if I’d somehow clicked on the wrong rate. I double checked everything this time. Date is correct (but in an American format on a .co.uk website), number of people (1), number of nights (2), number of rooms (1), the room description explicitly gives the rules for the stay conditions for the rate. I meet all the conditions that are presented to me. I press “Book” again…

And it has pre-populated everything again and added an extra night on. I don’t want an extra night! Why even present me with a rate that I can’t have because it doesn’t meet my needs.

By this point I’m more than a wee bit frustrated. So I take off to the website’s contact us page. Instead of providing an email address there is a form to fill in. So, I write a description of the issues I was seeing on their site at which point the site fails again. It failed spectacularly badly. If it had taken me to an error page I would have just shrugged my shoulders and gone off elsewhere. But no, it decided to throw up its internals at me. It vomited details of the SQL Statement that failed, stack traces and so on.

It even had the audacity to tell me that “The following information is meant for the website developer for debugging purposes.” It might have well have said “The following information is meant for an attacker so they can destroy our server.”

So, back to my title, what not to develop. There were many failings on this website that I could see. The user experience was poor to start with and it then descended in to abject failure when it vomited its guts up at me.

1. Don’t use pop-up windows; browsers may block them; they cause confusion for some users. Absolutely do not have a pop-up out of a pop-up; it clutters my screen with needless windows.

2. Don’t have a disconnect between the display locale on the site and the TLD. If you have a geographic TLD then display information in a way that consistent with the culture of that location. e.g. Do not display dates in Month/Day/Year format when you are serving pages on a .co.uk domain. If you have customers from overseas and want to localise content for them then offer that ability, but default to your own locale if you don’t know their preference. Some websites try to be clever and will detect based on the IP of the user but even this isn’t 100% accurate. I’m located in Glasgow, but if you use a IP geo location service it shows me in Greater Manchester.

3. If a user has told you their needs do not present rates that do not meet those needs. If you do want to show near alternatives then make it clear that the details entered do not match the rate displayed, but some minor changes will get the user the rate. Put this information at the bottom or in a different colour. Anything that makes it easily distinguishable.

4. Don’t allow a business rule to mismatch the user friendly description. Make sure that the description of the rate actually matches the business rules that will be used to enforce the rate. If you have a rate that is described to the user as from X to Y don’t have the underlying business rules enforce a stay from X to Z. That will just irritate people.

5. Don’t give users terse error messages; it is unpleasant and unfriendly. If a user has made a mistake then gently point it out.

6. Don’t just send data to the database without validating it first. If a user has typed something that is too long for the column in the database for which it is destined then the software controlling the website should never have attempted to send it to the database in the first place.

7. Don’t display information that could be useful to an attacker. Don’t display stack traces, SQL Statements, system generated error messages, code snippets, etc.

Rant of the day: Learn to frickin' count!

I was in a shop recently and I bought 6 items at £5 each. A total price of £30, even I can manage that mental arithmetic without resorting to a calculator. However, the till decided that the total price was £30.01. For a penny I really can’t be bothered to argue, but it got me thinking about code quality and wondering about what awfulness must be sitting in that system to create such a simple basic mistake.

My colleagues are probably all aware of my views on code quality. I rant daily whenever I see examples on ineptitude by people that are paid money to write code. I read and respond on forums in order to help others learn their craft, or just get unstuck when they accidentally dig themselves in a hole. However, I see on an almost daily basis these days people posting their homework questions with no apparent attempt to at least try to work it out from themselves.

Take this example I found on Code Project a while ago:

I need to know how to do some simple things with arrays please help with any!
1.Find largest or smallest value
2.Count how many times a given value is in the array
3.Count the number of even or odd integers in the array
4.Add up the sum and compute the mean
5.Create another array of the same size containing the same values in reverse order
Thanks!

This is very obviously an exercise from an introductory course on the language they were studying. They just want someone to give them an answer that they can copy and paste. If this is what they are like now, imagine what they will be like years down the road writing commercial software.

I’ve seen lots of evidence over the years of people writing software by copy and pasting examples from the internet without thought of what is actually going on. This results in slow, bloated, inefficient code that is integrated very badly with the rest of the system, hard to read, hard to debug, and is just generally a complete mess.

If you are tempted to copy and paste some code snippet from the internet for your application then stop and think first. Do you actually understand the code? If not, then don’t copy and paste it. If you don’t understand it, how will you debug it?

I would say that if you are tempted to copy and paste from the internet that you create a very small test application first, paste it in to that and learn how it works. Once you understand how it all fits together and how it works you can then write a version that will integrate in to your application.

While you are at it, write some unit tests to go with it. Make sure you test for edge cases, make sure you test for some normal cases too. If you ever get a bug, then add a test that replicates the bug. So if someone suddenly discovers your software things that 5 times 6 equals 30.01 you can add a test for it, fix the bug and redeploy the system. Hopefully, this would have been caught before the public get a chance to see the glaring error and write blog posts about it.

Rant of the day: IDisposable

My colleagues are probably used to the fact that I rant about code quality frequently. I take code quality very seriously. Not because I’m especially expert in it, but because features of basic code quality make it easier for other people to read and maintain the code.

Today’s irritation comes from some code (replicated in a number of classes I might add) that implements IDisposable. It is a fine interface and by implementing it you are telling the rest of the world that you have some stuff that can’t just be left to the garbage collector to clean up. These are things like file streams, database connections, etc. Any type of scarce resource that you want to hand back as soon as you are finished with it rather than leave it up to the garbage collector.

However, I came across this “gem” in some code today where the class, basically a utility class, contained no fields (so it wasn’t holding on to anything at all, let alone anything that might be a scarce resource). Yet, for some reason it implemented IDisposable. What was it going to dispose? What could it dispose?

The answer was in the code:

public void Dispose()
{
    // Nothing to dispose of.
}

Quite!

Test Driven Development By Example

Introduction

A lot has been written on the subject of test driven development, and especially on the idea that tests ought to be written first. This is an ideal that I strive for, however, I have a tendency to write the unit tests afterwards.

Some people learn better by example. This article, rather than going in to great length about the principles of test driven development, will walk the reader through the process of building and testing an algorithm by writing the tests first, then changing the method being tested so that it fulfils the tests.

The final code and all the unit tests can be found in the accompanying download. This will require NUnit and Visual Studio 2005.

The specimen problem

I once saw a demo of how to create unit tests up front for a simple method. The method took a positive integer and turned it into roman numerals. So, I’m going to do something similar. I’m going to take an integer and turn it into words, in English. The rules for this may change depending on the language, so if English is not your only language, you may like to try to repeat this exercise in another language.

So, if the integer is 1, the result will be "one". If the integer is 23 the result will be "twenty three" and so on. Also note, I’ll be using British or Commonwealth English. So, for 101 the result in words is "one hundred and one". In American English it would be "one hundred one"

The walk through

The algorithm will also be refined through refactoring techniques. Agile development methodologies, especially eXtreme Programming, suggests that you do the simplest thing possible to get the thing working. So, going by this premise, I’ll work on the solution in bits. First, get it to return "one", then "one" or "two" depending on the input, and so on. Once 21 is reached it should become obvious where some refactoring can take place and so on. The final solution will work for 32 bit integers only.

Getting Started

Visual Studio 2005 has some features that can help with writing the tests first. A test can be written that calls into the class under test and the smart tags will prompt you with an offer to create the message stub for you.

The stub looks like this:

public static string NumberToEnglish(int p)
{
  throw new Exception("The method or operation is not implemented.");
}

If the test is completed to look like this:

[Test]
public void NumberToEnglishShouldReturnOne()
{
  string actual = English.NumberToEnglish(1);
  Assert.AreEqual("one", actual, "Expected the result to be "one"");
}

The test should fail because the stub throws an exception, rather than do what the test expects.

NUnit reports the error like this: "NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnOne : System.Exception : The method or operation is not implemented.".

The next thing to do is to ensure that the code satisfies the demands of the unit test. Agile methodologies, such as XP, say that only the simplest change should be made to satisfy the current requirements. In that case the method being tested will be changed to look like this:

public static string NumberToEnglish(int number)
{
  return "one";
}

At this point the unit tests are re-run and they all work out.

Test "two"

Since the overall requirement is that any integer should be translatable into words, the next test should test that 2 can be translated. The test looks like this:

[Test]
public void NumberToEnglishShouldReturnTwo()
{
  string actual = English.NumberToEnglish(2);
  Assert.AreEqual("two", actual, "Expected the result to be "two"");
}

However, since the method being tested returns "one" regardless of input at this point the test fails:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnTwo :
Expected the result to be "two"
	String lengths are both 3.
	Strings differ at index 0.
	expected: <"two">
	 but was: <"one">
	------------^

Again, keeping with the simplest change principle the code is updated to look like this:

public static string NumberToEnglish(int number)
{
  if (number == 1)
    return "one";
  else
    return "two";
}

The tests now pass.

Test "three" to "twenty"

A third test can now be written. It tests for an input of 3 and an expected return of "three". Naturally, at this point, the test fails. The code is updated again and now looks like this:

public static string NumberToEnglish(int number)
{
  switch (number)
  {
    case 1:
      return "one";
    case 2:
      return "two";
    default:
      return "three";
  }
}

To cut things short the new tests and counter-updates continue like this until the numbers 1 to 20 can be handled. The code will eventually look like this:

public static string NumberToEnglish(int number)
{
  switch (number)
  {
    case 1:
      return "one";
    case 2:
      return "two";
    case 3:
      return "three";
    case 4:
      return "four";
    case 5:
      return "five";
    case 6:
      return "six";
    case 7:
      return "seven";
    case 8:
      return "eight";
    case 9:
      return "nine";
    case 10:
      return "ten";
    case 11:
      return "eleven";
    case 12:
      return "twelve";
    case 13:
      return "thirteen";
    case 14:
      return "fourteen";
    case 15:
      return "fifteen";
    case 16:
      return "sixteen";
    case 17:
      return "seventeen";
    case 18:
      return "eighteen";
    case 19:
      return "nineteen";
    default:
      return "twenty";
  }
}

Test "twenty one" to "twenty nine"

At this point it looks like it will be pretty easy to do 21, but a pattern is about to emerge. After the tests for 21 and 22 have been written, the code is refactored to look like this:

public static string NumberToEnglish(int number)
{
  if (number < 20)
    return TranslateOneToNineteen(number);
  if (number == 20)
    return "twenty";
  return string.Concat("twenty ", TranslateOneToNineteen(number - 20));
}

private static string TranslateOneToNineteen(int number)
{
  switch (number)
  {
    case 1:
      return "one";
    case 2:
      return "two";
    case 3:
      return "three";
    case 4:
      return "four";
    case 5:
      return "five";
    case 6:
      return "six";
    case 7:
      return "seven";
    case 8:
      return "eight";
    case 9:
      return "nine";
    case 10:
      return "ten";
    case 11:
      return "eleven";
    case 12:
      return "twelve";
    case 13:
      return "thirteen";
    case 14:
      return "fourteen";
    case 15:
      return "fifteen";
    case 16:
      return "sixteen";
    case 17:
      return "seventeen";
    case 18:
      return "eighteen";
    default:
      return "nineteen";
  }
}

Now all the tests from 1 to 22 pass. 23 to 29 can be assumed to work because it is using well tested logic.

Test "thirty" to "thirty nine"

30 is a different story. The test will fail like this:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnThirty :
Expected the result to be "thirty"
	String lengths differ.  Expected length=6, but was length=10.
	Strings differ at index 1.
	expected: <"thirty">
	 but was: <"twenty ten">
	-------------^

By using the principle of doing the simplest thing that will work. The public method changes to:

public static string NumberToEnglish(int number)
{
  if (number < 20)
    return TranslateOneToNineteen(number);
  if (number == 20)
    return "twenty";
  if (number <= 29)
    return string.Concat("twenty ", TranslateOneToNineteen(number - 20));
  return "thirty";
}

Naturally, the test for 31 will fail:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnThirtyOne :
Expected the result to be "thirty one"
	String lengths differ.  Expected length=10, but was length=6.
	Strings differ at index 6.
	expected: <"thirty one">
	 but was: <"thirty">
	------------------^

So the code is changed again. This time to:

public static string NumberToEnglish(int number)
{
  if (number < 20)
    return TranslateOneToNineteen(number);
  if (number == 20)
    return "twenty";
  if (number <= 29)
    return string.Concat("twenty ", TranslateOneToNineteen(number - 20));
  if (number == 30)
    return "thirty";
  return string.Concat("thirty ", TranslateOneToNineteen(number - 30));
}

Test "forty" to "ninety nine"

A test for 40 will fail:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnForty :
Expected the result to be "forty"
	String lengths differ.  Expected length=5, but was length=10.
	Strings differ at index 0.
	expected: <"forty">
	 but was: <"thirty ten">
	------------^

The necessary code change starts to draw out a pattern. Of course, the pattern could have been quite easily predicted, but since this code is being built by the simplest change only rule, the pattern has to emerge before it can be acted upon.

The pattern repeats itself until it gets to 99. By this point the public method looks like this:

public static string NumberToEnglish(int number)
{
  if (number < 20)
    return TranslateOneToNineteen(number);
  int units = number % 10;
  int tens = number / 10;
  string result = "";
  switch (tens)
  {
    case 2:
      result = "twenty";
      break;
    case 3:
      result = "thirty";
      break;
    case 4:
      result = "forty";
      break;
    case 5:
      result = "fifty";
      break;
    case 6:
      result = "sixty";
      break;
    case 7:
      result = "seventy";
      break;
    case 8:
      result = "eighty";
      break;
    default:
      result = "ninety";
      break;
  }
  if (units != 0)
    result = string.Concat(result, " ", TranslateOneToNineteen(units));
  return result;
}

Test "one hundred"

The test for 100 will fail. The failure message is:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnOneHundred :
Expected the result to be "one hundred"
	String lengths differ.  Expected length=11, but was length=6.
	Strings differ at index 0.
	expected: <"one hundred">
	 but was: <"ninety">
	------------^

A quick change to the public method allows the test to pass:

public static string NumberToEnglish(int number)
{
  if (number == 100)
    return "one hundred";

  if (number < 20)
    return TranslateOneToNineteen(number);
  // Remainder omitted for brevity
}

Test "one hundred and one" to "one hundred and ninety nine"

What about 101? That test fails like this:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnOneHundredAndOne :
Expected the result to be "one hundred and one"
	String lengths differ.  Expected length=19, but was length=10.
	Strings differ at index 0.
	expected: <"one hundred and one">
	 but was: <"ninety one">
	------------^

At this point it should be easy to see that some of the work that has been done previously can be re-used with a small amount of refactoring. First refactor most of the body of the public method into a class called TranslateOneToNinetyNine. Then re-test to ensure that the refactoring process hasn’t introduced any new problems.

In Visual Studio 2005 it is very easy to highlight some code and extract it into a new method, thus allowing it to be reused by being called from multiple places.

Now the public method looks like the following and all previously successful tests continue to be successful

public static string NumberToEnglish(int number)
{
  if (number == 100)
    return "one hundred";

  return TranslateOneToNinetyNine(number);
}

For numbers from 101 to 199 the pattern is "one hundred and X" where X is the result of the translation between 1 and 99. Because it would take too long to write all those tests, it is possible to write just the edge cases and one or two samples from the middle of the range. That should give enough confidence to continue onwards. In this case, the tests are for 101, 115, 155 and 199.

The code is then re-written to support those tests:

public static string NumberToEnglish(int number)
{
  if (number < 100)
    return TranslateOneToNinetyNine(number);
  if (number == 100)
    return "one hundred";

  string result = string.Concat("one hundred and ",
    TranslateOneToNinetyNine(number - 100));

  return result;
}

Test "two hundred"

The next test to write is for 200. Naturally, this test will fail at this point as the code doesn’t support it. The test looks like this:

[Test]
public void NumberToEnglishShouldReturnTwoHundred()
{
    string actual = English.NumberToEnglish(200);
    Assert.AreEqual("two hundred", actual, "Expected the result to be "two hundred"");
}

The failing test can be predicted. It looks like this:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnTwoHundred :
Expected the result to be "two hundred"
	String lengths differ.  Expected length=11, but was length=22.
	Strings differ at index 0.
	expected: <"two hundred">
	 but was: <"one hundred and ninety">
	------------^

A simple change to the method can get the test passing:

public static string NumberToEnglish(int number)
{
    if (number < 100)
        return TranslateOneToNinetyNine(number);
    if (number == 100)
        return "one hundred";
    if (number == 200)
        return "two hundred";

    string result = string.Concat("one hundred and ",
        TranslateOneToNinetyNine(number - 100));

    return result;
}

Test "two hundred and one" to "two hundred and ninety nine"

Since the next part of the pattern can be seen as very similar to the one hundreds, the two edge cases and a couple of internal cases for the two hundreds are created. Presently, all those tests fail as the method has not been updated to take account of that range of integers.

In order to get those tests working the code is refactored like this:

public static string NumberToEnglish(int number)
{
    if (number < 100)
        return TranslateOneToNinetyNine(number);
    if (number == 100)
        return "one hundred";
    if (number < 200)
        return string.Concat("one hundred and ",
        TranslateOneToNinetyNine(number - 100));
    if (number == 200)
        return "two hundred";

    return string.Concat("two hundred and ",
        TranslateOneToNinetyNine(number - 200));
}

Test "three hundred" to "nine hundred and ninety nine"

From the last change to the method a pattern can be seen to be emerging. The code for dealing with the one hundreds and two hundreds are almost identical. This can be easily changed so that all positive integers between 1 and 999 can be converted into words.

After various unit tests are written to test values from 300 to 999 the code is changed to this:

public static string NumberToEnglish(int number)
{
    if (number < 100)
        return TranslateOneToNinetyNine(number);
    int hundreds = number / 100;
    string result = string.Concat(TranslateOneToNineteen(hundreds),
        " hundred");
    int remainder = number % 100;
    if (remainder == 0)
        return result;

    return string.Concat(result, " and ", TranslateOneToNinetyNine(remainder));
}

Test "one thousand"

As before, the first thing to do is write the test:

[Test]
public void NumberToEnglishShouldReturnOneThousand()
{
    string actual = English.NumberToEnglish(1000);
    Assert.AreEqual("one thousand", actual, "Expected the result to be "one thousand"");
}

Which fails:

NumbersInWords.Test.EnglishTest.NumberToEnglishShouldReturnOneThousand :
Expected the result to be "one thousand"
	String lengths differ.  Expected length=12, but was length=11.
	Strings differ at index 0.
	expected: <"one thousand">
	 but was: <"ten hundred">
	------------^

And the fix:

public static string NumberToEnglish(int number)
{
    if (number == 1000)
        return "one thousand";

    if (number < 100)
        return TranslateOneToNinetyNine(number);
    int hundreds = number / 100;
    string result = string.Concat(TranslateOneToNineteen(hundreds),
        " hundred");
    int remainder = number % 100;
    if (remainder == 0)
        return result;

    return string.Concat(result, " and ", TranslateOneToNinetyNine(remainder));
}

Test "one thousand and one" to "nine thousand nine hundred and ninety nine"

Some forward thinking will reveal that the logic will be similar for the thousands as it was for the hundreds. So, rather than repeat all that in this article, the steps to refactoring the code to work with the range from 101 to 999 can be used similarly with 1001 to 9999. The accompanying download will show all the unit tests.

The final result of this stage is that the public method has been refactored to this:

public static string NumberToEnglish(int number)
{
    if (number < 1000)
        return TranslateOneToNineHundredAndNinetyNine(number);

    int thousands = number / 1000;
    string result = string.Concat(TranslateOneToNineteen(thousands),
        " thousand");
    int remainder = number % 1000;
    if (remainder == 0)
        return result;

    if (remainder < 100)
        return string.Concat(result, " and ",
            TranslateOneToNinetyNine(remainder));

    return string.Concat(result, " ",
        TranslateOneToNineHundredAndNinetyNine(remainder));
}

private static string TranslateOneToNineHundredAndNinetyNine(int number)
{
    if (number < 100)
        return TranslateOneToNinetyNine(number);
    int hundreds = number / 100;
    string result = string.Concat(TranslateOneToNineteen(hundreds),
        " hundred");
    int remainder = number % 100;
    if (remainder == 0)
        return result;

    return string.Concat(result, " and ", TranslateOneToNinetyNine(remainder));
}

Test "ten thousand" to "nine hundred and ninety nine thousand nine hundred and ninety nine"

In the first set of test, that were expected to fail, for the condition that the integer input was a value greater than 9999 actually shows passing tests. This serendipitous circumstance is caused by the TranslateOneToNineteen method being a compatible match for prefixing the "thousand" for a range up to the 19 thousands. Through this code reuse it is possible to get a full range match all the way up to 999999 with only a change to a part of one line of code.

The public method has now changed to:

public static string NumberToEnglish(int number)
{
    if (number < 1000)
        return TranslateOneToNineHundredAndNinetyNine(number);

    int thousands = number / 1000;
    string result = string.Concat(TranslateOneToNineHundredAndNinetyNine(thousands),
        " thousand");
    int remainder = number % 1000;
    if (remainder == 0)
        return result;

    if (remainder < 100)
        return string.Concat(result, " and ",
            TranslateOneToNinetyNine(remainder));

    return string.Concat(result, " ",
        TranslateOneToNineHundredAndNinetyNine(remainder));
}

Test "one million" to "nine hundred and ninety nine million nine hundred and ninety nine thousand nine hundred and ninety nine"

The way the code changes as the number of digits increases should becoming more apparent by now. The amount of code reuse is increasing. The number of necessary tests is decreasing. Confidence is increasing

At the start the first 20 numbers each had their own test. 100% of inputs had a test. From 20 to 99 it was 20 tests. Only 25% of the inputs had tests. From 100 to 999 there were 18 tests. Just 2% of the inputs had tests. From 1000 to 999999 there were, again, just 18 tests. This represents just 2 thousandths of one percent.

At this point the public method has been refactored to look like this:

public class English
{
public static string NumberToEnglish(int number)
{
    if (number < 1000000)
        return TranslateOneToNineHundredAndNinetyNineThousandNineHundredAndNinetyNine(number);

    int millions = number / 1000000;
    string result = string.Concat(TranslateOneToNineHundredAndNinetyNine(millions),
        " million");
    int remainder = number % 1000000;
    if (remainder == 0)
        return result;

    if (remainder < 100)
        return string.Concat(result, " and ",
            TranslateOneToNinetyNine(remainder));

    return string.Concat(result, " ",
        TranslateOneToNineHundredAndNinetyNineThousandNineHundredAndNinetyNine(remainder));
}

private static string TranslateOneToNineHundredAndNinetyNineThousandNineHundredAndNinetyNine(int number)
{
    if (number < 1000)
        return TranslateOneToNineHundredAndNinetyNine(number);

    int thousands = number / 1000;
    string result = string.Concat(TranslateOneToNineHundredAndNinetyNine(thousands),
        " thousand");
    int remainder = number % 1000;
    if (remainder == 0)
        return result;

    if (remainder < 100)
        return string.Concat(result, " and ",
            TranslateOneToNinetyNine(remainder));

    return string.Concat(result, " ",
        TranslateOneToNineHundredAndNinetyNine(remainder));
}

Test "one billion" upwards

The limitations of an integer (Int32) mean that this section reaches the upper limits of 2147483647. Unless an Int64 is used there is no continuation to the trillion range.

Final stages

To this point all positive integers are successfully being translated from an integer into a string of words. At this point, through code reuse, it should be a fairly simple matter to refactor the code to work with negative numbers and zero.

Zero is easy enough. The unit test is put in place:

[Test]
public void NumberToEnglishShouldReturnZero()
{
  string actual = English.NumberToEnglish(0);
  Assert.AreEqual("zero", actual, "Expected the result to be "zero"");
}

And it promptly fails because there is no code to support it. The public method is changed so that it does a quick check at the start:

public static string NumberToEnglish(int number)
{
  if (number == 0)
      return "zero";
  if (number < 1000000000)
      return TranslateOneToNineHundredAndNintyNineMillion...(number);

  int billions = number / 1000000000;
  string result = string.Concat(TranslateOneToNineteen(billions), " billion");

  int remainder = number % 1000000000;
  if (remainder == 0)
    return result;

  if (remainder < 100)
    return string.Concat(result, " and ", TranslateOneToNinetyNine(remainder));

  return string.Concat(result, " ",
    TranslateOneToNineHundredAndNintyNineMillion...(remainder));
}

Now the test passes. Next is to permit negative numbers.

[Test]
public void NumberToEnglishShouldReturnNegativeOne()
{
  string actual = English.NumberToEnglish(-1);
  Assert.AreEqual("negative one", actual, "Expected the result to be "negative one"");
}

This fails so the code is refactored to this:

public static string NumberToEnglish(int number)
{
  if (number == 0)
    return "zero";

  string result = "";
  if (number < 0)
    result = "negative ";

  int absNumber = Math.Abs(number);

  return string.Concat(result,
    TranslateOneToTwoBillion...SixHundredAndFortySeven(absNumber));
  // Method call above contracted for brevity
}

And the test passes. But what about that final edge case? int.MinValue? The test is written but it fails. The reason is thatMath.Abs(int.MinValue) isn’t possible. So, as this is a one off case, the easiest solution is to put in a special case into the public method:

// Special case for int.MinValue.
if (number == int.MinValue)
  return "negative two billion one hundred and forty seven million " +
    "four hundred and eighty three thousand six hundred and forty eight";

Conclusion

This article demonstrates how to unit test and build a piece of code in small increments. The tests continually prove that the developer is on the right path. As the code is built the constant rerunning of existing tests prove that any new enhancements do not break existing code. If, for any reason, it is later found that the code contains a bug, a test can easily be created that exercises that incorrect code and a fix produced.

The full final code is available in the associated download along with a set of NUnit tests.

Downloads

You can download the example code for this article here.

Follow

Get every new post delivered to your Inbox.

Join 26 other followers