Checking for NULL using Entity Framework

Here is a curious gotcha with Entity Framework: if you filter on a value that may be null, you may not get back the results you expect.

For example, if you do something like this:

var result = context.GarmentTryOns
    .Where(gto => gto.WeddingId == weddingId
                  && gto.PersonId == personId);

If personId is null then you won’t get any results back. This is because under the hood the query is structured like this:

…WHERE WeddingId = @p0 AND PersonId = @p1

That’s all great when @p1 has a value, but when it is null SQL Server says nothing matches. In SQL Server, NULL is not a value; it is the absence of a value, and it does not equal anything (including itself). For example, try this:

SELECT CASE WHEN NULL = NULL THEN 1 ELSE 0 END

That returns 0!

Anyway, if you want to test NULL-ability, you need the IS operator, e.g.

SELECT CASE WHEN NULL IS NULL THEN 1 ELSE 0 END

That returns 1, which is what you’d expect.

Now, for whatever reason, EF is not clever enough to realise that in the above example, personId is (perfectly validly) null in some cases and switch from using = to IS as needed. So, what we need is a little jiggery-pokery to get this to work. EF can tell if you hard code the null, so you can do this in advance to set things up:

Expression<Func<GarmentTryOns, bool>> personExpression;
if (personId == null)
    personExpression = gto => gto.PersonId == null;
else
    personExpression = gto => gto.PersonId == personId;

This can then be injected as a Where filter onto the query and EF will interpret it correctly. Like this:

var result = context.GarmentTryOns
                      .Where(gto => gto.WeddingId == weddingId)
                      .Where(personExpression);

The SQL that EF produces now correctly uses PersonId IS NULL when appropriate.
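If you need this in more than one place, the branching can be wrapped in a small helper. The following is only a sketch: TryOnRecord is a hypothetical stand-in for the entity, and I'm assuming PersonId is an int?. It runs against an in-memory list, where C# already treats null == null as true, so it shows the shape of the helper rather than the SQL that EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the entity in the post (names are assumptions).
public class TryOnRecord
{
    public int WeddingId { get; set; }
    public int? PersonId { get; set; }
}

public static class PersonFilter
{
    // Returns a predicate with the null hard-coded when personId is null,
    // so a query provider sees a literal null rather than a parameter.
    public static Expression<Func<TryOnRecord, bool>> ForPerson(int? personId)
    {
        if (personId == null)
            return gto => gto.PersonId == null;

        return gto => gto.PersonId == personId;
    }

    // In-memory check only: applies the expression to a plain list.
    public static int CountMatches(IEnumerable<TryOnRecord> rows, int? personId)
    {
        return rows.AsQueryable().Where(ForPerson(personId)).Count();
    }
}
```

With a real context the same expression would be passed to .Where(...) exactly as in the query above.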

Handling bounces on Amazon SES

If you send to an email that does not exist, Amazon SES will perform some handling of the bounce before passing the details on to you.

When you send email through Amazon SES you may notice that the email arrives with a Return Path that looks something like this: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com

As it happens, the large delimited hex number before the @ sign is the same value that you got back from the SendEmail or SendRawEmail response. (If you’re unfamiliar with sending an email see previous posts on SendEmail and SendRawEmail.)

// client is an AmazonSimpleEmailServiceClient
// request is a SendEmailRequest
SendEmailResponse response = client.SendEmail(request);
string messageId = response.SendEmailResult.MessageId;

When the email bounces, it will go first to Amazon SES, where they will note which email bounced. Then the email will be forwarded on to you and you will receive the bounced email. (Be aware, tho’, that the email may end up in your spam folder – it did for me.) Exactly where the bounce email will go depends on the API call you are using and the fields that you have populated in the outgoing email. The rules are detailed on the Bounce and Complaints notifications page of the Amazon SES Developer’s Guide.

If you look in the headers of this email you’ll see that Message Id again in various parts of the header. e.g.

X-Original-To: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com
Delivered-To: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com
Message-Id: <00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email.amazonses.com>

How you process these bounces on your side is up to you. Amazon do not, yet (I’m hopeful they will and it has been requested a lot) provide an automated way of using the API for querying which emails are bouncing, are complained about or are rejected.

At present the best detail you are going to get on bounced emails is in the aggregate data provided through the GetSendStatistics API call or via the graphs on the AWS Console.

What happens if I send more email to an address that bounced?

If you continue to send emails to an address that bounces you will get a MessageRejectedException when you call SendEmail or SendRawEmail with the message “Address blacklisted.”

Conclusion on bounce handling

At present bounce handling using Amazon SES isn’t great (but it’s certainly no worse than using a plain old SMTP service); however, Amazon do appear to be interested in providing better support for handling bounces and the like. It may very well be better supported in the future.

Verifying Senders with Amazon SES

I’ve already written a couple of pieces about Amazon Simple Email Service (SES) on sending Email and sending emails with attachments.

Why do you have to verify senders?

It is important to note that while in development mode you have to verify all recipients and senders, in production mode you still have to verify the senders (this is, presumably, an anti-spam measure to ensure the high quality of email).

If you attempt to send an email from an email address that is not registered you will get a MessageRejectedException when you call SendEmail or SendRawEmail with the message “Email address is not verified”.

Listing and verifying senders

You can add and view senders via the AWS Console, which is fine if all you need is to add the odd sender now and again. However, if your application is going to send on behalf of a number of people then you need a way to automate this.

The AWS API contains three methods that help with managing verified email addresses. You can VerifyEmailAddress, DeleteVerifiedEmailAddress and ListVerifiedEmailAddresses.

To Verify an email address

Here is the code to verify an email address

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
VerifyEmailAddressRequest request = new VerifyEmailAddressRequest();
request.EmailAddress = "joe.bloggs@example.com";
var response = client.VerifyEmailAddress(request);

Then an email will be sent to the email address listed:

from        no-reply-aws@amazonaws.com via email-bounces.amazonses.com
to:         joe.bloggs@example.com
date:       13 November 2011 15:08
subject:    Amazon SES Address Verification Request
mailed-by:  email-bounces.amazonses.com

Dear Amazon SES customer:

We have received a request to authorize an email address for use with Amazon
SES.  To confirm that you are authorized to use this email address, please go
to the following URL:

https://email-verification.us-east-1.amazonaws.com/...........

Your request will not be processed unless you confirm the address using this
URL.

To learn more about sending email from Amazon SES, please refer to the Amazon
SES Developer Guide.

Sincerely, Amazon Web Services

Once you’ve clicked the link you’ll get a page with a message like this:

Congratulations!

You have successfully verified an email address with Amazon Simple Email Service. You can now begin sending email from this address.

If you are a new Amazon SES user and have not yet received production access to Amazon SES, then you can only send email to addresses that you have previously verified. To view your list of verified email addresses, go to the AWS Management Console or refer to the Amazon SES Developer Guide.

If you have already been approved for production access, then you can send email to any address.

Thank you for using Amazon SES.

Once this message has been displayed the email address will be displayed in the SES Console and you will be able to send email from this email address (in development mode it also means you will be able to send email to the address).

Listing the verified email addresses

In order to check the email addresses that have passed through the verification process you can use the method ListVerifiedEmailAddresses.

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
var request = new ListVerifiedEmailAddressesRequest();
var response = client.ListVerifiedEmailAddresses(request);
var result = response.ListVerifiedEmailAddressesResult;
List<string> addresses = result.VerifiedEmailAddresses;

The addresses that have been successfully verified will be listed in the addresses list.

If the email goes out (from VerifyEmailAddress or from the AWS Console), and the address is not yet verified then it won’t appear in the list.

Removing a verified email address

If you no longer need to send from an email address you can use the DeleteVerifiedEmailAddress method.

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
var request = new DeleteVerifiedEmailAddressRequest();
request.EmailAddress = viewModel.NewEmailAddress;
var response = client.DeleteVerifiedEmailAddress(request);

Sending more than a basic email with Amazon SES

Previously, I wrote about getting started with Amazon’s Simple Email Service, and I included details of how to send a basic email. The SendEmail method is excellent at sending basic emails with HTML or Text bodies. However, it doesn’t handle attachments. For that, you need to use SendRawEmail.

SendRawEmail doesn’t give you much functionality. In fact, you have to do all the work to construct the email yourself. However, it does mean that you can do pretty much what you need with the email.

There are still some limitations. Amazon imposes a 50 recipient limit per email, a maximum 10MB per email, and you can only add a small number of file types as an attachment. This is, I suspect, in order to reduce the ability for people to use the service to spam and infect other people while permitting most legitimate uses for the service.

Building an email

When I said that you have to do all the work to construct the email, I really did mean that. You have to figure out the headers, the way the multi-part MIME is put together, the character encoding (because email is always sent using a 7-bit encoding) and so on.

I tried to do this, and it was most frustrating work. The tiniest thing seemed to put Amazon SES into a sulk.

However, I did find a piece of code that someone else had written to do the heavy work for me. Essentially, what he’s doing is constructing a mail message using the built in System.Net.Mail.MailMessage type in .NET and then using .NET’s own classes to create the raw mail message as a MemoryStream, which is what Amazon SES wants.

I’ve refactored the code in the linked post so that it is slightly more efficient if you are calling it multiple times. It uses reflection, and some of the operations need only be carried out once regardless of the number of times you generate emails, so it removes those bits off to a static initialiser so that they only happen the once.

Here’s my refactored version of the code:

using System;
using System.IO;
using System.Net.Mail;
using System.Reflection;

public class BuildRawMailHelper
{
    private const BindingFlags nonPublicInstance =
        BindingFlags.Instance | BindingFlags.NonPublic;

    private static readonly ConstructorInfo _mailWriterContructor;
    private static readonly MethodInfo _sendMethod;
    private static readonly MethodInfo _closeMethod;

    static BuildRawMailHelper()
    {
        Assembly systemAssembly = typeof(SmtpClient).Assembly;
        Type mailWriterType = systemAssembly
            .GetType("System.Net.Mail.MailWriter");

        _mailWriterContructor = mailWriterType
            .GetConstructor(nonPublicInstance, null,
                new[] { typeof(Stream) }, null);

        _sendMethod = typeof(MailMessage).GetMethod("Send",
            nonPublicInstance);

        _closeMethod = mailWriterType.GetMethod("Close",
            nonPublicInstance);
    }

    public static MemoryStream ConvertMailMessageToMemoryStream(
        MailMessage message)
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            object mailWriter = _mailWriterContructor.Invoke(
                new object[] {memoryStream});

            _sendMethod.Invoke(message, nonPublicInstance, null,
                                new[] {mailWriter, true}, null);

            _closeMethod.Invoke(mailWriter, nonPublicInstance,
                null, new object[] {}, null);

            return memoryStream;
        }
    }
}

At first glance, the fact that the MemoryStream is disposed of does seem a bit counter-intuitive, however some methods of MemoryStream still function when the stream is closed, such as ToArray().
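If you want to convince yourself of that, here is a minimal sketch (the class and method names are made up for illustration) that writes to a MemoryStream, disposes it, and then still reads the bytes back with ToArray():

```csharp
using System;
using System.IO;
using System.Text;

public static class DisposedStreamDemo
{
    public static string ReadAfterDispose()
    {
        MemoryStream stream;
        using (stream = new MemoryStream())
        {
            byte[] bytes = Encoding.ASCII.GetBytes("Subject: test");
            stream.Write(bytes, 0, bytes.Length);
        }

        // The stream is closed at this point, but ToArray() is documented
        // to keep working and returns a copy of the written bytes.
        return Encoding.ASCII.GetString(stream.ToArray());
    }
}
```

Most other members, such as Read or Position, throw an ObjectDisposedException once the stream is closed, so this really is a special case.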

Incidentally, if you want to see what the raw email looks like you can use a piece of code like this to get the raw email as a string:

MemoryStream memoryStream =
    BuildRawMailHelper.ConvertMailMessageToMemoryStream(mailMessage);
// ToArray() still works even though the stream has been disposed
byte[] data = memoryStream.ToArray();
using (StreamReader reader = new StreamReader(new MemoryStream(data)))
{
    string rawMail = reader.ReadToEnd();
    Console.Write(rawMail);
}

Using SendRawEmail

Because you’re doing all the work, the code that actually interacts with Amazon SES is very simple.

// mailMessage is an instance of a System.Net.Mail.MailMessage
var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
SendRawEmailRequest request = new SendRawEmailRequest();
request.RawMessage = new RawMessage();
request.RawMessage.Data = BuildRawMailHelper
    .ConvertMailMessageToMemoryStream(mailMessage);
var response = client.SendRawEmail(request);

And that’s it. You can now send emails with attachments, and anything else you can do with a MailMessage.

First(OrDefault) Vs. Single(OrDefault)

There are two mechanisms (each with an …OrDefault variant) in LINQ for getting one item out of an enumeration. They are First and Single. There is a difference between the two and you can produce code that functions incorrectly if the wrong one is used.

So, what’s the main difference? They both sound like they’ll return just one item out from the enumeration. And, indeed, they do.

First will return the first item that it encounters that matches the predicate (if supplied). Whereas Single will return the one and only item that it encounters that matches the predicate (if supplied). If Single encounters a second item that matches the predicate then it throws an exception. If no predicate is supplied, it throws an exception simply if the enumeration has more than one item.

Why would there be two things that do almost the same thing that are so subtly different? First exists so that you can get the first item regardless of how many items there may actually be. Single exists to get you the one and only item. Single is useful when your predicate operates on a primary key. For example:

data.Single(d => d.PrimaryKey == idToMatch)

The …OrDefault variants will return the default value for the type (for reference types that will be null) if there are no matches found. Otherwise, both First and Single throw an exception if no items are encountered.

Let’s look at some code.

First

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First();

In this case, first will contain the value of "Zero".

If a predicate is added to the call to First then we can see what happens if there is no match.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First(x => x.Length > 10);

In this case, there are no matches, and an InvalidOperationException is thrown with the message “Sequence contains no matching element”

An InvalidOperationException is also thrown if the initial set of data is empty, although with no predicate the message is “Sequence contains no elements”:

string[] empty = new string[0];
var first = empty.First();

You can happily supply a predicate that may match more than one item in the enumeration; First simply returns the first match it encounters.

Single

For example

string[] onlyOneItem = new string[]{"Only item"};
var single = onlyOneItem.Single();

This will return the one and only item that matches.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var single = data.Single();

This will throw an exception. If the result set contains more than one item an InvalidOperationException will be thrown with a message of “Sequence contains more than one element”

string[] empty = new string[0];
var single = empty.Single();

This will throw exactly the same exception as its First counterpart; an InvalidOperationException is thrown with the message “Sequence contains no elements”

…OrDefault

This is where things get a little bit more interesting. This says that if the result set contains zero items null (for reference types) is returned. In the case of First, the result set can contain zero, one or many items and it won’t throw an exception. In the case of Single only result sets containing zero or one item will return while any more will result in an exception.
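As a rough illustration of the three outcomes for SingleOrDefault, here is a small sketch (the helper name and the returned labels are mine, purely for demonstration):

```csharp
using System;
using System.Linq;

public static class OrDefaultDemo
{
    // Describes what SingleOrDefault does for a given array of strings.
    public static string Describe(string[] items)
    {
        try
        {
            string value = items.SingleOrDefault();
            // null here means the default was returned
            // (or the one and only item was itself null).
            return value ?? "(default)";
        }
        catch (InvalidOperationException)
        {
            // Thrown when the sequence contains more than one element.
            return "(more than one)";
        }
    }
}
```

An empty array gives "(default)", a one-item array gives that item, and anything longer ends up in the catch block. Swap in FirstOrDefault and the catch block is never reached.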

So… what about this scenario:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.FirstOrDefault();

The first value of the set is genuinely null. How do you tell the difference between that and the result set being simply empty without throwing an exception?

You could just go back to using the First variant and catching the exception. Or you could (if your result set can be enumerated many times without issue, e.g. the underlying object is an Array or List) use Any to test if the set contains any data in advance. Like this:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
if (data.Any())
{
    var first = data.FirstOrDefault();
    // Do stuff with the value
}
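If the result set can only be enumerated once, another option is to walk the enumerator yourself. This is just a sketch of the idea; TryGetFirst is a hypothetical extension method, not something that ships with LINQ:

```csharp
using System;
using System.Collections.Generic;

public static class FirstExtensions
{
    // Returns true and the first element if there is one (even if that
    // element is null); returns false if the sequence is empty.
    // The source is only enumerated once.
    public static bool TryGetFirst<T>(this IEnumerable<T> source, out T first)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            if (enumerator.MoveNext())
            {
                first = enumerator.Current;
                return true;
            }
        }

        first = default(T);
        return false;
    }
}
```

You would then write if (data.TryGetFirst(out first)) { ... } and know that a null inside the block is a genuine value rather than a missing one.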

Tip of the day: Expire a cookie, don’t remove it

I recently found a bug in my code that I couldn’t fathom initially until I walked through the HTTP headers in Firebug. In short, you cannot simply remove a cookie by calling Remove(cookieName) on the HttpCookieCollection. That will have no effect. You have to expire the cookie in order for it to be removed.

In other words, you need code like this:

HttpCookie cookie = new HttpCookie("MyCookie");
cookie.Expires = DateTime.UtcNow.AddYears(-1);
Response.Cookies.Add(cookie);

When you create a cookie, the response from the server will contain an HTTP Header called Set-Cookie that contains the value of the cookie.

For example, if we create a cookie like this:

HttpCookie cookie = new HttpCookie("MyCookie");
cookie.Value = "The Value of the cookie";
Response.Cookies.Add(cookie);

Then the Response will contain this:

Set-Cookie    MyCookie=The Value of the cookie; path=/

Each subsequent request to the server will contain the cookie, like this:

Cookie        MyCookie=The Value of the cookie

The responses from the server do not contain the cookie unless the server is updating the value of the cookie.

When the cookie is to be removed forcefully, the server must update the cookie with a new expiry, like this:

HttpCookie cookie = new HttpCookie("MyCookie");
cookie.Expires = DateTime.UtcNow.AddYears(-1);
Response.Cookies.Add(cookie);

The response will then have this header:

Set-Cookie    MyCookie=; expires=Mon, 20-Sep-2010 21:32:53 GMT; path=/

And in subsequent requests the cookie won’t be present any more as the browser will have removed it.

Installing a web site on a new server

Here are some blog posts that have been useful to me lately when I got caught out installing a website on a new server (I will eventually get that automated build and deploy process actually performing the deploy step successfully!!)

The configuration section ‘system.web.extensions’ cannot be read because it is missing a section declaration:

While installing a website on a new Windows Server I came across this error. In short, it was because the App Pool was set up as a .NET 2.0 application rather than a 4.0. The blog post explains what was going on and how to fix it.

[Resolved] Could not load file or assembly ‘XXXXX’ or one of its dependencies. An attempt was made to load a program with an incorrect format:

Although this didn’t help me in the end, it does suggest a solution. In my case, because of a third-party dependency that requires an x86 build, it couldn’t be used. In time that dependency will be removed, in the meantime the following was more helpful to me…

Could not load file or assembly ‘PresentationCore’ or one of its dependencies. An attempt was made to load a program with an incorrect format. : A solution:

This post did give me the pointer I needed to the setting that had to be changed to get the web site working.

LINQ query performance

A while ago I was reviewing some code and I came across something that looked like this:

if (corpus.Where(a => a.SomeProperty == someValue).Count() > 0)
{
    // Do Stuff
}

And it got me thinking that it may not be the best way to do this. What is really being asked here is: “Are there any items in the enumerable?” The count is not actually important in this situation. I considered that it would probably be more efficient to write:

if (corpus.Where(a => a.SomeProperty == someValue).Any())
{
    // Do stuff
}

Then I read somewhere (unfortunately, I didn’t note the URL) that the .Any() extension method on IEnumerable<T> can be inefficient in certain scenarios, for instance when the concrete type is actually a List<T>, which maintains its own Count. In that instance the cost of setting up the Enumerator and calling MoveNext() to determine the existence of at least one element would be more expensive an operation than calling Count on the List<T>.

I was curious about that so I set about working out the relative performance characteristics of a number of the LINQ extension methods. I should note that these were all on LINQ to Objects out of the box so don’t measure how these methods would perform relatively for things like, say, LINQ to SQL.

I tested various scenarios. In some the IEnumerable<T> is a lightweight generator of elements, in this case an Enumerable.Range(…); in other cases I used a List<T>, either by a concrete reference or by a reference to the IEnumerable<T> interface.

All timings in this post relate to my desktop machine, which is running Windows 7 64-bit with 8GB RAM and an AMD Phenom II X4 955 running at 1.6GHz (for some unknown reason it won’t run at the full 3.2GHz).

Counting elements

In the first set of tests I counted the number of elements. Both calling the Count property directly on the List<T> and using the Count() extension method on an IEnumerable<T> whose concrete type was the List<T> returned the result in O(1) time, although the LINQ method was about 24 times slower.

Where the IEnumerable<T> did not also implement the ICollection<T> interface (as in the case where the values were being generated by Enumerable.Range(…) method) the Count() extension method took O(n) time to return the answer.

The graph above shows the number of Ticks (vertical axis) taken to complete the counting task with n (horizontal axis) elements. A tick is roughly 1/1600th of a millisecond. So for 2000000 elements it took 72.5ms to count them.

Compare that for instances where the Count property was called directly (0.00413 Ticks or 0.00258µs [millionths of a second]) or where the Count() method was called on something that could be cast to an ICollection or ICollection<T> (0.0989 Ticks or 0.0618µs)

So far it looks good for cases where the underlying type implements the ICollection<T> or ICollection interface. However remember as soon as you start filtering the data (e.g. with a Where() method call) then you are returning an IEnumerable<T> which then operates in O(n) time. Also remember that the Where() clause will add some overhead as it has to process the filter as well.

Any elements

It should be no surprise that using our test set of a List<T> and an Enumerable.Range(…) the Any() method runs in O(1) time. Both took similar amounts of time, the former taking 0.278 Ticks (0.174µs) per call, and the latter taking 0.296 Ticks (0.185µs) per call. I suspect that the extra time on the latter is due to the small amount of additional time taken to generate elements as the enumerator progresses.

However, if you have a reference to something that already implements ICollection<T> which defines a Count property, such as a List<T>, you may find it is faster to perform (corpus.Count>0). I found that for the List<T> I’d created for the test runs it was only marginally slower than the raw call to Count taking 0.00607 ticks (0.00379µs) per call.

Any elements with filter

If you have a filter (a Where clause) then Any may take longer than O(1). It will take as long as it takes to find anything that matches the filter, or O(n) if nothing matches the filter.

I ran three tests, one where the filtered condition was met on the first element, one where the condition was met in the middle of the set and one where the condition was not met until the last element.

Summary

If you have a concrete type the performance is better when using the Count property, both for cases when you need to know the number of elements in the corpus and when you need to know if there are any elements at all.

If you simply need to know if there are any elements at all in the corpus then the use of Any() works out better than using the LINQ extension method Count(), as Count() must traverse the entire corpus (unless it implements ICollection<T>) whereas Any() will short circuit at the first available opportunity.
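If you’d rather see the short circuit than time it, one way is to count how many elements a lazy sequence is asked to produce. The sketch below is not the benchmark code used for the timings in this post; it is a hypothetical counter I’ve written purely to illustrate the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCost
{
    private static int _pulled;

    // A lazy sequence that records how many elements were actually produced.
    private static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            _pulled++;
            yield return i;
        }
    }

    // Number of elements generated while answering "Count() > 0".
    public static int PulledByCount(int n)
    {
        _pulled = 0;
        Numbers(n).Count();
        return _pulled;
    }

    // Number of elements generated while answering "Any()".
    public static int PulledByAny(int n)
    {
        _pulled = 0;
        Numbers(n).Any();
        return _pulled;
    }
}
```

For a sequence of 1000 elements Count() pulls all 1000, whereas Any() stops after the first.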

Tip of the day: Splitting a string when encountering whitespace

In .NET the string class has a Split method that splits the string at the separator character(s) that you specify. However, if you want to split the string at any instance of whitespace you don’t have to create a Split call that enumerates all those different types of whitespace… and there are actually quite a lot! Instead you can just call Split without any parameters and it will split at whitespace regardless of the type.

For example, the following program, in which I hope I’ve managed to use all the different types of whitespace in Unicode, will produce the output below:

static void Main(string[] args)
{
  string source = "An\u0020inspired\rcalligrapher\ncan\u1680create\u180epages\u2000of\u2001"+
    "beauty\tusing\u2002stick\u2003ink,\u2004quill,\u2005brush,\u2006pick-axe,\u2007buzz\u2008"+
    "saw,\u2009or\u200aeven\u202fstrawberry\u205fjam."+
    Environment.NewLine+
    "The\u3000quick\u2028brown\u2029fox\u0009jumps\u000aover\u000bthe\u000clazy\u000ddog."+
    Environment.NewLine+
    "Whitespace\u0085For\u00a0the win!";

  string[] words = source.Split();

  foreach(string word in words)
  {
    Console.WriteLine(word);
  }
}

Produces this output:

An
inspired
calligrapher
can
create
pages
of
beauty
using
stick
ink,
quill,
brush,
pick-axe,
buzz
saw,
or
even
strawberry
jam.

The
quick
brown
fox
jumps
over
the
lazy
dog.

Whitespace
For
the
win!
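The blank lines in that output are empty strings: Environment.NewLine is two whitespace characters on Windows, so Split produces an empty entry between them. If you don’t want those, one option is the overload that takes StringSplitOptions. A minimal sketch (the class and method names are just for illustration):

```csharp
using System;

public static class WhitespaceSplitter
{
    public static string[] SplitWords(string source)
    {
        // A null separator array means "split on whitespace", and
        // RemoveEmptyEntries drops the empty strings left between
        // adjacent whitespace characters.
        return source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The cast on null is needed so that the compiler can pick between the char[] and string[] overloads.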

Building messages in parallel

I recently saw some code where the developer was attempting to build up messages inside tasks that were being reported outside of the task.

In a sequential system it is easy enough to do this. You have various options available to you, such as

  • message += …;
  • StringBuilder
  • Streams

However, in a parallel system these all fall down because you lose control over the sequencing. You can regain some control by using appropriate locks but then you add in bottlenecks around the synchronisation points which is something you want to minimise in a parallel system.

I’ll show you what I mean. Each example below is attempting to build up a large message containing messages from smaller subroutines. For the moment, let’s assume that the exact order of the individual messages is not important. It may be a series of log entries, or a list of errors to correct. The only important thing is that each individual message is not garbled in any way.

The example message is actually just a set of letters and numbers. In the final message each letter must appear 10 times and each number 26 times. Once the tasks have finished, the final messages are examined to see what happened.
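The first of those checks could be done along these lines. This is only a sketch of the idea with names I’ve made up for illustration (MessageChecks, CountOccurrences, HasAllParts); it is not the ShowResult code used to produce the output in this post.

```csharp
using System;
using System.Linq;

public static class MessageChecks
{
    // How many times does a given character appear in the message?
    public static int CountOccurrences(string message, char symbol)
    {
        return message.Count(ch => ch == symbol);
    }

    // True if every digit appears 26 times and every letter 10 times.
    public static bool HasAllParts(string message)
    {
        for (char digit = '0'; digit <= '9'; digit++)
        {
            if (CountOccurrences(message, digit) != 26) return false;
        }

        for (char letter = 'A'; letter <= 'Z'; letter++)
        {
            if (CountOccurrences(message, letter) != 10) return false;
        }

        return true;
    }
}
```

Note that this only tells you whether anything went missing; checking that each individual message is intact needs a separate line-by-line comparison.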

Sequential Reference code

Here is the code:

class Program
{
    static void Main(string[] args)
    {
        string result = SequentialReference();

        ShowResult(result);

        Console.WriteLine("Program finished");
        Console.ReadLine();
    }

    private static string SequentialReference()
    {
        string result = string.Empty;

        for(int i=0; i<10; i++)
        {
            for(char c='A'; c<='Z'; c++)
            {
                result += string.Format("{0}{1}", i, c);
            }
            result += Environment.NewLine;
        }

        return result;
    }

    private static void ShowResult(string message)
    {
        // Code to display the message and the
        // results of the tests
    }
}

The code generates the messages, then outputs the results. For the reference sequential code (which is what we want all the results to look like) we get:

0A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0Y0Z
1A1B1C1D1E1F1G1H1I1J1K1L1M1N1O1P1Q1R1S1T1U1V1W1X1Y1Z
2A2B2C2D2E2F2G2H2I2J2K2L2M2N2O2P2Q2R2S2T2U2V2W2X2Y2Z
3A3B3C3D3E3F3G3H3I3J3K3L3M3N3O3P3Q3R3S3T3U3V3W3X3Y3Z
4A4B4C4D4E4F4G4H4I4J4K4L4M4N4O4P4Q4R4S4T4U4V4W4X4Y4Z
5A5B5C5D5E5F5G5H5I5J5K5L5M5N5O5P5Q5R5S5T5U5V5W5X5Y5Z
6A6B6C6D6E6F6G6H6I6J6K6L6M6N6O6P6Q6R6S6T6U6V6W6X6Y6Z
7A7B7C7D7E7F7G7H7I7J7K7L7M7N7O7P7Q7R7S7T7U7V7W7X7Y7Z
8A8B8C8D8E8F8G8H8I8J8K8L8M8N8O8P8Q8R8S8T8U8V8W8X8Y8Z
9A9B9C9D9E9F9G9H9I9J9K9L9M9N9O9P9Q9R9S9T9U9V9W9X9Y9Z

Does the result contain all the necessary parts?
10 of each letter; 26 of each number
0: 26 occurrences: Pass
1: 26 occurrences: Pass
2: 26 occurrences: Pass
3: 26 occurrences: Pass
4: 26 occurrences: Pass
5: 26 occurrences: Pass
6: 26 occurrences: Pass
7: 26 occurrences: Pass
8: 26 occurrences: Pass
9: 26 occurrences: Pass
A: 10 occurrences: Pass
B: 10 occurrences: Pass
C: 10 occurrences: Pass
D: 10 occurrences: Pass
E: 10 occurrences: Pass
F: 10 occurrences: Pass
G: 10 occurrences: Pass
H: 10 occurrences: Pass
I: 10 occurrences: Pass
J: 10 occurrences: Pass
K: 10 occurrences: Pass
L: 10 occurrences: Pass
M: 10 occurrences: Pass
N: 10 occurrences: Pass
O: 10 occurrences: Pass
P: 10 occurrences: Pass
Q: 10 occurrences: Pass
R: 10 occurrences: Pass
S: 10 occurrences: Pass
T: 10 occurrences: Pass
U: 10 occurrences: Pass
V: 10 occurrences: Pass
W: 10 occurrences: Pass
X: 10 occurrences: Pass
Y: 10 occurrences: Pass
Z: 10 occurrences: Pass
Does the result contain correctly sequenced individual messages?
Each sequence 52 chars; 0A0B0C... 1A1B1C.... etc.
Message 0: PASS - 52 char; PASS - Message content as expected
Message 1: PASS - 52 char; PASS - Message content as expected
Message 2: PASS - 52 char; PASS - Message content as expected
Message 3: PASS - 52 char; PASS - Message content as expected
Message 4: PASS - 52 char; PASS - Message content as expected
Message 5: PASS - 52 char; PASS - Message content as expected
Message 6: PASS - 52 char; PASS - Message content as expected
Message 7: PASS - 52 char; PASS - Message content as expected
Message 8: PASS - 52 char; PASS - Message content as expected
Message 9: PASS - 52 char; PASS - Message content as expected
Program finished

String Concatenation in parallel

The first bad parallel example is this one, where the message is built up using string concatenation.  The code is almost identical to the sequential example, except that the for loop is now a Parallel.For and I’ve injected a Sleep to simulate performing other work (such as getting the data necessary to build the messages).

class Program
{
    static void Main(string[] args)
    {
        string message = StringConcat();

        ShowResult(message);

        Console.WriteLine("Program finished");
        Console.ReadLine();
    }

    private static string StringConcat()
    {
        string result = string.Empty;

        Parallel.For(0, 10,
                        (int i) =>
                            {
                                for (char c = 'A'; c <= 'Z'; c++)
                                {
                                    result += string.Format("{0}{1}", i, c);
                                    Thread.Sleep(15);
                                }
                                result += Environment.NewLine;
                            });

        return result;
    }
}

And the results are starkly different:

0A2A4A6A8A2B0B6B4B8B2C0C6C8C2D0D6D4D8D2E0E4E8E2F0F4F6F8F2G0G4G6G8G2H0H4H8H2I0I4I
8I2J0J6J4J8J2K0K4K8K2L0L6L4L8L2M0M6M8M2N0N4N8N2O0O4O8O2P0P6P4P8P2Q0Q4Q8Q2R0R6R4R
8R2S0S6S4S8S2T0T4T8T2U0U4U8U2V0V6V4V8V2W0W6W8W2X0X6X8X2Y0Y4Y8Y2Z0Z4Z6Z8Z
3A
1A
5A
7A
9A3B1B5B7B9B3C1C7C9C3D1D5D9D3E1E7E9E3F1F5F7F9F3G1G5G9G3H1H5H9H3I1I7I5I9I3J1J7J9J
3K1K5K7K9K3L1L7L5L9L3M1M5M9M3N1N5N7N9N3O1O5O7O9O3P1P7P9P3Q1Q7Q5Q9Q3R1R7R5R9R1S3S
7S5S9S3T1T5T7T9T3U7U5U9U1V5V9V3W7W9W1X3X7X5X9X3Y1Y7Y5Y9Y1Z7Z9Z

Does the result contain all the necessary parts?
10 of each letter; 26 of each number
0: 26 occurrences: Pass
1: 24 occurrences: Fail
2: 26 occurrences: Pass
3: 24 occurrences: Fail
4: 22 occurrences: Fail
5: 20 occurrences: Fail
6: 16 occurrences: Fail
7: 21 occurrences: Fail
8: 26 occurrences: Pass
9: 26 occurrences: Pass
A: 10 occurrences: Pass
B: 10 occurrences: Pass
C: 8 occurrences: Fail
D: 9 occurrences: Fail
E: 8 occurrences: Fail
F: 10 occurrences: Pass
G: 9 occurrences: Fail
H: 8 occurrences: Fail
I: 9 occurrences: Fail
J: 9 occurrences: Fail
K: 9 occurrences: Fail
L: 10 occurrences: Pass
M: 8 occurrences: Fail
N: 9 occurrences: Fail
O: 9 occurrences: Fail
P: 9 occurrences: Fail
Q: 9 occurrences: Fail
R: 10 occurrences: Pass
S: 10 occurrences: Pass
T: 9 occurrences: Fail
U: 8 occurrences: Fail
V: 8 occurrences: Fail
W: 7 occurrences: Fail
X: 9 occurrences: Fail
Y: 9 occurrences: Fail
Z: 8 occurrences: Fail
Does the result contain correctly sequenced individual messages?
Each sequence 52 chars; 0A0B0C... 1A1B1C.... etc.
Message 0: FAIL - Expected 52 / Got 232 characters
Message 1: FAIL - Expected 52 / Got 2 characters
Message 2: FAIL - Expected 52 / Got 2 characters
Message 3: FAIL - Expected 52 / Got 2 characters
Message 4: FAIL - Expected 52 / Got 2 characters
Message 5: FAIL - Expected 52 / Got 222 characters
Program finished

As you can see, some of it works out… Most of it is a mess!

So what happened?

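The string that will contain the result was created outside of the parallel tasks. Inside the tasks the result was updated without any synchronisation in place. An update such as result += ... is not atomic: it reads the current string, builds a new string and then assigns that back to the variable. Two tasks can therefore read the same intermediate value, and whichever writes last discards the other's addition. That means the tasks could overwrite each other's changes, insert items out of sequence and so on.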
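
To illustrate what the missing synchronisation costs, here is a minimal sketch that wraps each append in a lock. The class name LockedConcatSketch and the lock object named gate are made up for this illustration and are not part of the original program. It is only half a fix: no pieces should be lost any more, but the tasks still take turns appending, so the individual messages remain interleaved.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration class; not part of the original program.
static class LockedConcatSketch
{
    public static string Build()
    {
        object gate = new object();   // assumed name for the lock object
        string result = string.Empty;

        Parallel.For(0, 10, i =>
        {
            for (char c = 'A'; c <= 'Z'; c++)
            {
                string piece = string.Format("{0}{1}", i, c);

                // Only one task at a time may read, extend and reassign result.
                lock (gate)
                {
                    result += piece;
                }

                Thread.Sleep(15);     // simulated work, done outside the lock
            }

            lock (gate)
            {
                result += Environment.NewLine;
            }
        });

        return result;
    }
}
```

Calling LockedConcatSketch.Build() should always return all 260 number/letter pairs plus ten line breaks, but the pairs from different tasks will still be mixed together, which is why a different structure is needed.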

I’ve written before about some of the ways that the Parallel Extensions can help with synchronisation of data across parallel tasks (e.g. the ConcurrentDictionary being used to help with the aggregation of grouped counts), so perhaps here is an example of where another of the concurrent collections may come in handy. A ConcurrentBag could be used to hold each of the individual completed messages.

A ConcurrentBag is an unordered collection of objects that you can access across multiple threads in a safe way. You can add the same object to the bag as many times as you like. As it is unordered you cannot rely on the objects being retrieved in any particular sequence.
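
As a small sketch of those properties (the class name BagSketch and the values added are assumptions made up for the illustration), the following fills a bag from 100 parallel iterations, adding the same string every time alongside a unique one:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical illustration class; not part of the original program.
static class BagSketch
{
    public static ConcurrentBag<string> Fill()
    {
        var bag = new ConcurrentBag<string>();

        Parallel.For(0, 100, i =>
        {
            bag.Add("same");        // the same value can be added repeatedly
            bag.Add(i.ToString());  // a value unique to this iteration
        });

        return bag;
    }

    public static int CountOf(ConcurrentBag<string> bag, string value)
    {
        // Enumerating the bag is safe, but the order is not defined.
        return bag.Count(s => s == value);
    }
}
```

After Fill() the bag should hold 200 items, 100 of which are the string "same". Nothing can be assumed about the order in which they come back out.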

The code that builds the messages now looks like this:

private static string ConcurrentBagExample()
{
    ConcurrentBag<string> bag = new ConcurrentBag<string>();

    Parallel.For(0, 10,
                    (i) =>
                    {
                        string result = string.Empty;
                        for (char c = 'A'; c <= 'Z'; c++)
                        {
                            result += string.Format("{0}{1}", i, c);
                            Thread.Sleep(15);
                        }
                        bag.Add(result);
                    });

    return string.Join(Environment.NewLine, bag);
}

What has changed is that the building of the string has moved inside the task. This means that each task only ever builds its own string, and no other task can touch it. Once the string is complete it is added to the ConcurrentBag. The final string is built outside the parallel tasks: at the end of the method a simple string.Join() pulls together all the data that’s been built up in the ConcurrentBag.

And now the messages are formed correctly. The only difference between the output of this program and that of the sequential reference program [see above] is the ordering of the individual messages:

1A1B1C1D1E1F1G1H1I1J1K1L1M1N1O1P1Q1R1S1T1U1V1W1X1Y1Z
0A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0Y0Z
3A3B3C3D3E3F3G3H3I3J3K3L3M3N3O3P3Q3R3S3T3U3V3W3X3Y3Z
2A2B2C2D2E2F2G2H2I2J2K2L2M2N2O2P2Q2R2S2T2U2V2W2X2Y2Z
5A5B5C5D5E5F5G5H5I5J5K5L5M5N5O5P5Q5R5S5T5U5V5W5X5Y5Z
4A4B4C4D4E4F4G4H4I4J4K4L4M4N4O4P4Q4R4S4T4U4V4W4X4Y4Z
7A7B7C7D7E7F7G7H7I7J7K7L7M7N7O7P7Q7R7S7T7U7V7W7X7Y7Z
6A6B6C6D6E6F6G6H6I6J6K6L6M6N6O6P6Q6R6S6T6U6V6W6X6Y6Z
9A9B9C9D9E9F9G9H9I9J9K9L9M9N9O9P9Q9R9S9T9U9V9W9X9Y9Z
8A8B8C8D8E8F8G8H8I8J8K8L8M8N8O8P8Q8R8S8T8U8V8W8X8Y8Z
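
If the ordering of the messages matters, one possible variation (a sketch only, and not the approach used above) is to give each task its own slot in an array instead of using a bag. The class name OrderedSketch is made up for the illustration. Each task writes only to the element at its own index, so no two tasks touch the same slot, and the array keeps the messages in index order.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration class; not part of the original program.
static class OrderedSketch
{
    public static string Build()
    {
        // One slot per message; task i writes only to messages[i].
        string[] messages = new string[10];

        Parallel.For(0, 10, i =>
        {
            var sb = new StringBuilder();   // local to this task
            for (char c = 'A'; c <= 'Z'; c++)
            {
                sb.Append(i).Append(c);
                Thread.Sleep(15);           // simulated work
            }
            messages[i] = sb.ToString();
        });

        // Joined only after every task has finished.
        return string.Join(Environment.NewLine, messages);
    }
}
```

This only works because the number of messages is known up front and each one maps to a fixed index. When that is not the case, the ConcurrentBag version is the simpler choice.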