Xander.PasswordValidator – WordListRegExBuilder

When word lists are processed a regular expression is built in order to quickly traverse the lists. The regular expression is built using a number of builders which create various parts of the regular expression. One for checking the password itself, and another for testing the password against the list but modified by adding a numeric suffix. You can add your own by creating your own class derived from WordListRegExBuilder and then adding it to the WordListProcessOptions.

The WordListRegExBuilder is an abstract base class and demands that the GetRegularExpression method is implemented in any concrete derived class. It also has a method called RegExEncode which takes a string and encodes it for use in a regular expression, escaping out all the special symbols used by the regular expression engine.

For example, say you want to validate the password against the list, but check also for a numeric prefix you can create a class to build that part of the regular expression. That class would look something like this:

using Xander.PasswordValidator;
namespace Xander.Demo.PasswordValidator.Web.Mvc3.Helpers
{
  public class NumericPrefixBuilder : WordListRegExBuilder
  {
    public override string GetRegularExpression(string password)
    {
      return "[0-9]" + RegExEncode(password);
    }
  }
}

And to use this in the validator, add it to the settings like this:

var settings = new PasswordValidationSettings();
settings.WordListProcessOptions.CustomBuilders.Add(typeof(NumericPrefixBuilder));

The Validator will create a new instance of your class and run the GetRegularExpression method. It will then incorporate that in to the regular expression that it is building and test word lists using it.

Xander.PasswordValidatator – Validation Handlers

If the password validator does not have the validation rules that you need for your project then it is easily extendable. You can create your own ValidationHander derived classes and set add them via the PasswordValdiationSettings class.

ValidationHandler

The ValidationHandler class is an abstract base class which is extended in the Xander.PasswordValidator framework itself to provide the various built in validation routines. (You can see examples of some of them here on GitHub)

You can create your own by simply creating a class and setting Xander.PasswordValidator.ValidationHandler as the base class and overriding the Validate() method.

The Validate method simply accepts the password as the parameter and returns a Boolean. Return true to indicate that the password passes the validation, false if it fails the validation.

For example, here is a very simple validator that rejects passwords that look like dates:

using System;
using Xander.PasswordValidator;

namespace Demo.ValidationHandlers
{
  public class NoDatesValidationHandler : ValidationHandler
  {
    public override bool Validate(string password)
    {
      DateTime date;
      var parseResult = DateTime.TryParse(password, out date);
      return !parseResult;
    }
  }
}

To set this up so that the validator calls it, it needs to be added as part of the settings. You pass in the type of the handler. The Validation framework will create an instance of the handler for you, if it needs it. If validation fails before it gets a chance to run your validator then your validator will not run.

var settings = new PasswordValidationSettings();
settings.CustomValidators.Add(typeof(NoDatesValidationHandler));

CustomValidationHandler<TData>

This derives from ValidationHandler and is used when you need to pass some additional data or objects into your validation handler for it to work properly.

The Validate method works as before, except you now have access to an additional property from the base class that contains your custom data, called CustomData. CustomData is the object passed in through the settings.

To pass in the data through the settings you use the CustomSettings property on the PasswordValidationSettings class. For example:

settings = new PasswordValidationSettings();
settings.MinimumPasswordLength = 6;
settings.CustomValidators.Add(typeof(PasswordHistoryValidationHandler));
settings.CustomSettings.Add(typeof(PasswordHistoryValidationHandler), new Repository());

The key passed into CustomSettings is a type that refers to the ValidationHandler type that the settings are to be sent to.

The ValidationHandler looks something like this:

using System.Linq;
using System.Web;
using Xander.PasswordValidator;

namespace Demo.ValidationHandlers
{
  public class PasswordHistoryValidationHandler : CustomValidationHandler
  {
    public override bool Validate(string password)
    {
      var user = HttpContext.Current.User;
      var history = CustomData.GetPasswordHistory(user.Identity.Name);
      return !history.Any(h => string.Compare(password, h, true) == 0);
    }
  }
}

In the above example, the settings pass in a repository which is passed on to the ValidationHandler when the Validator is run. The repository is used to get a history of passwords (It is a dummy repository in this example – In real life you should never have access to the plain text passwords like this) and the history can be checked against the current password to ensure that it does not match any of the historical passwords.

There is a slightly updated set of assemblies for this as I made some changes to way the CustomValidationHandler works: PasswordValidator.0.2.0.0.zip

Xander.PasswordValidator – In a Web Application

In the last post I introduced Xander.PasswordValidator and showed the basics of how to configure it. In this post I’m going to show the PasswordValidationAttribute and how you can use it in your ASP.NET MVC application.

PasswordValidation attribute

At its simplest, all you need to do is to decorate a property in your model with the PasswordValidationAttribute, like this:

  public class SomeModel
  {
     [PasswordValidation]
     public string Password { get; set; }






   
    // Other stuff goes here
  }

That will validate the password based on the settings in the config file, which I discussed briefly in my previous post, and I’ll go into more detail later.

Registering the Password Validator

In order for the file paths to custom word lists to be resolved correctly in a web application you need to register the validator in the Application_Start() method in your web application’s HttpApplication derived class. (Or anywhere before first use).

For example, the Application_Start() method may look like this:

protected void Application_Start()
{
   PasswordValidatorRegistration.Register(); // Register password validator
   AreaRegistration.RegisterAllAreas();
   RegisterGlobalFilters(GlobalFilters.Filters);
   RegisterRoutes(RouteTable.Routes); }

Validating settings from code

As the settings can get quite complex they cannot be set directly in the attribute that you use to decorate the model. Instead they can be set elsewhere and referenced in the attribute.

The settings can be configured as normal then added to the PasswordValidationSettingsCache. For example:

var settings = new PasswordValidationSettings();
settings.NeedsNumber = true;
settings.NeedsSymbol = true;
settings.MinimumPasswordLength = 6;
settings.StandardWordLists.Add(StandardWordList.FemaleNames);
settings.StandardWordLists.Add(StandardWordList.MaleNames);
settings.StandardWordLists.Add(StandardWordList.Surnames);
settings.StandardWordLists.Add(StandardWordList.MostCommon500Passwords);
PasswordValidationSettingsCache.Add("StandardRules", settings);

This code would typically be placed in the Application_Start() method, after registering the password validator.

The important line is the last one. It adds the setting tot he cache with the name “StandardRules”. That can then be references in the attribute later. Like this:

public class MyModel
{
  [PasswordValidation("StandardRules")]
  public string Password { get; set; } 
}

The PasswordValidationAttribute references the entry in the cache, which is then retrieved to perform the validation.

Xander.PasswordValidator – A Simple Demonstration

I recently introduced the Xander.PasswordValidator project I’m working on in a previous blog post. In this post I intend to demonstrate some of the basics of how to use it.

Validator

At the most core is the Validator class. It performs the validation of the password and returns a value to the caller to let them know if the validation passed or failed.

The validator can take settings set by the caller, or it can find settings in the application’s configuration file.

Here is a simple example of it working:

var settings = new PasswordValidationSettings();
settings.MinimumPasswordLength = 6;
settings.NeedsLetter = true;
settings.NeedsNumber = true;
settings.StandardWordLists.Add(StandardWordList.MostCommon500Passwords);
var validator = new Validator(settings);
bool result = validator.Validate("MySuperSecretPassword");

First off, a settings class is created, then various options are set. If you don’t set any options then the validator allows any password.

In this example the settings mandate the a password must be at least 6 characters, it must have a letter and it must have a number, and it must not appear in the built in list of the most common 500 passwords.

Then the Validator is created and passed the settings that we’ve prepared.

Finally, the Validate method is called passing in the password that is to be validated. The result indicates whether the password passed or failed (in the example above, it failed as it does not contain a number).

Settings from the config file

If you prefer to have the settings for the validator in the config file then you can instantiate a Validator without passing anything to its constructor and it will use the settings in the config file instead.

It should go without saying that you should only put the settings in the config file in a secure environment.

To use settings in the config file you must set up a the section where the settings will go, and then create the section with the settings in it.

To define the section:

<configSections>
<!-- Set up other config sections here—>
   <sectionGroup name="passwordValidation">
      <section name="rules" type="Xander.PasswordValidator.Config.PasswordValidationSection, Xander.PasswordValidator, Version=0.1.0.0, Culture=neutral, PublicKeyToken=fe72000dffcf195f" allowLocation="true" allowDefinition="Everywhere"/>
   </sectionGroup> </configSections>

An example of the config section itself:

<!-- The configuration section that describes the configuration for the password validation -->
<passwordValidation>
   <rules minimumPasswordLength="6" needsNumber="false" needsLetter="false" needsSymbol="false">
     <wordListProcessOptions checkForNumberSuffix="true" checkForDoubledUpWord="true" checkForReversedWord="true" />
     <standardWordLists>
       <add value="FemaleNames"/>
       <add value="MaleNames"/>
       <add value="MostCommon500Passwords"/>
       <add value="Surnames"/>
     </standardWordLists>
     <customWordLists>
       <add file="WordLists/MyCustomWordList.txt" />
       <add file="WordLists/MyOtherCustomWordList.txt" />
     </customWordLists>
   </rules> </passwordValidation>

The above example uses most options available out of the box that can be put in the config file. It is worth noting that some options are only available from the settings in the code, such as being able to specify custom classes that handle parts of the validation.

Have a play

If you want to try this out for yourself the two assemblies are available here. I will be putting this on NuGet soon.

Checking for NULL using Entity Framework

Here is a curious gotcha using the Entity Framework: If you are filtering on a value that may be null then you may not be getting back the results you expect.

For example, if you do something like this:

var result = context.GarmentsTryOns
    .Where(gto => gto.WeddingId == weddingId
                  && gto.PersonId == personId);

And personId is null then you won’t get any results back. This is because under the hood the query is structured like this:

…WHERE WeddingId = @p0 AND PersonId = @p1

That’s all great when @p1 has a value, but when it is null SQL Sever says nothing matches. In SQL Server, NULL is not a value, it is the absence of a value, it does not equal to anything (including itself) e.g. Try this:

SELECT CASE WHEN NULL = NULL THEN 1 ELSE 0 END

That returns 0!

Anyway, if you want to test NULL-ability, you need the IS operator, e.g.

SELECT CASE WHEN NULL IS NULL THEN 1 ELSE 0 END

That returns 1, which is what you’d expect.

Now, for whatever reason, EF is not clever enough to realise that in the above example, personId is (perfectly validly) null in some cases and switch from using = to IS as needed. So, what we need is a little jiggery-pokery to get this to work. EF can tell if you hard code the null, so you can do this in advance to set things up:

Expression<Func<GarmentTryOns, bool>> personExpression;
if (personId == null)
    personExpression = gto => gto.PersonId == null;
else
    personExpression = gto => gto.PersonId == personId;

This can then be injected as a Where filter onto the query and it EF will interpret it correctly. Like this:

var result = context.GarmentTryOns
                      .Where(gto => gto.WeddingId == weddingId)
                      .Where(personExpression);

The SQL that EF produces now correctly uses PersonId IS NULL when appropriate.

Handling bounces on Amazon SES

If you send to an email that does not exist, Amazon SES will perform some handling of the bounce before passing the details on to you.

When you send email through Amazon SES you may notice that the email arrives with a Return Path that looks something like this: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com

As it happens, the large delimited hex number before the @ sign is the same value that you got back from the SendEmail or SendRawMail response. (If you’re unfamiliar with sending an email see previous posts on SendEmail and SendRawEmail.)

// client is a AmazonSimpleEmailServiceClient
// request is a SendEmailRequest
SendEmailResponse response = client.SendEmail(request);
string messageId = response.SendEmailResult.MessageId;

When the email bounces, it will go first to Amazon SES where they will note which email bounced. Then the email will be forwarded on to you and you will receive the bounced email. (Be aware, tho’, that the email may end up in your spam folder – they did for me). Exactly where the bounce email will go depends on the API call you are using and the fields that you have populated in the outgoing email. The rules are detailed on the Bounce and Complaints notifications page of the Amazon SES Developer’s Guide.

If you look in the headers of this email you’ll see that Message Id again in various parts of the header. e.g.

X-Original-To: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com
Delivered-To: 00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email-bounces.amazonses.com
Message-Id: <00000331b8b1d648-b8302192-701f-124d-a1d5-d268912677de-135246@email.amazonses.com>

How you process these bounces on your side is up to you. Amazon do not, yet (I’m hopeful they will and it has been requested a lot) provide an automated way of using the API for querying which emails are bouncing, are complained about or are rejected.

At present the best detail you are going to get on bounced emails is in the aggregate data provided through the GetSendStatistics API call or via the graphs on the AWS Console.

What happens if I send more email to an address that bounced?

If you continue to send emails to an address that bounces you will get a MessageRejectedException when you call SendEmail or SendRawEmail with the message “Address blacklisted.”

Conclusion on bounce handling

At present bounce handling using Amazon SES isn’t great (but it’s certainly no better than using a plain old SMTP service) however Amazon do appear to be interested in providing better support for handling bounces and the like. It may very well be better supported in the future.

Verifying Senders with Amazon SES

I’ve already written a couple of pieces about Amazon Simple Email Service (SES) on sending Email and sending emails with attachments.

Why do you have to verify senders?

It is important to note that while in development mode you have to verify all recipients and senders, in production mode you still have to verify the senders (this is, presumably, an anti-spam measure to ensure the high quality of email).

If you attempt to send an email from an email address that is not registered you will get a MessageRejectedException when you call SendEmail or SendRawEmail with the message “Email address is not verified”.

Listing and verifying senders

You can add and view senders in via AWS Console which is fine if all you need is to add the odd sender now and again. However, if your application is going to send on behalf of a number of people then you need a way to automate this.

The AWS API contains three methods that help with managing verified email addresses. You can VerifyEmailAddress, DeleteVerifiedEmailAddress and ListVerifiedEmailAddresses.

To Verify an email address

Here is the code to verify an email address

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
VerifyEmailAddressRequest request = new VerifyEmailAddressRequest();
request.EmailAddress = "joe.bloggs@example.com";
var response = client.VerifyEmailAddress(request);

The an email will be sent to the email address listed

from        no-reply-aws@amazonaws.com via email-bounces.amazonses.com
to:         joe.bloggs@example.com
date:       13 November 2011 15:08
subject:    Amazon SES Address Verification Request
mailed-by:  email-bounces.amazonses.com

Dear Amazon SES customer:

We have received a request to authorize an email address for use with Amazon
SES.  To confirm that you are authorized to use this email address, please go
to the following URL:

https://email-verification.us-east-1.amazonaws.com/...........

Your request will not be processed unless you confirm the address using this
URL.

To learn more about sending email from Amazon SES, please refer to the Amazon
SES Developer Guide.

Sincerely, Amazon Web Services

Once you’ve clicked the link you’ll get a page with a message like this:

Congratulations!

You have successfully verified an email address with Amazon Simple Email Service. You can now begin sending email from this address.

If you are a new Amazon SES user and have not yet received production access to Amazon SES, then you can only send email to addresses that you have previously verified. To view your list of verified email addresses, go to the AWS Management Console or refer to the Amazon SES Developer Guide.

If you have already been approved for production access, then you can send email to any address.

Thank you for using Amazon SES.

Once this message has been displayed the email addresses will be displayed in the SES Console and you will be able to send email from this email address (in development mode it also means you will be able to send email to the address)

Listing the verified email addresses

In order to check the email addresses that have passed through the verification process you can use the method ListVerifiedEmailAddresses.

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClien(config);
var request = new ListVerifiedEmailAddressesRequest();
var response = client.ListVerifiedEmailAddresses(request);
var result = response.ListVerifiedEmailAddressesResult;
List<string> addresses = result.VerifiedEmailAddresses;

The addresses that have been successfully verified will be listed in the addresses list.

If the email goes out (from VerifyEmailAddress or from the AWS Console), and it the address is not yet verified then it won’t appear in the list.

Removing a verified email address

If you no longer need to send from an email address you can use the DeleteVerifiedEmailAddress method.

var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
var request = new DeleteVerifiedEmailAddressRequest();
request.EmailAddress = viewModel.NewEmailAddress;
var response = client.DeleteVerifiedEmailAddress(request);

Sending more than a basic email with Amazon SES

Previously, I wrote about getting started with Amazon’s Simple Email Service, and I included details of how to send a basic email. The SendEmail method is excellent at sending basic emails with HTML or Text bodies. However, it doesn’t handle attachments. For that, you need to use SendRawEmail.

SendRawEmail doesn’t give you much functionality. In fact, you have to do all the work to construct the email yourself. However, it does mean that you can do pretty much what you need with the email.

There are still some limitations. Amazon imposes a 50 recipient limit per email, a maximum 10Mb per email, and you can only add a small number of file types as an attachment. This is, I suspect, in order to reduce the ability for people to use the service to spam and infect other people while permitting most of all legitimate uses for the service.

Building an email

When I said that you have to do all the work to construct the email, I really did mean that. You have to figure out the headers, the way the multi-part MIME is put together the character encoding (because email is always sent using a 7-bit encoding) and so on.

I tried to do this, and it it was most frustrating work. The tiniest thing seemed to put Amazon SES into a sulk.

However, I did find a piece of code that someone else had written to do the heavy work for me. Essentially, what he’s doing is constructing a mail message using the built in System.Net.Mail.MailMessage type in .NET and then using .NET’s own classes to create the raw mail message as a MemoryStream, which is what Amazon SES wants.

I’ve refactored the code in the linked post so that it is slightly more efficient if you are calling it multiple times. It uses reflection, and some of the operations need only be carried out once regardless of the number of times you generate emails, so it removes those bits off to a static initialiser so that they only happen the once.

Here’s my refactored version of the code:

public class BuildRawMailHelper
{
    private const BindingFlags nonPublicInstance =
        BindingFlags.Instance | BindingFlags.NonPublic;

    private static readonly ConstructorInfo _mailWriterContructor;
    private static readonly MethodInfo _sendMethod;
    private static readonly MethodInfo _closeMethod;

    static BuildRawMailHelper()
    {
        Assembly systemAssembly = typeof(SmtpClient).Assembly;
        Type mailWriterType = systemAssembly
            .GetType("System.Net.Mail.MailWriter");

        _mailWriterContructor = mailWriterType
            .GetConstructor(nonPublicInstance, null,
                new[] { typeof(Stream) }, null);

        _sendMethod = typeof(MailMessage).GetMethod("Send",
            nonPublicInstance);

        _closeMethod = mailWriterType.GetMethod("Close",
            nonPublicInstance);
    }

    public static MemoryStream ConvertMailMessageToMemoryStream(
        MailMessage message)
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            object mailWriter = _mailWriterContructor.Invoke(
                new object[] {memoryStream});

            _sendMethod.Invoke(message, nonPublicInstance, null,
                                new[] {mailWriter, true}, null);

            _closeMethod.Invoke(mailWriter, nonPublicInstance,
                null, new object[] {}, null);

            return memoryStream;
        }
    }
}

At first glance, the fact that the MemoryStream is disposed of does seem a bit counter-intuitive, however some methods of MemoryStream still function when the stream is closed, such as ToArray().

Incidentally, if you want to see what the raw email looks like you can use a piece of code like this to get the raw email as a string:

MemoryStream memoryStream =
    BuildRawMailHelper.ConvertMailMessageToMemoryStream(mailMessage);
byte[] data = rawMessage.Data.ToArray();
using (StreamReader reader = new StreamReader(new MemoryStream(data)))
{
    string rawMail = reader.ReadToEnd();
    Console.Write(rawEmail);
}

Using SendRawEmail

Because you’re doing all the work, the code that actually interacts with Amazon SES is very simple.

// mailMessage is an instance of a System.Net.Mail.MailMessage
var config = new AmazonSimpleEmailServiceConfig();
var client = new AmazonSimpleEmailServiceClient(config);
SendRawEmailRequest request = new SendRawEmailRequest();
request.RawMessage = new RawMessage();
request.RawMessage.Data = BuildRawMailHelper
    .ConvertMailMessageToMemoryStream(mailMessage);
var response = client.SendRawEmail(request);

And that’s it. You can now send emails with attachments, and anything else you can do with a MailMessage.

First(OrDefault) Vs. Single(OrDefault)

There are two mechanisms (each with an …OrDefault variant) in LINQ for getting one item out of an enumeration. They are First and Single. There is a difference between the two and you can produce code that functions incorrectly if the wrong one is used.

So, what’s the main difference? They both sound like they’ll return just one item out from the enumeration. And, indeed, they do.

First will return the first item that it encounters that matches the predicate (if supplied). Whereas Single will return the one and only item that it encounters that matches the predicate (if supplied). If Single encounters a second item that matches the predicate then it throws an exception. If no predicate is supplied, it throws an exception simply if the enumeration has more that one item.

Why would there be two things that do almost the same thing that are so subtly different? First exists so that you can get the first item regardless of how many items there may actually be. Single exists to get you the one and only item. Single is useful when your predicate operates on a primary key. For example:

data.Single(d => d.PrimaryKey == idToMatch)

The …OrDefault variants will return the default value for the type (for reference types that will be null) if there are no matches found. Otherwise, both First and Single throw an exception if no items are encountered.

Lets look at some code.

First

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First();

In this case, first will contain the value of "Zero".

If a predicate is added to the call to First then we can see what happens if there is no match.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First(x => x.Length > 10);

In this case, there are no matches, and an InvalidOperationException is thrown with the message “Sequence contains no matching element”

The same thing will happen if the initial set of data is empty

string[] empty = new string[0];
var first = empty.First();

You can happily supply a predicate that may match more than one item in the enumeration

Single

For example

string[] onlyOneItem = new string[]{"Only item"};
var single = onlyOneItem.Single();

This will return the one and only item that matches.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var single = data.Single();

This will thrown an exception. If result set contains more than one item an InvalidOpertationException will be thrown with a message of “Sequence contains more than one element”

string[] empty = new string[0];
var single = empty.Single();

This will throw exactly the same exception as it’s First counterpart; an InvalidOperationException is thrown with the message “Sequence contains no matching element”

…OrDefault

This is where things get a little bit more interesting. This says that if the result set contains zero items null (for reference types) is returned. In the case of First, the result set can contain zero, one or many items and it won’t throw an exception. In the case of Single only result sets containing zero or one item will return while any more will result in an exception.

So… what about this scenario:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.FirstOrDefault();

The first value of the set is genuinely null. How do you tell the difference between that and the result set being simply empty without throwing an exception?

You could just go back to using the First variant and catching the exception. Or you could (if your result set can be enumerated many times without issue, e.g. the underlying object is an Array or List) use Any to test if the set contains any data in advance. Like this:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
if (data.Any())
{
    var first = data.FirstOrDefault();
    // Do stuff with the value
}

LINQ query performance

A while ago I was reviewing some code and I came across some code that looked like this

if (corpus.Where(a => a.SomeProperty == someValue).Count() > 0)
{
    // Do Stuff
}

And it got me thinking that it may not be the best way to do this. What is really being asked here is: “Are there any items in the enumerable?” The count is not actually important in this situation. I considered that it would probably be more efficient to write:

if (corpus.Where(a => a.SomeProperty == someValue).Any())
{
    // Do stuff
}

Then I read somewhere (unfortunately, I didn’t note the URL) that for certain situations the .Any() extension method on IEnumerable<T> can be inefficient in certain scenarios. For instance, if concrete type is actually a List<T> which maintains its own Count. In that instance the cost of setting up the Enumerator and calling MoveNext() to determine the existence of at least one element would be more expensive an operation than calling Count on the List<T>.

I was curious about that so I set about working out the relative performance characteristics of a number of the LINQ extension methods. I should note that these were all on LINQ to Objects out of the box so don’t measure how these methods would perform relatively for things like, say, LINQ to SQL.

I tested various scenarios, some where the IEnumerable<T> is a lightweight generator of elements, in this case an Enumerable.Range(…), in other cases I used a List<T> either by a concrete reference or by an reference to the IEnumerable<T> interface.

All timings in this post relate to my desktop machine which is running Windows 7 64bit with 8Gb RAM and an AMD Phenom II X4 955 running at 1.6GHz (which for some unknown reason it won’t run at the full 3.2GHz)

Counting elements

In the first set of tests I counted the number of elements. For the cases where I called the Count property directly on the List<T> and used the Count() extension method on IEnumerable<T> where the concrete type was the List<T> the result was returned in O(1). The LINQ method was 24 times slower.

Where the IEnumerable<T> did not also implement the ICollection<T> interface (as in the case where the values were being generated by Enumerable.Range(…) method) the Count() extension method took O(n) time to return the answer.

The graph above shows the number of Ticks (vertical axis) taken to complete the counting task with n (horizontal axis) elements. A tick is roughly 1/1600th of a millisecond. So for 2000000 elements it took 72.5ms to count them.

Compare that for instances where the Count property was called directly (0.00413 Ticks or 0.00258µs [millionths of a second]) or where the Count() method was called on something that could be cast to an ICollection or ICollection<T> (0.0989 Ticks or 0.0618µs)

So far it looks good for cases where the underlying type implements the ICollection<T> or ICollection interface. However remember as soon as you start filtering the data (e.g. with a Where() method call) then you are returning an IEnumerable<T> which then operates in O(n) time. Also remember that the Where() clause will add some overhead as it has to process the filter as well.

Any elements

It should be no surprise that using our test set of a List<T> and an Enumerable.Range(…) the Any() method runs in O(1) time. Both took similar amounts of time, the former taking 0.278 Ticks (0.174µs) per call, and the latter taking 0.296 Ticks (0.185µs) per call. I suspect that time on the latter is more due to the the small amount of additional time taken to generate additional elements as the enumerator progresses.

However, if you have a reference to something that already implements ICollection<T> which defines a Count property, such as a List<T>, you may find it is faster to perform (corpus.Count>0). I found that for the List<T> I’d created for the test runs it was only marginally slower than the raw call to Count taking 0.00607 ticks (0.00379µs) per call.

Any elements with filter

If you have a filter (a Where clause) then Any may take longer that O(1). It will take as long as it takes to find anything that matches the filter or O(n) if nothing matches the filter.

I ran three tests, one where the filtered condition was met on the first element, one where the condition was met in the middle of the set and one where the condition was not met until the last element.

Summary

If you have a concrete type the performance is better when using the Count property both for cases when you need to know the number of elements in the corpus or when you need to know if there any any elements at all.

If you simply need to know if there are any elements at all in the corpus then the use of Any() works out better than using the LINQ extension method Count() as Count() must traverse the entire corpus (unless it derives from ICollection<T> whereas Any() will short circuit at the first available opportunity.

Follow

Get every new post delivered to your Inbox.