LINQ: FirstOrDefault without the null check afterwards.

So, I was considering a problem I had that called for a LINQ statement that contains a FirstOrDefault() call. If there was an object returned I wanted a string property from it. If not, then I wanted to use a string.Empty.

I've often just caputured the result of the LINQ statement, checked for null, and if I had an object I'd call the property.

e.g.

var number = numbers.FirstOrDefault();
string firstNumberCaption = string.Empty;
if (number != null)
    firstNumberCaption = number.Caption;
Console.WriteLine("The first number is {0}", firstNumberCaption);

However, that code is a bit convoluted, and it occurred to me that there would be a better way of doing this.

The pure LINQ way

What if I called Select before the FirstOrDefault() to get the value of the property that I wanted. That way I don't have to worry about implementing my check.

var firstNumber = numbers
    .Select(n => n.Caption)
    .FirstOrDefault();

And if I am desperate for the resulting string to be the empty string I can always append ?? string.Empty onto the end, like this:

var firstNumber = numbers
    .Select(n => n.Caption)
    .FirstOrDefault() ?? string.Empty;

If you are not familiar with the ?? (null-coalescing) operator you can find out more on MSDN.

Optimisation

My next concern was that perhaps it will convert every instance in the enumerable that was passed to LINQ before discarding them all but the first one. However, that doesn't happen. For example, the following code only outputs one thing:

var firstNumber = numbers
    .Select(n =>
    {
        Console.WriteLine("Select {0} / {1}", n.Value, n.Caption);
        return n.Caption;
    })
    .FirstOrDefault() ?? string.Empty;

Filtering

I often put filters in my calls to FirstOrDefault(), and most of the time it is a filter on something other than what I'm returning in the Select part of the statement. Obviously, if we are performing the Select first, then the filter is not going to work if it is looking for data that is no longer there. In this case we just insert a Where statement just before the select to ensure that the filtering does happen.

So, for example, the first number in the sequence greater than 2:

var firstNumberCaption = numbers
    .Where(n => n.Value > 2)
    .Select(n => n.Caption)
    .FirstOrDefault() ?? string.Empty;

Console.WriteLine("The first number is {0}", firstNumberCaption);

Full program

Here is the full program (using the last example) if you want to see everything that is going on.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication2
{
    class Number
    {
        public Number(int value, string caption)
        {
            Value = value;
            Caption = caption;
        }

        public int Value { get; set; }
        public string Caption { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Number[] numbers =
                {
                    new Number(1, "One"),
                    new Number(2, "Two"),
                    new Number(3, "Three")
                };

            var firstNumberCaption = numbers
                .Where(n => n.Value > 2)
                .Select(n => n.Caption)
                .FirstOrDefault() ?? string.Empty;

            Console.WriteLine("The first number is {0}", firstNumberCaption);

            Console.ReadLine();
        }
    }
}

First(OrDefault) Vs. Single(OrDefault)

There are two mechanisms (each with an …OrDefault variant) in LINQ for getting one item out of an enumeration. They are First and Single. There is a difference between the two and you can produce code that functions incorrectly if the wrong one is used.

So, what’s the main difference? They both sound like they’ll return just one item out from the enumeration. And, indeed, they do.

First will return the first item that it encounters that matches the predicate (if supplied). Whereas Single will return the one and only item that it encounters that matches the predicate (if supplied). If Single encounters a second item that matches the predicate then it throws an exception. If no predicate is supplied, it throws an exception simply if the enumeration has more that one item.

Why would there be two things that do almost the same thing that are so subtly different? First exists so that you can get the first item regardless of how many items there may actually be. Single exists to get you the one and only item. Single is useful when your predicate operates on a primary key. For example:

data.Single(d => d.PrimaryKey == idToMatch)

The …OrDefault variants will return the default value for the type (for reference types that will be null) if there are no matches found. Otherwise, both First and Single throw an exception if no items are encountered.

Lets look at some code.

First

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First();

In this case, first will contain the value of "Zero".

If a predicate is added to the call to First then we can see what happens if there is no match.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.First(x => x.Length > 10);

In this case, there are no matches, and an InvalidOperationException is thrown with the message "Sequence contains no matching element"

The same thing will happen if the initial set of data is empty

string[] empty = new string[0];
var first = empty.First();

You can happily supply a predicate that may match more than one item in the enumeration

Single

For example

string[] onlyOneItem = new string[]{"Only item"};
var single = onlyOneItem.Single();

This will return the one and only item that matches.

string[] data = new[]{"Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var single = data.Single();

This will thrown an exception. If result set contains more than one item an InvalidOpertationException will be thrown with a message of "Sequence contains more than one element"

string[] empty = new string[0];
var single = empty.Single();

This will throw exactly the same exception as it's First counterpart; an InvalidOperationException is thrown with the message "Sequence contains no matching element"

…OrDefault

This is where things get a little bit more interesting. This says that if the result set contains zero items null (for reference types) is returned. In the case of First, the result set can contain zero, one or many items and it won’t throw an exception. In the case of Single only result sets containing zero or one item will return while any more will result in an exception.

So… what about this scenario:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
var first = data.FirstOrDefault();

The first value of the set is genuinely null. How do you tell the difference between that and the result set being simply empty without throwing an exception?

You could just go back to using the First variant and catching the exception. Or you could (if your result set can be enumerated many times without issue, e.g. the underlying object is an Array or List) use Any to test if the set contains any data in advance. Like this:

string[] data = new[]{null, "Zero", "One", "Two", "Three",
    "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"};
if (data.Any())
{
    var first = data.FirstOrDefault();
    // Do stuff with the value
}

LINQ query performance

A while ago I was reviewing some code and I came across some code that looked like this

if (corpus.Where(a => a.SomeProperty == someValue).Count() > 0)
{
    // Do Stuff
}

And it got me thinking that it may not be the best way to do this. What is really being asked here is: “Are there any items in the enumerable?” The count is not actually important in this situation. I considered that it would probably be more efficient to write:

if (corpus.Where(a => a.SomeProperty == someValue).Any())
{
    // Do stuff
}

Then I read somewhere (unfortunately, I didn’t note the URL) that for certain situations the .Any() extension method on IEnumerable<T> can be inefficient in certain scenarios. For instance, if concrete type is actually a List<T> which maintains its own Count. In that instance the cost of setting up the Enumerator and calling MoveNext() to determine the existence of at least one element would be more expensive an operation than calling Count on the List<T>.

I was curious about that so I set about working out the relative performance characteristics of a number of the LINQ extension methods. I should note that these were all on LINQ to Objects out of the box so don’t measure how these methods would perform relatively for things like, say, LINQ to SQL.

I tested various scenarios, some where the IEnumerable<T> is a lightweight generator of elements, in this case an Enumerable.Range(…), in other cases I used a List<T> either by a concrete reference or by an reference to the IEnumerable<T> interface.

All timings in this post relate to my desktop machine which is running Windows 7 64bit with 8Gb RAM and an AMD Phenom II X4 955 running at 1.6GHz (which for some unknown reason it won’t run at the full 3.2GHz)

Counting elements

In the first set of tests I counted the number of elements. For the cases where I called the Count property directly on the List<T> and used the Count() extension method on IEnumerable<T> where the concrete type was the List<T> the result was returned in O(1). The LINQ method was 24 times slower.

Where the IEnumerable<T> did not also implement the ICollection<T> interface (as in the case where the values were being generated by Enumerable.Range(…) method) the Count() extension method took O(n) time to return the answer.

The graph above shows the number of Ticks (vertical axis) taken to complete the counting task with n (horizontal axis) elements. A tick is roughly 1/1600th of a millisecond. So for 2000000 elements it took 72.5ms to count them.

Compare that for instances where the Count property was called directly (0.00413 Ticks or 0.00258µs [millionths of a second]) or where the Count() method was called on something that could be cast to an ICollection or ICollection<T> (0.0989 Ticks or 0.0618µs)

So far it looks good for cases where the underlying type implements the ICollection<T> or ICollection interface. However remember as soon as you start filtering the data (e.g. with a Where() method call) then you are returning an IEnumerable<T> which then operates in O(n) time. Also remember that the Where() clause will add some overhead as it has to process the filter as well.

Any elements

It should be no surprise that using our test set of a List<T> and an Enumerable.Range(…) the Any() method runs in O(1) time. Both took similar amounts of time, the former taking 0.278 Ticks (0.174µs) per call, and the latter taking 0.296 Ticks (0.185µs) per call. I suspect that time on the latter is more due to the the small amount of additional time taken to generate additional elements as the enumerator progresses.

However, if you have a reference to something that already implements ICollection<T> which defines a Count property, such as a List<T>, you may find it is faster to perform (corpus.Count>0). I found that for the List<T> I’d created for the test runs it was only marginally slower than the raw call to Count taking 0.00607 ticks (0.00379µs) per call.

Any elements with filter

If you have a filter (a Where clause) then Any may take longer that O(1). It will take as long as it takes to find anything that matches the filter or O(n) if nothing matches the filter.

I ran three tests, one where the filtered condition was met on the first element, one where the condition was met in the middle of the set and one where the condition was not met until the last element.

Summary

If you have a concrete type the performance is better when using the Count property both for cases when you need to know the number of elements in the corpus or when you need to know if there any any elements at all.

If you simply need to know if there are any elements at all in the corpus then the use of Any() works out better than using the LINQ extension method Count() as Count() must traverse the entire corpus (unless it derives from ICollection<T> whereas Any() will short circuit at the first available opportunity.

Why should you be returning an IEnumerable

I’ve seen in many places where a method returns a List<T> (or IList<T>) when it appears that it may not actually really be required, or even desirable when all things are considered.

A List is mutable, you can change the state of the List. You can add things to the List, you can remove things from the List, you can change the items the List contains. This means that everything that has a reference to the List instantly sees the changes, either because an element has changed or elements have been added or removed. If you are working in a multi-threaded environment, which will be increasingly common as time goes on, you will get issues with thread safety if the List is used inside other threads and one or more threads starts changing the List.

Return values should, unless you have a specific use case in mind already, be returning an IEnumerable<T> which is not mutable. If the underlying type is still a List (or Array or any of a myriad of other types that implement IEnumerable<T>) you can still cast it. Also, some LINQ expressions will self optimise if the underlying type is one which better supports what LINQ is doing. (Remember that LINQ expressions always take an IEnumerable<T> or IQueryable<T> anyway so you can do what you like regardless of what the underlying type is).

If you ensure that your return values are IEnumerable<T> to begin with yet further down the line you realise you need to return an Array or List<T> from the method it is easy to start doing that. This is because everything accepting the return value from the method will still be expecting an IEnumerable<T> which List<T> and Array implement. If, however, you started with a List<T> and move to returning an IEnumerable<T> then because so much code will have the expectation of a List<T> without actually needing it you will have a lot of refactoring to do just to update the reference types.

Have I convinced you yet? If not, think about this. How often are you inserting items into a collection of objects after the initial creation routine? How often do you remove items from a collection after the initial creation routine? How often do you need to access a specific item by index within a collection after the initial creation routine? My guess is almost never. There are some occasions, but not actually that many.

It took me a while to get my head around always using an IEnumerable<T>, until I realised that I almost never require to do the things in the above paragraph. I almost always just need to loop over a collection of objects, or filter a collection of objects to produce a smaller set. Both of those things can be done with just an IEnumerable<T> and LINQ.

But, what if I need a count of the objects in the List<T>, that would be inefficient with an IEnumerable<T> and LINQ? Well, do you really need a count? Oftentimes I just need to know if there are any objects at all in the collection, I don't care how many object there actually are, in which case the LINQ extension method Any() can be used. If you do need a count LINQ is clever enough to work out that the underlying type may expose a Count property and it calls that (anything that implements ICollection<T> such as arrays, lists, dictionaries, sets, and so on) so it is not iterating over all the objects counting them up each time.

Remember, there is nothing necessarily wrong with putting a ToArray() to ToList() before returning as a reference to an IEnumerable<T> something to which a LINQ expression has been applied. That removes the issues that deferred execution can bring (e.g. unexpected surprises when it suddenly evaluates during the first iteration but breaks in the process) or repeatedly applying the filter in the Where() method or the transformation in the Select() method.

Just because an object is of a specific type, doesn't mean you have to return that specific type.

For example, consider the services you actually need on the collection that you are returning, remembering how much LINQ gives you. The following diagram shows what each of the interfaces expect to be implemented what a number of the common collection types implement themselves.

Incidentally, the reason some of the interfaces on the Array class are in a different colour is that these interfaces are added by the runtime. So if you have a string[] it will expose IEnumerable<string>.

I'd suggest that as a general rule IEnumerable<T> should be the return type when you have anything that implements it as the return type from the method, unless something from an ICollection<T> or IList<T> (or any other type of collection) as absolutely desperately in needed and not just because some existing code expects, say, an IList<T> (even although it is using no more services from it that it would had it been an IEnumerable<T>).

The mutability that implementations of ICollection<T> and IList<T> give will prove problematic in the long term. If you have a large team with members that don’t fully understand what is going on (and this is quite common given the general level developer documentation) they are likely to change the contents of the collection without understanding its implications. In some situations this may fine, in others it may be disastrous.

Finally, if you absolutely do need to return a more specific collection type then instead of returning a reference to the concrete class, return a reference to the lowest interface that you need. For example, if you have a List<T> and you need to add further items to it, but not at specific locations in the list, then ICollection<T> will be the most suitable return type.

Parallelisation Talk Examples – Basic PLINQ

These are some code examples from my introductory talk on Parallelisation showing the difference between a standard sequential LINQ query and its parallel equivalent.

The main differences between this and the previous two examples (Parallel.For and Parallel.ForEach) is that LINQ (and PLINQ) is designed to return data back, so the LINQ expression uses a Func<TResult, T1, T2, T3…> instead of an Action<T1, T2, T3…>. Since the examples were simply outputting a string to the Console to indicate which item or index was being processed I’ve changed the code to return a string back to the LINQ expression. The results are then looped over and output to the console.

It is also important to remember that LINQ expressions are not evaluated until the data is called for. In the example below that is with the .ToList() method call, however it may also be as a result of foreach or any other method of iterating over the expression results.

Code example 1: Sequential processing of data with LINQ

class Program
{
    private static Random rnd = new Random();

    static void Main(string[] args)
    {
        DateTime start = DateTime.UtcNow;

        IEnumerable<int> items = Enumerable.Range(0, 20);

        var results = items
            .Select(ProcessItem)
            .ToList();

        results.ForEach(Console.WriteLine);

        DateTime end = DateTime.UtcNow;
        TimeSpan duration = end - start;

        Console.WriteLine("Finished. Took {0}", duration);

        Console.ReadLine();
    }

    private static string ProcessItem(int item)
    {
        // Simulate similar but slightly variable length processing
        int pause = rnd.Next(900, 1100);
        Thread.Sleep(pause);

        return string.Format("Result of item {0}", item);
    }
}

The output of the above code may look something like this:

Basic LINQ

As you can see this takes roughly of 20 seconds to process 20 items with each item taking about one second to process.

Code Example 2: Parallel processing of data with PLINQ

The AsParallel extension method can be found in the System.Linq namespace so no additional using statements are needed if you are already using LINQ.

class Program
{
    private static Random rnd = new Random();

    static void Main(string[] args)
    {
        DateTime start = DateTime.UtcNow;

        IEnumerable<int> items = Enumerable.Range(0, 20);

        var results = items.AsParallel()
            .Select(ProcessItem)
            .ToList();

        results.ForEach(Console.WriteLine);

        DateTime end = DateTime.UtcNow;
        TimeSpan duration = end - start;

        Console.WriteLine("Finished. Took {0}", duration);

        Console.ReadLine();
    }

    private static string ProcessItem(int item)
    {
        // Simulate similar but slightly variable length processing
        int pause = rnd.Next(900, 1100);
        Thread.Sleep(pause);

        return string.Format("Result of item {0}", item);
    }
}

The output of the above code may look something like this:

Basic PLINQ

The result of this code is that it takes roughly 5 second to process the 20 items. I have a 4 core processor so it would be in line with the expectation that the work is distributed across all 4 cores.

Building a tag cloud with LINQ

I have a set of blog posts that I’m representing as a List of BlogPost objects. A BlogPost is class I created that represents everything to do with a blog post. In it there is a list of all the categories (or tags) that a blog post has.

SelectMany

If I want to build a tag cloud based on all the categories then I first need to know what the categories are. This is where a little bit of LINQ code such as this comes in handy:

List<BlogPost> posts = GetBlogPosts();
var categories = posts.SelectMany(p => p.Categories);

The SelectMany flattens out all the Category lists in the all the posts to produce one result that contains all the categories. So, lets say there are three blog posts with the following categories:

Post One Post Two Post Three
.NET .NET SQL Server
C# C# Stored Procedure
LINQ ADO.NET
SelectMany Stored Procedure

However, as it simply flattens the structure the end result is:

  • .NET
  • C#
  • LINQ
  • SelectMany
  • .NET
  • C#
  • ADO.NET
  • StoredProcedure
  • SQL Server
  • Stored Procedure

Distinct

If I simply want a list of all the categories, I could modify the code above to chain a Distinct call in.

List<BlogPost> posts = GetBlogPosts();
var categories = posts
    .SelectMany(p => p.Categories)
    .Distinct();

That results in a shorter list, like this:

  • .NET
  • C#
  • LINQ
  • SelectMany
  • ADO.NET
  • Stored Procedure
  • SQL Server

GroupBy

However, what is needed is each item with a count of the number of times it is repeated. This is where GroupBy comes in. Here’s the code:

List<BlogPost> posts = GetBlogPosts();
var categoryGroups = posts
    .SelectMany(p => p.Categories)
    .GroupBy(c => c);
 
foreach (var group in categoryGroups)
{
    // Do stuff with each group.
    // group.Key is the name of the category
}

The GroupBy clause (line 4) takes an expression that returns the thing being grouped by. Since the List contains strings representing the category, we will be grouping by itself, so the expression returns itself.

Since the categoryGroups is enumerable we can use the LINQ extension methods on it to find out how many times each category is mentioned by using the Count() extension method.

This means we can get a result like this:

  • .NET : 2 posts
  • C# : 2 posts
  • LINQ : 1 post
  • SelectMany : 1 post
  • ADO.NET :1 post
  • Stored Procedure : 2 posts
  • SQL Server : 1 posts

Tip of the Day #19: Create a list of objects instead of many lists of values

I?ve been reviewing some code and I came across something that jars. What is wrong with this is many-fold. Essentially, instead of encapsulating an related data into an entity that describes the whole the developer had created silos of data values, and you’d better hope that nothing went awry with any of it.

It looked like this:

List<string> addOnNames = new List<string>();
List<string> addOnDescriptions = new List<string>();
List<string> addOnCodes = new List<string>();
List<decimal> addOnBasePrice = new List<decimal>();

Okay, so if you don't yet find it jarring, here are some of the things that are wrong with this structure.

To iterate through and operate on a single "add on" you have to do something like this:

for(int i = 0; i < addOnNames.Count; i++)
{
    string name = addOnNames[i];
    string description = addOnDescriptions[i];
    string code = addOnCodes[i];
    decimal basePrice = addOnCode[i];
    // Do stuff that operates on an "Add On" here
}

That?s quite a bit of work just to get at the values you need in a particular loop iteration.

If you need to pass the lists on to methods, you have to pass each in its own parameter. Method signatures start to become needlessly large. In the simplified example above (yes, the real code had a lot more in it) there are three extra parameters that need to be passed around to each method call that needs to act on a collection of add-ons.

private void DoSomethingToACollectionOfAddOns(
                     List<string> addOnNames,
                      List<string> addOnDescriptions,
                      List<string> addOnCodes,
                     List<decimal> addOnBasePrice)
{
    // Do something.
}

It is also quite difficult to use LINQ to query what is effectively an AddOn entity.

Unless you are very careful, your lists can become out of sync, and when that happens all sorts of strange and difficult to trace bugs enter into the system. In the code I was reviewing there was an edge case where by one of the lists didn?t get updated when it was initially populated. Because all the lists are assumed to be the same length the first time that they were iterated over, what should have been the final entity couldn?t be retrieved on one of the lists because it didn?t exist. As this happened well after the creation of the lists and in a method called several levels deep it was quite a job working out where the original bug was coming from.

Solution

What needs to happen here is that an entity class is created for an add on. Each of these lists indicates a field or property in an entity class. The class might simply look something like this:

public class AddOn{
    public string Name { get; set; }
    public string Description { get; set; }
    public string Code { get; set; }
    public decimal BasePrice { get; set; }
}

This encapsulates everything about a single add on entity in one place. If you want a collection of these objects you can do something like this:

    List<AddOn> addOns = new List<AddOn>();

If you want to loop over them you don?t have to write lots of code to get all the elements out of a variety of lists a simple foreach will suffice (unless you need to also know the index)

    foreach(AddOn addOn in addOns){}

If you need to pass the entity around, or a collection of the entities around then you only need to pass one parameter into a method.

    private void DoSomethingToACollectionOfAddOns(List<AddOn> addOns) {}

Using LINQ becomes much easier because now you have everything encapsulated in the one place. I can?t even imagine the convoluted joins that would be needed to process the individual values in a LINQ expression otherwise.

When creating the initial collection, if any particular property is not needed then it can be simply ignored, if need be a default value can be set in the constructor. Then you never have an issue with one collection being out of sync with another. You no longer have to worry about synchronising collections, everything to do with a single instance of the entity is in one place.

Monitoring change in XML data (LINQ to XML series – Part 5)

This is the 5th part in a series on LINQ to XML. In this instalment we will look at monitoring changes in XML data in the XML classes added to .NET 3.5.

The XObject class (from which XElement and XAttribute, among others) contains two events that are of interest to anyone wanting to know about changes to the XML data: Changing and Changed

The Changing event is triggered prior to a change being applied to the XML data. The Changed event is triggered after the change has been applied.

An example of adding the event handler would be something like this:

XElement root = new XElement("root");
root.Changed += new EventHandler<XObjectChangeEventArgs>(root_Changed);

The above example will trigger for any change that happens in the node the event handler is applied to and any node downstream of it. As the example is applied to the root node this means the event will trigger for any change in the XML data.

The event handler is supplied an XObjectChangeEventArgs object which contains an ObjectChange property. This is an XObjectChange enum and it lets the code know what type of change happened.

The sender contains the actual object in the XML data that has changed.

Adding an element

Take the following example where an element is added to the XML data.

XElement child = new XElement("ChildElement", "Original Value");
root.Add(child);

In this case the ObjectChanged is Add and the sender is the XElement: <ChildElement>Original Value</ChildElement>

A similar scenario happens when adding an attribute. However, instead of the sender being an XElement it will be an XAttribute.

child.Add(new XAttribute("TheAttribute", "Some Value"));

Changing an element value

If the value of the element is changed (the bit that currently says "Original Value") then we don't get one event fired. We get two events fired. For example:

child.Value = "New Value";

The first event with ObjectChanged set to Remove and the sender set to "Orginal Value" (which is actually an XText object) and the second event with the ObjectChanged set to Add and the sender set to "New Value" (again, this is actually an XText object).

Changing an element name

If the name of the element is changed then the ObjectChanged property will be set to Name and the sender will be the XElement that has changed.

child.Name = "JustTheChild";

Changing an attribute name

Unlike changing an element value, when the value of an attribute changes the ObjectChanged property will be Value and the sender will be the XAttribute.

child.Attribute("TheAttribute").Value = "New Attribute Value";

Technorati Tags: ,,,

Navigating XML (LINQ to XML series – part 4)

In my last few posts on LINQ to XML (part 1, part 2 and part 3) I’ve shown you a starter on navigating around XML data. In this post I’ll continue to show you how to navigate through XML data by showing you how to navigate around sibling elements.

First consider this code:

XElement root = new XElement("root",
    new XElement("FirstChild"),
    new XElement("SecondChild"),
    new XElement("ThirdChild"),
    new XElement("FouthChild"),
    new XElement("FifthChild"));

Which produces the following XML structure:

<root>

<FirstChild />

<SecondChild />

<ThirdChild />

<FouthChild />

<FifthChild />

</root>

We can access the ThirdChild with this code:

XElement child = root.Element("ThirdChild");

From that point, we can also get access to its siblings.

To access the siblings that occur before the element we have a reference to then we can use ElementsBeforeSelf. As with Elements this returns an IEnumerable<XElement> object which allows us to iterate over the result, like this:

IEnumerable<XElement> elements = child.ElementsBeforeSelf();

foreach (XElement element in elements)
    Console.WriteLine(element);

The result is:

<FirstChild />

<SecondChild />

Conversely, we can get the siblings that come after the element we have a reference to with ElementsAfterSelf. Like this:

IEnumerable<XElement> elements = child.ElementsAfterSelf();

The result in this case will be:

<FouthChild />

<FifthChild />

Technorati Tags: ,,,

Navigating XML (LINQ to XML series – Part 3)

In my last two posts (part 1 and part 2) I’ve been introducing you to the the new XML classes in .NET 3.5. In this post I’ll continue that and show you some of the ways to navigate through XML.

First of all, lets start with a simple hierarchy of XML elements:

XElement root = new XElement("FirstGeneration",
                    new XElement("SecondGeneration",
                        new XElement("ThirdGeneration",
                            new XElement("FourthGeneration"))));

Which looks like this when rendered as XML:

<FirstGeneration>

<SecondGeneration>

<ThirdGeneration>

<FourthGeneration />

</ThirdGeneration>

</SecondGeneration>

</FirstGeneration>

Also in the last post I used Element to get a specific element from the current element. For example:

XElement child = root.Element("SecondGeneration");

Elements

If root (or FirstGeneration) only had one child element called "SecondGeneration" then everything is fine, you get what you asked for. However, if it contains multiple children all called "SecondGeneration" then you will only get the first element called "SecondGeneration".

For example, if you add the following to the code above:

root.Add(new XElement("SecondGeneration","2"));
root.Add(new XElement("SecondGeneration","3"));

You will get a piece of XML that looks like this:

<FirstGeneration>

<SecondGeneration>

<ThirdGeneration>

<FourthGeneration />

</ThirdGeneration>

</SecondGeneration>

<SecondGeneration>2</SecondGeneration>

<SecondGeneration>3</SecondGeneration>

</FirstGeneration>

If you want to get all those additional children called "SecondGeneration" you will need to use the Elements (note the plural) method. For example:

IEnumerable<XElement> children = root.Elements("SecondGeneration");

You'll also note that we don't get a collection returned but an enumerable. This give us the opportunity to exploit many of the new extension methods. But I'll leave them for another post. For the moment, we just need to know that it make it easy for us to enumerate over the data using a foreach loop. For example:

foreach (XElement child in children)
{
    Console.WriteLine(child);
    Console.WriteLine(new string('-', 50));
}

This will write out:

<SecondGeneration>

<ThirdGeneration>

<FourthGeneration />

</ThirdGeneration>

</SecondGeneration>

--------------------------------------------------

<SecondGeneration>2</SecondGeneration>

--------------------------------------------------

<SecondGeneration>3</SecondGeneration>

--------------------------------------------------

Parent

Using the same root object as above, we can see how to navigate back up the XML tree using Parent.

XElement grandchild = root.Element("SecondGeneration").Element("ThirdGeneration");
Console.WriteLine(grandchild.Parent);

The result of the code will be that the SecondGeneration element is printed.

 

Technorati Tags: ,,,
Follow

Get every new post delivered to your Inbox.

Join 25 other followers