LINQ Deferred Execution & Lambda Methods for providing Simple Stats (Part II)

This is part 2 in a series of posts on LINQ and lambda capabilities in C#.

Deferred Execution

So let’s take a minute to talk about deferred execution. You may hear this referred to as lazy execution as well. In a nutshell, it means that when you write a LINQ or lambda query against a collection or list, the query doesn’t actually execute until the point where you need to access the results. Let’s look at a simple example.

var ienum = Enumerable.Range(1, 10).ToList();

var query = from i in ienum
            where i%2 == 0
            select i;

ienum.Add(20);
ienum.Add(30);

SuperConsole.WriteLine(query);
//prints 2, 4, 6, 8, 10, 20, 30

So why does it print out 20 and 30? This is deferred execution in practice. At the point where you write your query (var query), the query is not actually executed against your data source (ienum). After the query is set up, more data is added to the data source, and the query is only executed at the point where the results need to be evaluated (SuperConsole.WriteLine).
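If you want to try this without the SuperConsole helper, here is a minimal, self-contained sketch of the same idea. The class name DeferredExecutionDemo and the use of string.Join for printing are just my choices for illustration. It enumerates the same query twice, once before and once after adding to the list, which shows that the query is re-run every time it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Builds the query once, then enumerates it before and after the source changes.
    public static string[] Run()
    {
        List<int> numbers = Enumerable.Range(1, 10).ToList();

        // Nothing is filtered here; this line only describes the query.
        IEnumerable<int> evens = from i in numbers
                                 where i % 2 == 0
                                 select i;

        // First enumeration: the query runs against the list as it is right now.
        string before = string.Join(", ", evens);

        numbers.Add(20);
        numbers.Add(30);

        // Second enumeration: the query runs again and picks up the new items.
        string after = string.Join(", ", evens);

        return new[] { before, after };
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // prints 2, 4, 6, 8, 10
        // prints 2, 4, 6, 8, 10, 20, 30
    }
}
```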

This holds true in a number of other LINQ scenarios. In LINQ to SQL or LINQ to Entities (Entity Framework), the SQL query is only sent to the database at the point where you need to evaluate your query. It’s important to understand this for three reasons: so that queries don’t go out of scope before being executed; so that un-executed queries aren’t inadvertently passed to other parts or layers of your application; and so that you don’t end up introducing N+1 problems, where you think you’re working on data in memory but in actual fact you’re performing the same execution over and over in a loop. If you do need to make your queries “greedy” and force them to execute there and then, you can wrap them in parentheses and immediately call .ToList() on them.
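As a rough sketch of that last point, the snippet below compares a deferred query with one forced by .ToList(). The names (GreedyQueryDemo, lazyEvens, greedyEvens) are hypothetical, and it assumes a plain in-memory list rather than a database, so treat it as an illustration of the timing only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyQueryDemo
{
    // Returns { items seen by the deferred query, items held by the greedy list }.
    public static int[] Run()
    {
        List<int> numbers = Enumerable.Range(1, 10).ToList();

        // Deferred: evaluated each time it is enumerated.
        IEnumerable<int> lazyEvens = from i in numbers
                                     where i % 2 == 0
                                     select i;

        // Greedy: the parentheses plus ToList() run the query immediately
        // and copy the results into a new list.
        List<int> greedyEvens = (from i in numbers
                                 where i % 2 == 0
                                 select i).ToList();

        numbers.Add(20);
        numbers.Add(30);

        return new[] { lazyEvens.Count(), greedyEvens.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // prints 7 (2, 4, 6, 8, 10, 20, 30)
        Console.WriteLine(counts[1]); // prints 5 (2, 4, 6, 8, 10)
    }
}
```

The greedy list is a snapshot taken when ToList() was called, so the later additions never show up in it.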

Min, Max, Count & Average

LINQ has a number of convenient built-in methods for getting various numeric stats about the data you’re working on. Consider a collection of movies which you want to query.

public class Movie
{
    public string Title { get; set; }
    public double Rating { get; set; }
}

...

var movies = new List<Movie>
    {
        new Movie() {Title = "Die Hard", Rating = 4.0},
        new Movie() {Title = "Commando", Rating = 5.0},
        new Movie() {Title = "Matrix Revolutions", Rating = 2.1}
    };

Console.WriteLine(movies.Min(m => m.Rating));
//prints 2.1

Console.WriteLine(movies.Max(m => m.Rating));
//prints 5

Console.WriteLine(movies.Average(m => m.Rating));
//prints 3.7

Console.WriteLine(movies.Count);
Console.WriteLine(movies.Count());
//prints 3

Min, Max and Average are all fairly straightforward, finding the minimum, maximum and average movie rating values respectively. With regard to Count, it’s worth mentioning that there are different “versions” of the count implementation depending on the underlying data structure you are operating on. The Count property is a property of the List<T> class and returns the current number of items in that collection. The Count() method is an extension method on the IEnumerable<T> interface which can be executed on any IEnumerable<T> structure regardless of implementation.

In general LINQ’s Count() will be slower and is an O(N) operation, while List<T>.Count and Array.Length are both guaranteed to be O(1). However, in some cases LINQ will special-case the IEnumerable<T> parameter by casting to certain interface types such as IList<T> or ICollection<T>. It will then use that type’s Count property instead of doing an actual count, so it goes back down to O(1). But you still pay the minor overhead of the cast and interface call. Ref: [http://stackoverflow.com/questions/981254/is-the-linq-count-faster-or-slower-than-list-count-or-array-length/981283#981283]
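One way to see this for yourself is with a small collection that records how often it gets enumerated. TrackedCollection below is a hypothetical helper I made up for this sketch, not part of the framework. It assumes the LINQ-to-Objects behaviour described above, where Count() uses ICollection<T>.Count when it is available.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a thin ICollection<int> that counts GetEnumerator() calls.
public sealed class TrackedCollection : ICollection<int>
{
    private readonly List<int> items = new List<int>();

    public int EnumerationsStarted { get; private set; }

    public int Count { get { return items.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Add(int item) { items.Add(item); }
    public void Clear() { items.Clear(); }
    public bool Contains(int item) { return items.Contains(item); }
    public void CopyTo(int[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }
    public bool Remove(int item) { return items.Remove(item); }

    public IEnumerator<int> GetEnumerator()
    {
        EnumerationsStarted++;
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountCostDemo
{
    // Returns { Count() result, enumerations so far, filtered Count() result, enumerations so far }.
    public static int[] Run()
    {
        var tracked = new TrackedCollection { 1, 2, 3 };

        // LINQ's Count() on the collection itself: answered from the Count property.
        int direct = tracked.Count();
        int enumerationsAfterDirect = tracked.EnumerationsStarted;

        // After a Where() the result is no longer an ICollection<int>,
        // so Count() has to walk the sequence.
        int filtered = tracked.Where(n => n > 1).Count();
        int enumerationsAfterFiltered = tracked.EnumerationsStarted;

        return new[] { direct, enumerationsAfterDirect, filtered, enumerationsAfterFiltered };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine(r[0]); // prints 3
        Console.WriteLine(r[1]); // prints 0 (no enumeration needed)
        Console.WriteLine(r[2]); // prints 2
        Console.WriteLine(r[3]); // prints 1 (the filtered count had to enumerate)
    }
}
```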

This matters as well if you are testing your collections to see if they are empty. People coming from versions of .NET prior to generics would use the Count or Length properties of a collection to see if it was empty, i.e.

if(list.Count == 0)
{ 
    //empty
}
if(array.Length == 0)
{
    //empty
}

LINQ, however, provides another method to test for contents called Any(). It can be used to evaluate whether the collection contains any items at all, or whether it contains any items which satisfy a specific filter.

if(!list.Any()) //equivalent of Count == 0
{ 
    //empty
}
if(list.Any(m => m.Rating == 5.0)) //if it contains any top rated movies.
{
    //has at least one top rated movie
}

If you are starting with something that has a .Length or .Count (such as ICollection<T>, IList<T>, List<T>, etc) – then this will be the fastest option, since it doesn’t need to go through the GetEnumerator()/MoveNext()/Dispose() sequence required by Any() to check for a non-empty IEnumerable<T> sequence. For just IEnumerable<T>, then Any() will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count() does check for ICollection<T> (using .Count as an optimisation) – so if your underlying data-source is directly a list/collection, there won’t be a huge difference. Don’t ask me why it doesn’t use the non-generic ICollection… Of course, if you have used LINQ to filter it etc (Where etc), you will have an iterator-block based sequence, and so this ICollection<T> optimisation is useless. In general with IEnumerable<T>: stick with Any(). Ref: [http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0/305156#305156]
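To make the “one iteration” point concrete, here is a small sketch using an iterator method that counts how many items it hands out. The names (AnyVersusCountDemo, Numbers, ItemsYielded) are made up for the example. It only covers a plain lazy IEnumerable<int>, not a list or a database-backed query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountDemo
{
    // How many items the iterator below has produced since the last reset.
    public static int ItemsYielded;

    // A lazy sequence with no Count property of its own.
    public static IEnumerable<int> Numbers(int howMany)
    {
        for (int i = 1; i <= howMany; i++)
        {
            ItemsYielded++;
            yield return i;
        }
    }

    // Returns { items produced for Any(), items produced for Count() > 0 }.
    public static int[] Run()
    {
        ItemsYielded = 0;
        bool hasAny = Numbers(1000).Any();
        int yieldedForAny = ItemsYielded;

        ItemsYielded = 0;
        bool hasSome = Numbers(1000).Count() > 0;
        int yieldedForCount = ItemsYielded;

        if (!hasAny || !hasSome)
        {
            throw new InvalidOperationException("Both checks should report a non-empty sequence.");
        }

        return new[] { yieldedForAny, yieldedForCount };
    }

    public static void Main()
    {
        int[] yielded = Run();
        Console.WriteLine(yielded[0]); // prints 1 (Any() stops at the first item)
        Console.WriteLine(yielded[1]); // prints 1000 (Count() walks the whole sequence)
    }
}
```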

Next post, we’ll look at some different mechanisms for filtering and transforming our queries.

~Eoin C
