Introduction

We recently started hitting capacity issues with an SQL Server Reporting Services box hosted
on Microsoft's Azure cloud platform. The server had been set up around the time Microsoft end-of-life'd
their platform-as-a-service report server offering and forced everyone back onto standalone instances.
The server was a Basic A2 class VM (3.5 GB RAM, 2 cores). Originally it only had to handle a small amount of report
creation load, but in recent times that load has gone up significantly, and due to the "peaky"
nature of the customer's usage we would regularly see periods where the box could not keep up with
report generation requests.

In the past week, we've moved the customer to a new SQL Server 2014 Standard Edition install. Here are a few of the things we've
learned along the way about setting up SQL Server as a standalone instance on an Azure VM.

This information is based on the service offerings and availabilities in the Azure North Europe region as of February 2016

Which Virtual Machine Class?

First off, you should choose a DS-class virtual machine. At the time of writing, Microsoft offers 4 different VM classes
in the North Europe region: A, D, DS and D_V2. Only the DS class machines currently support Premium Locally
Redundant Storage (Premium LRS), which allows you to attach persistent SSD storage to your server.

Within the DS set, DS1-DS4 have a slightly lower memory-to-core ratio, while the DS11-DS14 set have a higher
starting memory footprint for the same core counts. We went with a DS3 server (4 cores / 14 GB), which we can scale down to
a DS1 during out-of-hours periods.


Which Storage Account?

During setup, ensure that you've selected a Premium Locally Redundant Storage account, which will
give you access to additional attachable SSDs for your SQL Server. This can be found under
Optional Configuration > Storage > Create Storage Account > Pricing Tier.


External Security

Security will be somewhat dependent on your specific situation. In our case, this was a
standalone SQL Server with no failover cluster or domain management. The server was set up
with a long username and password (not the john.doe account in the screenshots).

We also locked down the management ports for Remote Desktop and Windows Remote Management (WinRM), as well as the added
HTTPS and SQL ports. To do this, add the public-to-private port mapping configurations under
Optional Configurations > Endpoints.

Endpoint Configuration

Once you've finished the configuration and Azure has provisioned the server,
you'll want to re-enter the management blades and add ACL rules to lock down port access
to only the IP ranges that should reach it; in our case, our development site, the customer
site, and our Azure-hosted services.

You can add "permit" rules for specific IP addresses to access your server. Once a single
permit rule is added, all other IP addresses/ranges are blocked by default.

Endpoint ACLs

Automated Backups

SQL Server Azure VMs can now leverage an automated off-server database backup service
which places your backups directly into blob storage. Select SQL Automated
Backup and enable it. You will be asked to specify where you would like to store your
backups and for how long. We chose a non-premium storage account
for this; depending on the inherent value of your backups, and whether you
intend to subsequently off-site them yourself, you might want to choose a storage
setup with zone or geo redundancy. You can also enable backup encryption by providing
a password here.

Automated SQL Backup to Storage

Disk Configuration

Now that your server is up and running, you can log in via Remote Desktop. The first
thing you'll want to do is patch the server. As of mid-February 2016, the base image
for SQL Server 2014 on Windows Server 2012 R2 Standard is missing quite a number of
patches: approximately 70 critical updates and another 80 or so optional updates need to
be installed.

Once you've got your server patched, you can take a look at the disk setup. If you've
chosen a DS class server, you'll notice that you have 2 disks: a regular OS disk and
an SSD temp disk. This temp disk is NOT to be used for real data; it is local to
the VM while it's running and will be deallocated and purged if you shut the server
down.

You can, however, purchase additional SSD disks very easily. Head back out to the Azure
management portal, find your VM, go to Settings and choose Disks. In the following
screenshot, we've chosen to add an additional 2 x 128 GB (P10 class) disks to
the server. The SQL Server best practices document recommends using the 1 TB (P30 class) disks,
which do give a significant I/O bump, but they are also more expensive.

Ensure that you specify “Read Only” host caching for your Data Disk and No-Caching for
your Log disk to improve performance.

Adding Extra Disks

Once your disks are attached, you can access and map them inside your VM. We chose to
set up the disks using the newer Windows Server 2012 Resilient File System (ReFS) rather
than NTFS. Previously there were potential issues with using ReFS in conjunction with
SQL Server, particularly in relation to sparse files and the use of DBCC CHECKDB; however,
these issues have been resolved in SQL Server 2014.

Disk Configuration

Moving your Data Files

SQL Server VM images come pre-installed with SQL Server, so we'll need to do a little bit
of reconfiguration to make sure all our data and log files end up in the correct place. In the
following sections, disk letters and paths refer to the following:

  • C: (OS Disk)
  • D:\SQLTEMP (Temp/Local SSD)
  • M:\DATA\ (Attached Perm SSD intended for Data)
  • L:\LOGS\ (Attached Perm SSD intended for Logs)

First, we need to give SQL Server permission to access these other disks. Assuming
you haven't changed the default service accounts, your SQL Server instance will
be running as the NT SERVICE\MSSQLSERVER account. You'll need to give this account Full
Control permissions on each of the locations where you intend to store data and log files.

Folder Permissions

Once the permissions are correct, we can specify those directories as the new defaults
for our data, log and backup files.

Setting Default Paths for Data & Logs
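
These defaults can be changed through the server properties dialog in SSMS, but they can also be scripted. The following is a hedged T-SQL sketch (not from the original setup) that writes the same instance registry values, assuming the drive layout above:

--Script the default data and log locations (instance registry values)
--A service restart is needed before the new defaults take effect
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultData', REG_SZ, N'M:\DATA';
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'DefaultLog', REG_SZ, N'L:\LOGS';
--The default backup location is held in the 'BackupDirectory' value and can be set the same way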

Next we'll move the master MDF and LDF files by performing the following steps.

  1. Launch the SQL Server configuration Manager
  2. Under SQL Server Services, select the main Server instance, and stop it
  3. Right click the server instance, go to properties and review the startup parameters tab
  4. Modify the –d (master data file) and –l (master log file) parameters to point to the paths where you intend to host your data and log files; the –e parameter controls the error log location if you also want to move that (see the example after this list)
  5. Open Explorer and navigate to the default directory where the MDF files and LDF files are located (C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\). Move the Master MDF and LDF to your new paths
  6. Restart the Server
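
For illustration, the startup parameters after the change might look something like the following. The master file names are the SQL Server defaults, only the data and log paths have changed, and the error log has been left in its default location:

-dM:\DATA\master.mdf
-eC:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Log\ERRORLOG
-lL:\LOGS\mastlog.ldf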

Moving the Master Database

When our server comes back online, we can move the remainder of the default databases.
Running the following series of SQL commands will update the system to expect the MDFs
and LDFs in their new locations on the next start-up.

ALTER DATABASE [msdb] MODIFY FILE ( NAME = MSDBData , FILENAME = 'M:\DATA\MSDBData.mdf' )
ALTER DATABASE [msdb] MODIFY FILE ( NAME = MSDBLog , FILENAME = 'L:\LOGS\MSDBLog.ldf' )
ALTER DATABASE [model] MODIFY FILE ( NAME = modeldev , FILENAME = 'M:\DATA\model.mdf' )
ALTER DATABASE [model] MODIFY FILE ( NAME = modellog , FILENAME = 'L:\LOGS\modellog.ldf' )
ALTER DATABASE [tempdb] MODIFY FILE (NAME = tempdev, FILENAME = 'D:\SQLTEMP\tempdb.mdf');
ALTER DATABASE [tempdb] MODIFY FILE (NAME = templog, FILENAME = 'D:\SQLTEMP\templog.ldf');

--You can verify them with this command
SELECT name, physical_name AS CurrentLocation, state_desc FROM sys.master_files 

Shut down the SQL instance one more time, physically move your MDF and LDF files to
their new locations in Explorer, and finally restart the instance. If there are any
problems with the setup or the server fails to start, you can review the error log in
C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Log\ERRORLOG.

Conclusions


There are a number of other steps that you can then perform to tune your server.

You should also set up SSL/TLS for any endpoints exposed to the outside world
(e.g. if you're going to run the server as an SSRS box). Hopefully you will now have a far
more performant SQL Server instance running in the Azure cloud.

~Eoin Campbell

Introduction

Over the past few months Greenfinch has hired a number of new developers at varying levels of seniority. One of the go-to questions in our interviews was to ask a potential candidate about the SOLID principles of object-oriented programming. Astonishingly, many candidates either didn't know what they were or had only an academic understanding of them, and could not talk about them in a practical sense with regard to real projects they'd worked on.

Because we have a number of development engineers spanning various levels of experience, we thought it would be appropriate to have a quick refresher course on SOLID with some practical examples in one of our lunchtime brown-bag sessions. You can find the presentation below.

Presentation

Object Oriented Programming Concepts

Inheritance

Inheritance is when an object or class is based on another object or class, using the same implementation (inheriting from a class) or specifying implementation to maintain the same behaviour (implementing an interface). It is a mechanism for code reuse and to allow independent extensions of the original software via public classes and interfaces giving rise to a hierarchy.

Inheritance should not be confused with sub-typing though they can agree with one another. In general sub-typing establishes an is-a relationship, while inheritance only reuses implementation and establishes a syntactic relationship, not necessarily a semantic relationship.

public class Vehicle { ... }

public class RoadVehicle : Vehicle { ... }

public class Car : RoadVehicle { ... }

public class Truck : RoadVehicle { ... }

Encapsulation

Encapsulation refers to the bundling of data with the methods that operate on that data. Encapsulation is used to hide the values or state of a structured data object inside a class, preventing unauthorized parties' direct access to them. Publicly accessible methods are generally provided in the class (so-called getters and setters) to access the values, and other client classes call these methods to retrieve and modify the values within the object.
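
As a minimal sketch (the BankAccount type below is invented purely for illustration and is not from the presentation), bundling state with the operations that guard it might look like this:

public class BankAccount
{
    // The balance is hidden; callers cannot set it directly
    private decimal _balance;

    public decimal Balance
    {
        get { return _balance; }
    }

    public void Deposit(decimal amount)
    {
        if (amount <= 0) throw new ArgumentException("Deposit must be positive", "amount");
        _balance += amount;
    }

    public void Withdraw(decimal amount)
    {
        if (amount <= 0 || amount > _balance) throw new InvalidOperationException("Invalid withdrawal");
        _balance -= amount;
    }
}

Callers can only change the balance through Deposit and Withdraw, so the class can enforce its own invariants rather than relying on every consumer to do so.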

It's important to understand that Encapsulation doesn't just mean classes are property bags with getters & setters and a handful of methods. As well as hiding implementations and exposing only the APIs required for the consumers to get what they need from a class or module, it also relates to how you structure your application architecture. Properly delineating your architecture into Core, Data, Service, Façade and Consumer layers, for example, will help encapsulate the functionality below from the callers above and keep your architecture decoupled.

Another important consideration with regard to encapsulation is testability. Too often, we'll start with a correctly encapsulated piece of code, and then when it comes time to unit test, we realise that the functionality we want to test is buried inside private/inaccessible methods. This should be a red flag that you need to rethink your design. Rather than just making these methods public or slapping an InternalsVisibleTo attribute on your assemblies, consider that maybe you need to abstract that functionality out of your method into another responsible class or module.

Polymorphism

At run time, objects of a derived class may be treated as objects of a base class in places such as method parameters and collections or arrays. Base classes may define and implement virtual methods, and derived classes can override them, which means they provide their own definition and implementation. At run-time, when client code calls the method, the CLR looks up the run-time type of the object, and invokes that override of the virtual method. Thus in your source code you can call a method on a base class, and cause a derived class' version of the method to be executed.

public class Program
{
    static void Main(string[] args)
    {

        // Polymorphism at work #1: a Rectangle, Triangle and Circle
        // can all be used wherever a Shape is expected. No cast is
        // required because an implicit conversion exists from a derived
        // class to its base class.
        List<Shape> shapes = new List<Shape>();
        shapes.Add(new Rectangle());
        shapes.Add(new Triangle());
        shapes.Add(new Circle());

        // Polymorphism at work #2: the virtual method Draw is
        // invoked on each of the derived classes, not the base class.
        foreach (Shape s in shapes)
        {
            s.Draw();
        }

        // Keep the console open in debug mode.
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}

public abstract class Shape
{
    public abstract void Draw();
}

public class Circle : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Drawing a circle");
    }
}
public class Rectangle : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Drawing a rectangle");
    }
}
public class Triangle : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Drawing a triangle");
    }
}

Cohesion & Coupling

Cohesion and coupling are worth mentioning in unison. Cohesion refers to how closely related (logically/semantically) the pieces of functionality exposed by a particular module or class are. If you ask yourself, "Do these pieces of functionality belong together?" and the answer is "Yes!", then you have a cohesive piece of code. Coupling, on the other hand, refers to how tightly interlinked two otherwise separate modules/classes are. The more coupling that exists in your application, the more likely it is that changes to one piece of functionality will have an effect (possibly an adverse effect) on another.

In general you should aim to write code which is highly cohesive, with low coupling.

What's that smell?

There are a number of things that ring out to developers as wrong when they see them in software: duplicated code; long methods and long branching statements; unmaintainable/brittle tests; tomes of text within method comment blocks explaining the voodoo that lies before them. We typically refer to these as code smells, but there are also architectural smells that often go ignored: rigid designs that are difficult to change and manipulate; viscous & complex designs that require massive surgery to get the next square feature to fit in that round interface/inheritance hierarchy; fragile & immobile designs that break when we change them and result in developers having to cut corners or possibly throw DRY out the window (don't repeat yourself - yes, I realise the irony of spelling out the acronym).

But what's the big deal? So maybe we need to write a little more code or perform a little bit of surgery on the architecture. That's development, right?

Well, not really. At the end of the day, change equals cost. This is particularly relevant in an SME like Greenfinch, where a number of our projects are bespoke engagements with customers. That cost needs to be absorbed somewhere, so it's either going to cost our customers more to get the functionality they require, or Greenfinch needs to absorb those costs during development. It also has a negative impact on the team. In projects with many developers, where a colleague may have to extend work that you've done, you end up putting roadblocks in place for them. Overall it impacts development and product morale, and soon people are grumbling about that module or that developer's code. Probably worst of all is the build-up of a business debt. Some refer to this as technical debt, but really it's the business that owns the product that is accruing these //TODO items and //MUST FIX backlog tickets, which seem to grow at a faster velocity than they can be cleared.

SOLID Principles

SOLID is an acronym for five guiding principles to help you write better, more maintainable code. They were popularized by Robert C. Martin (aka Uncle Bob) in his book Agile Software Development: Principles, Patterns, and Practices, where he gave pragmatic advice on object-oriented design and development in an agile team. SOLID stands for:

  • Single Responsibility Principle (SRP)
  • Open-Closed Principle (OCP)
  • Liskov Substitution Principle (LSP)
  • Interface Segregation Principle (ISP)
  • Dependency Inversion Principle (DIP)

Software Development is not supposed to be like a game of Jenga. You shouldn't be worried about the entire system collapsing, every time you add, remove or refactor one of the blocks of the system. These 5 principles provide guidance on how best to construct your code & architecture to ensure that it's easily maintained and modified by you and your colleagues.

Single Responsibility Principle

If a class has more then one responsibility, then the responsibilities become coupled. Changes to one responsibility may impair or inhibit the class’ ability to meet the others. This kind of coupling leads to fragile designs that break in unexpected ways when changed. - Robert C. Martin

In a nutshell, each block of code & functionality (methods & classes) should be responsible for one single thing. The more things that a block of code is responsible for, the more heavily coupled it is with other pieces of functionality and behaviour, and as a result, the more likely it is to break when you want to change just one small part of it. Let's consider a simple logging class for example.

public class EoinsLogger
{
    public enum LogTo
    {
        TheDatabase,
        TheFileSystem
    }

    public void LogMessage(string message, LogTo where)
    {
        if (where == LogTo.TheDatabase)
        {
            LogToTheDatabase(message);
        }
        else
        {
            LogToTheFileSystem(message);
        }
    }

    private void LogToTheDatabase(string message)
    {
        //ADO.NET Code

    }

    private void LogToTheFileSystem(string message)
    {
        // System.IO. Code

    }
}

This code has a lot of different responsibilities. It's responsible for logging obviously, but it's also responsible for the decision on which underlying logging implementation to use, as well as for the two specific implementation methods themselves. If another developer wants to come along and modify this, perhaps adding a third logging medium, they need to significantly alter the class to accomplish that. Below is a slightly better implementation. We've abstracted the actual implementations of the specific logging media away from the logger itself, taking away some of the responsibility from the logger.

public class Logger
{
	public enum LogTo
	{
		TheDatabase,
		TheFileSystem
	}

	private ILoggerImplementation _ilog;

	public Logger(LogTo where)
	{
		if(where == LogTo.TheDatabase) _ilog = new DatabaseLogger();
		else _ilog = new FileSystemLogger();
	}
	public void LogMessage(string message)
	{
		_ilog.LogMessage(message);
	}
}

public interface ILoggerImplementation
{
	void LogMessage(string message);
}

public class DatabaseLogger : ILoggerImplementation
{
	public void LogMessage(string message)
	{
		//ADO.NET Code
	}
}

public class FileSystemLogger : ILoggerImplementation
{
	public void LogMessage(string message)
	{
		//System.IO Code
	}
}

But it still has ownership of both the logging process and the decision on where to log to. It breaks the Open-Closed Principle: extending the logging to include a third implementation means modifying the branching logic in the logger, so it is not closed for modification.

Open-Closed Principle (OCP)

Modules that conform to open-closed have two primary attributes: They are “Open For Extension” They are “Closed for Modification” - Robert C. Martin

Let's further modify our previous logging example. It complies with our Open for Extension attribute, but it's not currently closed for modification. We can accomplish that by injecting the logger implementation to be used at runtime.

public class Logger
{
	private ILoggerImplementation _ilog;

	public Logger(ILoggerImplementation theLogger)
	{
		_ilog = theLogger;
	}
	public void LogMessage(string message)
	{
		_ilog.LogMessage(message);
	}
}

public interface ILoggerImplementation
{
	void LogMessage(string message);
}

public class DatabaseLogger : ILoggerImplementation
{
	public void LogMessage(string message)
	{
		//ADO.NET Code
	}
}

public class FileSystemLogger : ILoggerImplementation
{
	public void LogMessage(string message)
	{
		//System.IO Code
	}
}

That's much better now. Our logger class is simply responsible for logging, extension can be accomplished by creating new logger implementations, and the decision on which medium to use has been moved out to the consumer, which makes it when instantiating the logger.

Liskov Substitution Principle (LSP)

Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program. - Somebody who isn't Robert C. Martin

Deciding on the correct abstractions in your architecture is important to get right and well worth the initial design sessions. Whiteboard it out. Decide among your architecture team whether your object hierarchy is correct. Let's look at a simple/contrived example. Everyone learns about shapes in primary school mathematics. A rectangle is a 4-sided shape, each side separated by a 90 degree angle, resulting in two long sides (the length) and two short sides (the width). You can get the area of a rectangle by multiplying its length by its width, and you can double the area of the rectangle by doubling the length of one of its sides.

public class Rectangle
{
	protected int Length { get; private set; }

	protected int Width { get; private set; }

	public Rectangle(int l, int w)
	{
		Length = l;
		Width = w;
	}

	public virtual int GetArea()
	{
		return Length * Width;
	}

	public void DoubleInArea()
	{
		Length = Length * 2;
	}
}

[TestFixture]
public class RectangleTests
{
	[Test]
	public void TestRectangleArea()
	{
		int l = 10;
		int w = 5;
		int expected = 50;
		Rectangle r = new Rectangle(l, w);
		int actual = r.GetArea();
		Assert.AreEqual(expected, actual);

		r.DoubleInArea();
		int newexpected = 100;
		int newactual = r.GetArea();
		Assert.AreEqual(newexpected, newactual);
	}
}

We also learn that a square is just a more specialised type of rectangle where all 4 sides are equal in length to one another. So it seems pretty reasonable to design a system where a Square is just a specialised sub-type of Rectangle, right?

public class Square : Rectangle
{
	public Square(int side)	: base(side, side)
	{

	}

	public override int GetArea()
	{
		return Length * Length;
	}
}

[TestFixture]
public class SquareTests
{
	[Test]
	public void TestSquareArea()
	{
		int l = 10;
		int expected = 100;
		Rectangle r = new Square(l);
		int actual = r.GetArea();
		Assert.AreEqual(expected, actual);

		r.DoubleInArea();
		int newexpected = 200;
		int newactual = r.GetArea();
		Assert.AreEqual(newexpected, newactual);
	}
}

But wait, what's happened here? The implementer of Square has overridden the GetArea() method to multiply the length by itself, a perfectly reasonable assumption in the context of a square. But the underlying type has a DoubleInArea() method which doubles the length of the Square. Calling this method in conjunction with the Square's GetArea() method doesn't just double the area, it quadruples it. This kind of issue rears its head all too often in software development, where presumptuous but naive abstractions fail in real world use.

So what would a better solution have been here? Maybe both Rectangle and Square should have implemented an IFourSidedShape interface which forced the implementer of Square to explicitly implement both the GetArea and DoubleInArea methods, as sketched below.
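
Here's one possible shape of that abstraction. The IFourSidedShape interface and its members are an assumption based on the description above, a sketch rather than the definitive design:

public interface IFourSidedShape
{
	double GetArea();
	void DoubleInArea();
}

public class Rectangle : IFourSidedShape
{
	private double _length;
	private double _width;

	public Rectangle(double length, double width)
	{
		_length = length;
		_width = width;
	}

	public double GetArea()
	{
		return _length * _width;
	}

	public void DoubleInArea()
	{
		//Doubling one side doubles a rectangle's area
		_length = _length * 2;
	}
}

public class Square : IFourSidedShape
{
	private double _side;

	public Square(double side)
	{
		_side = side;
	}

	public double GetArea()
	{
		return _side * _side;
	}

	public void DoubleInArea()
	{
		//The square now owns this behaviour; doubling its area
		//means scaling the side by the square root of 2
		_side = _side * Math.Sqrt(2);
	}
}

Neither class inherits behaviour from the other, so a consumer working against IFourSidedShape gets the correct semantics from each implementation.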

Remember if it walks like a duck, and quacks like a duck, but needs batteries, you probably have the wrong abstraction.

Interface Segregation Principle (ISP)

Classes that have fat interfaces are classes whose interfaces are not cohesive. In other words, the interfaces of the class can be broken up into groups of member functions. Each group serves a different set of clients. - Robert C. Martin

Here's a very simplified example of a FileSystemManager. It's singly responsible for all file I/O in our application. It encapsulates and abstracts away the file I/O code. It's decoupled. It's cohesive in its responsibilities.

public class FileSystemManager
{
	public void ReadFile() { }

	public void WriteFile() { }
}

Great, but it has a pretty fat interface: reading AND writing files. Perhaps not every module or service that consumes this code cares about both reading and writing. A logging module might only care about writing to the file system. A configuration service might only care about reading the .config files off the disk. Interface segregation is about logically splitting up the functionality of your code into smaller, more semantically and logically coherent APIs for the consumers that are going to use them. In the following example, we've broken the file system manager down to implement two separate interfaces: an IFileReader and an IFileWriter. Consumers of this code can then treat the FileSystemManager as one or the other depending on their specific needs. Furthermore, new implementations (e.g. a BlobStorageSystemManager) need only implement the interfaces they require.

public interface IFileReader
{
	void ReadFile();
}

public interface IFileWriter
{
	void WriteFile();
}

public class ProperFileSystemManager : IFileReader, IFileWriter
{
	public void ReadFile() { }

	public void WriteFile() { }
}

public class ProperBlobStorageManager : IFileReader, IFileWriter
{
	public void ReadFile() { }

	public void WriteFile() {}
}

Dependency Inversion Principle (DIP)

A design is rigid if it cannot be easily changed. Such rigidity is due to the fact that a single change to heavily interdependent software begins a cascade of changes in dependent modules. - Robert C. Martin

Dependency inversion relates to keeping our architecture decoupled. High-level modules should not depend on low-level modules; instead, both should depend on abstractions. Those abstractions should not depend on the details; again, the details should depend on abstractions. Let's look at a simple example of some hierarchical classes which have coupled dependencies on each other.

public class FacadeLayerManager
{
	private ServiceLayerManager _serviceLayerManager;

	public FacadeLayerManager ()
	{
	//Instantiate _serviceLayerManager
	}

	public List<object> GetData()
	{
		return _serviceLayerManager.GetData();
	}
}

public class ServiceLayerManager
{
	private DataManager _dataManager;
	
	public ServiceLayerManager ()	
	{
		//Instantiate _dataManager
	}
	
	public List<object> GetData()
	{
		return _dataManager.GetData();
	}
}

public class DataManager
{
	public List<object> GetData()
	{
		//Get Data From Database
	}
}

Here we have a simple 3-tier architecture where the facade layer makes calls to a service layer to get data, and the service layer makes calls to the lower-level DataManager. But this is a tightly coupled architecture. We cannot test our ServiceLayerManager without creating an instance of our DataManager and connecting to our database. Our higher level modules depend on the lower levels rather than on abstractions.

Instead, we can replace the instance fields in each layer with an abstraction (Interface) and inject our specific implementation via the constructor.


#region Interfaces

public interface IBetterFacadeLayerManager
{
	List<object> GetData();
}

public interface IBetterServiceLayerManager
{
	List<object> GetData();
}
	
public interface IBetterDataManager
{
	List<object> GetData();
}

#endregion

#region Implementations
public class FacadeLayerManager : IBetterFacadeLayerManager
{
	private IBetterServiceLayerManager _serviceLayerManager;

	public FacadeLayerManager (IBetterServiceLayerManager injectedManager)
	{
		_serviceLayerManager = injectedManager;
	}

	public List<object> GetData()
	{
		return _serviceLayerManager.GetData();
	}
}

public class ServiceLayerManager : IBetterServiceLayerManager
{
	private IBetterDataManager _dataManager;

	public ServiceLayerManager (IBetterDataManager injectedManager)
	{
		_dataManager = injectedManager;
	}

	public List<object> GetData()
	{
		return _dataManager.GetData();
	}
}
		
public class DataManager : IBetterDataManager
{
	public List<object> GetData()
	{
		//Get Data From Database
	}
}

Now we've removed the tightly coupled dependencies at each layer. We can also take advantage of dependency injection tools and frameworks such as Autofac, Ninject or Unity (or Spring DI in the Java world) to automatically inject the correct concrete implementation at run time based on configuration.
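
As a small sketch of that wiring, assuming the interfaces and classes above and Autofac as the container (other containers follow a similar pattern):

//Requires the Autofac package; the container resolves each constructor dependency for us
var builder = new ContainerBuilder();
builder.RegisterType<DataManager>().As<IBetterDataManager>();
builder.RegisterType<ServiceLayerManager>().As<IBetterServiceLayerManager>();
builder.RegisterType<FacadeLayerManager>().As<IBetterFacadeLayerManager>();

var container = builder.Build();

//Resolving the facade builds the whole chain: facade -> service -> data
IBetterFacadeLayerManager facade = container.Resolve<IBetterFacadeLayerManager>();
var data = facade.GetData();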

Summary

OO design is easy to get wrong. It's especially easy for a design to get out of control if you don't keep good principles and practices in mind at every step of development. Putting that first hack in there, or cutting that first corner, is akin to throwing a pebble down the side of a snow-topped mountain. It's going to turn into a very large snowball very quickly, and once it does, it's going to be far more difficult to stop. So follow good OO principles, follow these SOLID principles, talk to one another during the design phase, and put the time into designing and writing your code to a high quality. It'll serve you, your company and your customers far better in the long run.

~Eoin Campbell

This is part 2 in a series of posts on Linq & Lambda capabilities in C#.

Deferred Execution

So let's take a minute to talk about deferred execution. You may hear this referred to as lazy execution as well. In a nutshell, what it means is that when you write a LINQ or lambda query against a collection or list, the execution of that query doesn't actually happen until the point where you need to access the results. Let's look at a simple example.

var ienum = Enumerable.Range(1, 10).ToList();

var query = from i in ienum
            where i%2 == 0
            select i;

ienum.Add(20);
ienum.Add(30);

SuperConsole.WriteLine(query);
//prints 2, 4, 6, 8, 10, 20, 30

So why does it print out 20 and 30? This is deferred execution in practice. At the point where you write your query (var query), the query is not actually executed against your data source (ienum). After the query is set up, more data is added to the data source, and the query is only actually executed at the point where the results need to be evaluated (SuperConsole.WriteLine).

This holds true in a number of other LINQ scenarios. In LINQ-to-SQL or LINQ-to-Entity Framework, the SQL query is only sent to the database at the point where you need to evaluate your results. It's important to understand this so that queries don't go out of scope before being executed, so that un-executed queries aren't inadvertently passed to other parts or layers of your application, and so that you don't end up introducing N+1 problems where you think you're working on data in memory but are in actual fact performing multiple executions over and over in a loop. If you do need to make your queries "greedy" and force them to execute there and then, you can wrap them in parentheses and immediately call .ToList() on them to force the execution.
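
Reworking the earlier snippet to force immediate execution, the results are captured before the extra items are added:

var ienum = Enumerable.Range(1, 10).ToList();

//The parentheses plus .ToList() force the query to execute immediately
var greedy = (from i in ienum
              where i % 2 == 0
              select i).ToList();

ienum.Add(20);
ienum.Add(30);

SuperConsole.WriteLine(greedy);
//prints 2, 4, 6, 8, 10 - the later additions are not included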

Min, Max, Count & Average

Linq has a number of convenient built-in methods for getting various numeric stats about the data you're working on. Consider a collection of movies which you want to query.

public class Movie
{
    public string Title { get; set; }
    public double Rating { get; set; }
}

...

var movies = new List<Movie>
    {
        new Movie() {Title = "Die Hard", Rating = 4.0},
        new Movie() {Title = "Commando", Rating = 5.0},
        new Movie() {Title = "Matrix Revolutions", Rating = 2.1}
    };

Console.WriteLine(movies.Min(m => m.Rating));
//prints 2.1

Console.WriteLine(movies.Max(m => m.Rating));
//prints 5

Console.WriteLine(movies.Average(m => m.Rating));
//prints 3.7

Console.WriteLine(movies.Count);
Console.WriteLine(movies.Count());
//prints 3

Min, Max and Average are all fairly straightforward, finding the minimum, maximum and average movie rating values respectively. It's worth mentioning with regard to the Count implementations that there are different "versions" of Count depending on the underlying data structure you are operating on. The Count property is a property of the List class and returns the current number of items in that collection. The Count() method is an extension method on the IEnumerable interface which can be executed on any IEnumerable structure regardless of implementation.

In general LINQ's Count will be slower and is an O(N) operation while List.Count and Array.Length are both guaranteed to be O(1). However in some cases LINQ will special case the IEnumerable parameter by casting to certain interface types such as IList or ICollection. It will then use that Count method to do an actual Count() operation. So it will go back down to O(1). But you still pay the minor overhead of the cast and interface call. Ref: [http://stackoverflow.com/questions/981254/is-the-linq-count-faster-or-slower-than-list-count-or-array-length/981283#981283]
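
To make that concrete, here's a small illustrative snippet (the variable names are just for the example):

var list = Enumerable.Range(1, 1000).ToList();
IEnumerable<int> filtered = list.Where(i => i % 2 == 0);

var a = list.Count;       //O(1) - property on List<T>
var b = list.Count();     //effectively O(1) - LINQ detects the ICollection and uses its Count
var c = filtered.Count(); //O(N) - an iterator-based sequence has to be fully enumerated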

This is important as well if you are testing your collections to see if they are empty. People coming from versions of .NET prior to generics would use the Count or Length properties of a collection to see if they were empty, i.e.

if(list.Count == 0)
{ 
    //empty
}
if(array.Length == 0)
{
    //empty
}

Linq however provides another method to test for contents called Any(). It can be used to evaluate whether the collection is empty, or whether the collection has any items which match a specific filter.

if(!list.Any()) //equivalent of list.Count == 0
{ 
    //empty
}
if(list.Any(m => m.Rating == 5.0)) //if it contains any top rated movies
{
    //contains at least one top rated movie
}

If you are starting with something that has a .Length or .Count (such as ICollection, IList, List, etc) - then this will be the fastest option, since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() sequence required by Any() to check for a non-empty IEnumerable sequence. For just IEnumerable, then Any() will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count() does check for ICollection (using .Count as an optimisation) - so if your underlying data-source is directly a list/collection, there won't be a huge difference. Don't ask me why it doesn't use the non-generic ICollection... Of course, if you have used LINQ to filter it etc (Where etc), you will have an iterator-block based sequence, and so this ICollection optimisation is useless. In general with IEnumerable : stick with Any() Ref: [http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0/305156#305156]

Next post, we'll look at some different mechanisms for filtering and transforming our queries.

~Eoin Campbell

The System.Linq namespace contains a fantastic set of utility extension methods for filtering, ordering & manipulating the contents of your collections and objects. In the following posts I'll go through some of the most useful ones (in my humble opinion) and how you might use them in your C# solutions.
This is part 1 in a series of posts on Linq & Lambda capabilities in C#.

Before we start, here's a handy static method to print your resulting collections to the console so you can quickly verify the results.

public class SuperConsole
{
    public static void WriteLine<T>(IEnumerable<T> list, bool includeCarriageReturnBetweenItems = false)
    {
        var separator = includeCarriageReturnBetweenItems ? ",\n" : ", ";
        var result = string.Join(separator, list);
        Console.WriteLine(result);
    }
}

Enumerable

The System.Linq.Enumerable type has 2 very useful static methods for quickly generating a sequence of items: Enumerable.Range and Enumerable.Repeat. The Range method allows you to quickly generate a sequential list of integers from a given starting point for a given number of items.

IEnumerable<int> range = Enumerable.Range(1, 10);
SuperConsole.WriteLine(range);
//prints "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"

So why is this useful? Well, you could use it to quickly generate a pre-initialised list of integers rather than new'ing up a list and then iterating over it to populate it. Or you could use it to replicate for(;;) behaviour, e.g.

for (int i = 1; i <= 10; i++) 
{     
    //DoWork(i); 
} 

Enumerable.Range(1, 10).ToList().ForEach(i =>
{
    //DoWork(i)
});

Repeat is similar but is not limited to integers. You can generate a sequence of a given length with the same default value in every item. Imagine you wanted to create a list of 10 strings, all initialised with a default string of "ABC":

var myList = Enumerable.Repeat("ABC", 10).ToList();

Item Conversion

There are also a few handy ways to convert/cast items built into the System.Linq namespace. The Cast<T> extension method allows you to cast a list of variables from one type to another as long as a valid cast is available. This can be useful for quickly treating a collection of derived types as a collection of their base type.

var integers = Enumerable.Range(1, 5);
var objects = integers.Cast<object>().ToList();

Console.WriteLine(objects.GetType());
SuperConsole.WriteLine(objects);

//prints
//System.Collections.Generic.List`1[System.Object]
//1, 2, 3, 4, 5

But what if a valid implicit cast isn't available? What if we wanted to convert our collection of integers into a collection of strings with a ':' suffix? Thankfully the framework has us covered with the ConvertAll method on List<T>.

var integers = Enumerable.Range(1, 5);
var converter = new Converter<int, string>(input => string.Format("{0}: ", input));
var results = integers.ToList().ConvertAll(converter);

SuperConsole.WriteLine(results, true);
/*prints
    1:
    2:
    3:
    4:
    5:
    */
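
As an aside, the same transformation can be written with LINQ's Select extension method, which works on any IEnumerable<T> rather than just List<T>; a small sketch for comparison:

var integers = Enumerable.Range(1, 5);
var results = integers.Select(i => string.Format("{0}: ", i)).ToList();

SuperConsole.WriteLine(results, true);
//prints the same "1:" to "5:" output as above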

In the next post, we'll look at some the lazy & deferred execution capabilities of LINQ and some useful methods for performing quick calculations and manipulations on our collections.

~Eoin Campbell

The Microsoft Patterns & Practices Enterprise Library contains a number of useful application blocks for simplifying things like Data Access, Logging & Exception Handling in your .NET applications. Recently we had a requirement to add HTML-based formatting to the Email TraceListener in the Logging Application Block, something that's unfortunately missing from the base functionality. Thankfully, Enterprise Library is an open source CodePlex project, so implementing a custom solution is a relatively trivial task. The email trace listener functionality is contained in 3 main files:

  • EmailTraceListener - The actual listener which you add to your configuration
  • EmailTraceListenerData - The object representing the configuration settings
  • EmailMessage - The wrapper object around a log message which gets sent via email

Unfortunately, because of the way these classes are implemented in the Enterprise Library Logging Block, they are not easily extended due to dependencies on private variables and internal classes in the Enterprise Library Logging assembly, so they need to be fully re-implemented in your own solution.

Implementing a Solution

Step 1 was to take a copy of these three files and place them in my own library solution. I prefixed the name of each of them with Html: HtmlEmailTraceListener, HtmlEmailTraceListenerData and HtmlEmailMessage. Other code also needed to be cleaned up, including removing some dependencies on the internal ResourceDependency attributes used to decorate properties within the classes and tidying up the XML documentation comments. The main change was then to enable the IsBodyHtml flag on the mail message itself. This was done in the CreateMailMessage method of the HtmlEmailMessage.

protected MailMessage CreateMailMessage()
{
	string header = GenerateSubjectPrefix(configurationData.SubjectLineStarter);
	string footer = GenerateSubjectSuffix(configurationData.SubjectLineEnder);

	string sendToSmtpSubject = header + logEntry.Severity.ToString() + footer;

	MailMessage message = new MailMessage();
	string[] toAddresses = configurationData.ToAddress.Split(';');
	foreach (string toAddress in toAddresses)
	{
		message.To.Add(new MailAddress(toAddress));
	}

	message.From = new MailAddress(configurationData.FromAddress);

	message.Body = (formatter != null) ? formatter.Format(logEntry) : logEntry.Message;
	message.Subject = sendToSmtpSubject;
	message.BodyEncoding = Encoding.UTF8;
	message.IsBodyHtml = true;

	return message;
}

Using your new solution

Once implemented, it's simply a matter of reconfiguring your app/web.config logging sections to use the new types you've created instead of the original Enterprise Library types. You need to change the type and listenerDataType properties of your email listener in the <listeners> section of your config.

<listeners>
      <!-- Please update the following Settings: toAddress, subjectLineStarter, subjectLineEnder-->
      <add name="EmailLog"
           toAddress="toAddress@example.com"
           subjectLineStarter="Test Console - "
           subjectLineEnder=" Alert"
           filter="Verbose"
           fromAddress="fromAddress@example.com"
           formatter="EmailFormatter"
           smtpServer="smtp.gmail.com"
           smtpPort="587"
           authenticationMode="UserNameAndPassword"
           useSSL="true"
           userName="fromAddress@example.com"
           password="Password"
           type="YourLibrary.YourNamespace.HtmlEmailTraceListener, YourLibrary, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000"
           listenerDataType="YourLibrary.YourNamespace.HtmlEmailTraceListenerData,  YourLibrary, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0000000000000000"
           traceOutputOptions="Callstack" />
    </listeners>

You'll also need to ensure that you've escaped your HTML-formatted textFormatter template in the <formatters> section of your config, i.e. replacing <html> with &lt;html&gt;.

<formatters>
      <add name="EmailFormatter" 
              type="Microsoft.Practices.EnterpriseLibrary.Logging.Formatters.TextFormatter, Microsoft.Practices.EnterpriseLibrary.Logging, Version=5.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" 
              template="
                  &lt;html&gt;
                  &lt;body&gt;
                  &lt;table border=&quot;1&quot; style=&quot;border: solid 1px #000000; border-collapse:collapse;&quot;&gt;
                  &lt;tr&gt;&lt;td&gt;&lt;b&gt;Message&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;{message}&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Local TimeStamp&lt;/td&gt;&lt;td&gt;{timestamp(local)}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Timestamp&lt;/td&gt;&lt;td&gt;{timestamp}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Title&lt;/td&gt;&lt;td&gt;{title}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Severity&lt;/td&gt;&lt;td&gt;{severity}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Category&lt;/td&gt;&lt;td&gt;{category}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Priority&lt;/td&gt;&lt;td&gt;{priority}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;EventId&lt;/td&gt;&lt;td&gt;{eventid}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Local Machine&lt;/td&gt;&lt;td&gt;{localMachine}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;AppDomain&lt;/td&gt;&lt;td&gt;{appDomain}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;LocalDomain&lt;/td&gt;&lt;td&gt;{localAppDomain}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Local Process Name&lt;/td&gt;&lt;td&gt;{localProcessName}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Local Process&lt;/td&gt;&lt;td&gt;{localProcessId}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Win32ThreadId&lt;/td&gt;&lt;td&gt;{win32ThreadId}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;ThreadName&lt;/td&gt;&lt;td&gt;{threadName}&lt;/td&gt;&lt;/tr&gt;
                  &lt;tr&gt;&lt;td&gt;Extended Properties&lt;/td&gt;&lt;td&gt;
                  &lt;table border=&quot;1&quot; style=&quot;border: solid 1px #000000; border-collapse:collapse;&quot;&gt;
                  {dictionary(&lt;tr&gt;&lt;td&gt;{key}&lt;/td&gt;&lt;td&gt;{value}&lt;/td&gt;&lt;/tr&gt;)}
                  &lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;
                  &lt;/table&gt;
                  &lt;/body&gt;
                  &lt;/html&gt;" />
    </formatters>

All done. Now you can happily send email log messages in HTML format via your Application Logging calls.

~Eoin Campbell