LINQ Best Practices

LINQ (Language Integrated Query) is commonly used to query, filter, and sort data in C#. It allows developers to write readable queries directly in C# code. Used well, LINQ is a great tool, but used improperly its queries can significantly hurt performance, especially with large data sets. When you use LINQ constantly, it is easy to forget some of the best practices and just do whatever seems simplest. This post is a reminder of a few best practices for getting the most out of LINQ.

Count vs Any

LINQ provides several ways to check whether a collection contains elements. Two commonly used methods are Count() and Any(). Both can answer the question "does a matching element exist?", but their performance and intended uses differ quite a bit. Count(), as you can probably guess from the name, counts the elements in a collection. When given a predicate, it traverses the entire collection and returns the number of matches. Any(), on the other hand, checks whether at least one element matches the condition. Unlike Count(), it does not have to iterate through the whole collection: it stops as soon as it finds a match. This makes it the more efficient choice for existence checks.

Let's say we have a list of coffee orders and want to check whether any of those orders are for a large.

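// timer is a System.Diagnostics.Stopwatch; allCoffees is the list of coffee orders.
timer.Restart();
bool hasLargeByCount = allCoffees.Count(x => x.CoffeeSize == Size.Large) > 0;
timer.Stop(); // stop right after the query so the time is recorded even when nothing matches
Console.WriteLine($"Count: {timer.ElapsedTicks} (found: {hasLargeByCount})");

timer.Restart();
bool hasLargeByAny = allCoffees.Any(x => x.CoffeeSize == Size.Large);
timer.Stop();
Console.WriteLine($"Any: {timer.ElapsedTicks} (found: {hasLargeByAny})");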

These are both valid LINQ statements and return the same result, but timing them shows that Any() is typically faster: it stops at the first large order it finds, while Count() always checks every order.

The difference is minor for a small dataset like this one, but in a real-world scenario with large collections it can be the difference between your app feeling responsive or slow.
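Timings vary from run to run, so another way to see the difference is to count how often the predicate actually runs. The following is a minimal sketch, not the code used for the measurements above: the CoffeeOrder class, the Size enum, and the MeasurePredicateCalls helper are hypothetical stand-ins for the post's types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the coffee order types used in this post.
public enum Size { Small, Medium, Large }

public class CoffeeOrder
{
    public Size CoffeeSize { get; set; }
}

public static class CountVsAnyDemo
{
    // Returns how many times the predicate ran for Count(...) > 0 and for Any(...).
    public static (int CountCalls, int AnyCalls) MeasurePredicateCalls(List<CoffeeOrder> orders)
    {
        int countCalls = 0;
        int anyCalls = 0;

        bool viaCount = orders.Count(o => { countCalls++; return o.CoffeeSize == Size.Large; }) > 0;
        bool viaAny = orders.Any(o => { anyCalls++; return o.CoffeeSize == Size.Large; });

        // Both checks must agree on the answer; only the amount of work differs.
        if (viaCount != viaAny) throw new InvalidOperationException("Count and Any disagree");
        return (countCalls, anyCalls);
    }

    public static void Main()
    {
        // 1,000 orders where the only large order is in the third position.
        var orders = Enumerable.Range(0, 1000)
            .Select(i => new CoffeeOrder { CoffeeSize = i == 2 ? Size.Large : Size.Small })
            .ToList();

        var (countCalls, anyCalls) = MeasurePredicateCalls(orders);
        Console.WriteLine($"Count ran the predicate {countCalls} times"); // 1000
        Console.WriteLine($"Any ran the predicate {anyCalls} times");     // 3
    }
}
```

If the only match sits at the very end of the list, or there is no match at all, Any() has to look at every element too, so the two checks end up doing about the same amount of work.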

Find vs FirstOrDefault

Like Count() and Any(), Find() and FirstOrDefault() have similar but slightly different uses. Find() scans a list in order, stops at the first element that matches the specified predicate, and returns it. If no match is found, it returns the default value of the element type, which is null for reference types. However, Find() is a method of List<T> itself rather than a LINQ extension, so it cannot be called on an arbitrary IEnumerable<T>. That is where FirstOrDefault() comes in, as it works on any collection that implements IEnumerable<T>. It also returns the first element that matches the predicate or, if there are no matches, the default value of the type.

Say that instead of just finding out whether an element exists in our List of coffee orders, we want to pull a specific order. We could run either of the following queries.

timer.Restart();
var firstMatch = allCoffees.FirstOrDefault(x => x.CoffeeBrew == Brew.Latte && x.CoffeeFlavor == Flavor.PumpkinSpice
							 && x.CoffeeSize == Size.Medium);
timer.Stop(); // stop right after the query so the time is recorded even when nothing matches
Console.WriteLine($"FirstOrDefault: {timer.ElapsedTicks} (found: {firstMatch != null})");

timer.Restart();
var findMatch = allCoffees.Find(x => x.CoffeeBrew == Brew.Latte && x.CoffeeFlavor == Flavor.PumpkinSpice
					&& x.CoffeeSize == Size.Medium);
timer.Stop();
Console.WriteLine($"Find: {timer.ElapsedTicks} (found: {findMatch != null})");

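Both return the same object, but Find() will usually do so faster because it is implemented directly on List<T> and loops over the list's internal array instead of going through the general-purpose IEnumerable<T> machinery. The size of the gap depends on the .NET version, so measure on your own runtime.

When it's available, Find() is the better option, but when working with a collection type other than List<T>, FirstOrDefault() will get the job done.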
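To make the availability difference and the "default value" behavior concrete, here is a small sketch. The price values and the FirstOver and FindOver helper names are made up for illustration and are not part of the coffee order example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FindVsFirstOrDefaultDemo
{
    // Works for any sequence, because FirstOrDefault is a LINQ extension on IEnumerable<T>.
    public static int FirstOver(IEnumerable<int> prices, int limit) =>
        prices.FirstOrDefault(p => p > limit);

    // Works only for List<T>, because Find is an instance method of List<T>.
    public static int FindOver(List<int> prices, int limit) =>
        prices.Find(p => p > limit);

    public static void Main()
    {
        var priceList = new List<int> { 3, 5, 4 };
        int[] priceArray = { 3, 5, 4 };

        Console.WriteLine(FindOver(priceList, 4));   // 5, the first price above 4
        Console.WriteLine(FirstOver(priceArray, 4)); // 5, same query on an array

        // No match: both return default(int), which is 0, not null.
        Console.WriteLine(FindOver(priceList, 10));   // 0
        Console.WriteLine(FirstOver(priceArray, 10)); // 0
    }
}
```

Note the last two lines: for value types such as int, "no match" comes back as 0 rather than null, so a != null check like the one in the coffee example only makes sense for reference types. For value types, an existence check with Any() or List<T>.Exists() is the safer test.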

When Not to Use LINQ

Knowing when not to use LINQ can be just as important as knowing which LINQ method to use. If you have code that runs constantly, or many times per second, a plain old for loop may be the better option. Here is a LINQ statement and an equivalent for loop. Both iterate through a range of numbers and perform math operations on them to simulate repeatedly processing data.

// Use the number of coffee orders as the iteration count.
var iterations = allCoffees.Count;

timer.Restart();
var linqSum = Enumerable.Range(1, iterations)
					.Select(n => n * 2)
					.Select(n => n * 100)
					.Select(n => Math.Pow(n, 3))
					.Sum();
timer.Stop();
Console.WriteLine($"LINQ: {timer.ElapsedTicks}");

timer.Restart();
double sum = 0;
for (int n = 1; n <= iterations; n++)
{
	// Same three steps as the Select calls above.
	int a = n * 2;
	double b = a * 100;
	double c = Math.Pow(b, 3);
	sum += c;
}
timer.Stop();
Console.WriteLine($"For loop: {timer.ElapsedTicks}");

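Although both run relatively quickly for this simulation, the for loop is more efficient here.

The gap comes from small amounts of per-element overhead: every number is passed through each lambda as a delegate call and through the iterator machinery that LINQ sets up, whereas the loop does the same arithmetic inline.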
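If you want to stay with LINQ, one middle ground is to do all the arithmetic in a single lambda so that fewer delegates run per element. The sketch below is only an illustration under that assumption, and the Chained, Fused, and Loop method names are made up for it. It shows that all three shapes compute the same value; it does not claim any particular timings, so measure on your own workload.

```csharp
using System;
using System.Linq;

public static class FusedSelectDemo
{
    // Three chained Select calls: each element goes through three separate lambdas.
    public static double Chained(int count) =>
        Enumerable.Range(1, count)
            .Select(n => n * 2)
            .Select(n => n * 100)
            .Select(n => Math.Pow(n, 3))
            .Sum();

    // Same arithmetic in one selector passed to Sum: a single lambda per element.
    public static double Fused(int count) =>
        Enumerable.Range(1, count)
            .Sum(n => Math.Pow(n * 2 * 100, 3));

    // Plain loop for comparison: no delegates involved.
    public static double Loop(int count)
    {
        double sum = 0;
        for (int n = 1; n <= count; n++)
        {
            sum += Math.Pow(n * 2 * 100, 3);
        }
        return sum;
    }

    public static void Main()
    {
        const int count = 1000;
        Console.WriteLine($"Chained: {Chained(count)}");
        Console.WriteLine($"Fused:   {Fused(count)}");
        Console.WriteLine($"Loop:    {Loop(count)}");
    }
}
```

For code that only runs occasionally, the chained version is often the most readable and the overhead will not matter. The fused and loop versions are worth considering for hot paths.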

For more on LINQ: https://learn.microsoft.com/en-us/dotnet/csharp/linq/

BiTS Blog