Manipulating Data Using LINQ
- 8/15/2013
After completing the chapter, you’ll be able to
- Describe the basics of LINQ to Entities functionality.
- Specify how LINQ statements are compiled.
- List and use the essential LINQ to Entities functions.
In most cases, developers with a strong C# background, but without an equally strong database background, use Language Integrated Query (LINQ) to query the databases they create and manage using the Entity Framework. LINQ to Entities offers a number of benefits to developers, but the main benefit is simplicity. It’s possible to create relatively complex queries without knowing much about the underlying database from a DBMS perspective. Developers can also use syntax that’s familiar to make the query, rather than resorting to working with SQL. In addition, the compiler performs part of the work of interacting with the database for the developer, so that the developer can focus on the dataset and not on the language used to access it. In short, the developer gains a considerable efficiency advantage using LINQ to Entities.
This chapter begins by introducing you to LINQ to Entities. You need to know something about how LINQ to Entities works, and you also need to know the syntax so that you can make queries. The chapter won’t provide an extensive reference, but you’ll have enough information to perform common tasks and a few advanced tasks. The point is that you’ll have the information required to get started using LINQ to Entities to perform useful work. The material provided will help you understand the examples better.
As with any LINQ query, LINQ to Entities queries are compiled to determine what they actually mean. The compiler takes the query you create and turns it into something that .NET understands. The next section of this chapter discusses how this process occurs and how it affects the way you use LINQ to Entities. This part of the chapter also provides a few insights into when you need to use the method-based expression syntax to obtain the output you desire.
The final part of the chapter discusses how to use LINQ to Entities with both entity and database functions, which, after all, is the entire point of working with LINQ to Entities in the first place. This section provides you with examples you can use to better understand how LINQ to Entities works. In addition, this material sets the stage for future examples in the book. When you finish this section, you’ll have the knowledge needed to move on to the more advanced examples in the book.
Introducing LINQ to Entities
One of the most important concepts to understand about LINQ to Entities is that it’s a declarative language. The focus is on defining what information you need, rather than on how to obtain the information. This means that you can spend more time working with data and less time trying to figure out the underlying code required to perform tasks such as accessing the database. It’s important to understand that declarative languages don’t actually remove any control from the developer; rather, they help the developer focus attention on what’s important.
The sections that follow provide you with a basic overview of LINQ to Entities. You learn about how the LINQ to Entities provider, EntityClient, works, discover how to create a basic query, and then move on to some reference information you need later to work with LINQ to Entities in examples. These sections will continue to be useful as a reference as you progress through the book, so keep them in mind as you move on to other topics later.
Considering the LINQ to Entities provider
When working with LINQ to Entities, you rely on a new provider named EntityClient. LINQ to Entities transforms your query into EntityClient objects and method calls. The EntityClient provider then creates an interface between the LINQ to Entities queries and the underlying Microsoft ADO.NET providers through the various layers of the Entity Framework. The EntityClient interacts directly with the conceptual model, as shown in the following graphic.
You don’t create an EntityClient directly. Instead, you indirectly work with the members of the System.Data.EntityClient namespace (see http://msdn.microsoft.com/library/system.data.entityclient.aspx for details). In order to start a session with a database, the application creates a connection to it with the EntityConnection object. It then transmits queries and other requests using an EntityCommand and reads the results using an EntityDataReader. When you work with LINQ to Entities, the compiler generates the necessary code for you—the focus for you as a developer is the query declaration, rather than the actual code used to make the calls. However, it’s important to know what happens in the background.
The standard ADO.NET providers are still used to communicate with the database. However, you don’t need to worry about writing all of the code used to perform this communication; EntityClient performs this task for you. A simple way to look at EntityClient is as a translator that takes your declarative language query and puts it into terms that ADO.NET can understand.
The LINQ to Entities provider interacts with ADO.NET directly, which means that you don’t need any other provider to use LINQ to Entities with other databases. However, ADO.NET uses database-specific providers. Microsoft Visual Studio ships with ADO.NET providers for both SQL Server and SQL Server Compact. Of course, there are other databases on the market. You can find a number of ADO.NET providers for other databases at http://msdn.microsoft.com/data/dd363565.aspx. If you don’t find a suitable provider on MSDN, try other sites, such as Devart (http://www.devart.com/linqconnect/) and SQLite (http://www.sqlite.org/).
Developing LINQ to Entities queries
There are a number of ways to formulate LINQ queries. The use of different approaches provides developers with flexibility and enables a developer to code using the style that the developer is used to. The first division in LINQ queries is the syntax. A developer has a choice between using the query expression syntax or the method-based expression syntax. Of the two, the query expression syntax is the easiest to understand, while the method-based expression syntax offers the greatest flexibility.
It’s also possible to specify precise output type or to allow the compiler to derive the output type based on the query you create (an implicit type). A precise output type means providing a specific type, such as IQueryable<Customers>. A derived output type relies on the var keyword (see http://msdn.microsoft.com/library/bb383973.aspx for a detailed description of this keyword). The compiler determines the variable type for you. The precise output type provides you with additional control over how the query is made and the results it provides. Using the var keyword is necessary at times because you may not be able to determine the precise type. In addition, the var keyword makes it more likely that the query will succeed and provide usable data, because the compiler determines the correct type for you.
The query itself requires the use of keywords or methods that reflect those keywords. When using the query expression syntax, a query will use the select, in, and from keywords as a minimum. The best way to see how this works is through an example. The following procedure relies on the ModelFirst example you created in the “Creating a model-first example” section of Chapter 3, “Choosing a workflow.” (You can find this example in the \Microsoft Press\Entity Framework Development Step by Step\Chapter 06\ModelFirst (LINQ Query) folder of the downloadable source code.)
Copy the ModelFirst example you created in Chapter 3 to a new folder and use this new copy for this example (rather than the copy you created in Chapter 3).
Add a new button to Form1. Name the button btnQuery and set its Text property to &Query.
Double-click btnQuery to create a new click event handler.
Type the following code for the btnQuery_Click() event handler:
private void btnQuery_Click(object sender, EventArgs e) { // Create the context. Rewards2ModelContainer context = new Rewards2ModelContainer(); // Obtain the customer list. var CustomerList = from cust in context.Customers select cust; // Process each customer in the list. StringBuilder Output = new StringBuilder("Customer List:"); foreach (var Customer in CustomerList) { // Create a customer entry for each customer. Output.Append("\r\n" + Customer.CustomerName + " has made purchases on: "); // Process each purchase for that particular customer. foreach (var Purchase in Customer.Purchases) Output.Append("\r\n\t" + Purchase.PurchaseDate); } // Display the result on screen. MessageBox.Show(Output.ToString()); }
The example begins by creating a context. It’s important to remember that you still need to create this connection to the Entity Framework layers in order to access the database. The LINQ query will be translated by the EntityClient into a series of commands that will interact with the context to perform the tasks you specify.
The LINQ query comes next. Notice that the example is using the var keyword rather than a specific type. The example asks for the list of customers from the context and places each customer in cust. It then selects cust and places this value in CustomerList. Hover the mouse over CustomerList in the foreach loop that follows, and you’ll see that Visual Studio really does assign it a type of IQueryable<Customers>, as shown here:
Let’s say that you decide you want to use IEnumerable instead of IQueryable (see the “Determining when to use IEnumerable in place of IQueryable” sidebar for details). In order to use IEnumerable, you’d need to rewrite the query like this:
// Obtain the customer list. IEnumerable<Customers> CustomerList = from cust in context.Customers select cust;
This is a master/detail database setup, so the example creates two foreach loops to process the data. The first foreach loop obtains one Customer from CustomerList and processes the customers one at a time. The second foreach loop obtains one Purchase from Customer.Purchases and processes each purchase for that customer one at a time. The result is an output string that is displayed in a message box.
Click Start or press F5. The application compiles and runs.
Click Query. You’ll see the result shown here (assuming that you ran the example from Chapter 3 and didn’t modify the code from that example):
Defining the LINQ to Entities essential keywords
It’s important to know the basic keywords used to create a LINQ query. Interestingly enough, there are only a few keywords to remember, but you can combine them in various ways to obtain specific results. The following list contains these basic keywords and provides a simple description of each one (future examples will expand on these definitions for you):
ascending Specifies that a sorting operation takes place from the least (or lowest) element of a range to the highest element of a range. This is normally the default setting. For example, when performing an alphabetic sort, the sort would be in the range from A to Z.
by Specifies the field or expression used to implement a grouping. The field or expression defines a key used to perform the grouping task.
descending Specifies that a sorting operation takes place from the greatest (or highest) element of a range to the lowest element of a range. For example, when performing an alphabetic sort, the sort would be in the range from Z to A.
equals Used between the left and right clauses of a join statement to join the primary contextual data source to the secondary contextual data source. The field or expression on the left of the equals keyword specifies the primary data source, while the field or expression on the right of the equals keyword specifies the secondary data source.
from(required) Specifies the data source used to obtain the required information and defines a range variable. This variable has the same purpose as a variable used for iteration in a loop.
group Organizes the output into groups using the key value you specify. Use multiple group clauses to create multiple levels of output organization. The order of the group clauses determines the depth at which a particular key value appears in the grouping order. You combine this keyword with by to create a specific context.
in(required) Used in a number of ways. In this case, the keyword determines the contextual database source used for a query. When working with a join, the in keyword is used for each contextual database source used for the join.
into Specifies an identifier that you can use as a reference for LINQ query clauses such as join, group, and select.
join Creates a single data source from two related data sources, such as in a master/detail setup. A join can specify an inner, group, or left-outer join, with the inner join as the default. You can read more about joins at http://msdn.microsoft.com/library/bb311040.aspx.
let Defines a range variable that you can use to store subexpression results in a query expression. Typically, the range variable is used to provide an additional enumerated output or to increase the efficiency of a query (so that a particular task, such as finding the lowercase value of a string, need not be done more than one time).
on Specifies the field or expression used to implement a join. The field or expression defines an element that is common to both contextual data sources.
orderby Creates a sort order for the query. You can add the ascending or descending keyword to control the order of the sort. Use multiple orderby clauses to create multiple levels of sorting. The order of the orderby clauses determines the order in which the sort expressions are handled, so using a different order will result in different output.
where Defines what LINQ should retrieve from the data source. You use one or more Boolean expressions to define the specifics of what to retrieve. The Boolean expressions are separated from each other using the && (AND) and || (OR) operators.
select (required) Determines the output from the LINQ query by specifying what information to return. This statement defines the data type of the elements that LINQ returns during the iteration process.
Defining the LINQ to Entities operators
The keywords described in the “Defining the LINQ to Entities essential keywords” section of the chapter determine what happens when a query is made using the query expression syntax. Operators determine how the query is made when using the method-based expression syntax. You use operators to modify the output in the following ways:
- Sort Modify the natural order of the data returned from the data source. For example, you could create a sorted order of customers based on their last name, even if the database keeps the customer list in a random order.
- Group Create an order that is depending on a specific field or expression. For example, you could group a list of customers by the first letter of their last name.
- Shape Modify the natural appearance of the data to obtain specific results. For example, you could filter the data so that the output only contains customers whose last name begins with a G, or you could determine the average value of the data using aggregation.
The following sections describe a number of common tasks you can perform using LINQ to Entities operators. These are basic operations. Remember that you can combine operators to create almost any data manipulation scenario. Using LINQ to Entities operators makes it possible for you to declare what you want as output, rather than determine how to obtain it. The compiler determines how a particular task is done.
Performing filtering and projection
The main task of any LINQ to Entities expression is to obtain data and provide it as output. The “Developing LINQ to Entities queries” section of this chapter demonstrates the techniques for performing this basic task. However, once you have the data, you may want to project or filter it as needed to shape the data prior to output.
Projection is the act of modifying the output to shape it in a specific way. For example, you can change the case of the characters in a string or perform a calculation on numeric output. It’s also possible to use methods to transform the data in a variety of ways that are only limited by your imagination and the requirements of your application. The methods associated with projection are Select() and SelectMany().
Filtering is the act of removing undesirable elements from the output. You may only want the names of customers who have achieved a certain number of sales or who live in a particular area. Use the Where() method to achieve the desired level of filtering.
Performing joins
Look again at the example in the “Developing LINQ to Entities queries” section of this chapter. Notice that the example is able to obtain the list of purchases associated with a particular customer because there is a navigable property that is defined as part of the model. It’s important to keep this bit of information in mind, especially when you normally work with SQL Server directly by making SQL statements. The join defined by LINQ to Entities is for related tables that have no navigable properties in the model. The result is the same as a standard join, but the purpose of the join is different. Use navigable properties whenever possible to work with related tables.
When performing a join to group like tables together, you use the Join() or GroupJoin() method. The tables must still possess a common attribute or property that you can exploit to create the relationship. For example, let’s say that your in-house database has a table containing a list of products that employ a bar code for identification. However, the description of the product resides on a web service hosted by the supplier. You can use a join on the bar code to obtain a description for the product in your in-house database from the supplier’s web service. Because you don’t support or own the supplier’s database, the database won’t appear as part of your model, and you won’t have any navigable properties to access it.
Creating a set
Shaping a result set means defining the set according to specific properties. For example, you might only want the distinct elements from the result set of a query. Even though a particular row in a table is distinct, the result set may not contain the entire row, resulting in duplicates in the output, so you need a way to shape the output so the user only sees unique entries. The methods for creating sets are All(), Any(), Concat(), Contains(), DefaultIfEmpty(), Distinct(), EqualAll(), Except(), Intersect(), and Union().
Ordering the output
Sorting a result set modifies the order in which the individual records appear so that the user can more easily detect patterns in the output, find a specific output, and look for errors, such as misspellings and duplicate entries. You can combine ordering methods to create a unique output. However, it’s an error to provide the same ordering methods more than one time on a result set, and you’ll see an exception if you try to do so. The ordering methods are OrderBy(), OrderByDescending(), ThenBy(), ThenByDescending(), and Reverse().
When ordering a result set, it’s important to realize that LINQ to Entities works against the data source, rather than using an in-memory representation, as would be done when working with the Common Language Runtime (CLR) objects. The data source may have special sort functionality implemented, such as case ordering, kanji ordering, and null ordering. The difference in sort functionality will affect the output you see.
Grouping the output
Sorting a result by grouping like items together using a common attribute (such as all customers who live in a particular city) helps users see patterns in the output. When grouping like items together, you use the GroupBy() method. It’s possible to create multiple levels of grouping by combining multiple GroupBy() method calls. Unlike sorting methods, you can create multiple levels of the same GroupBy() method calls because each GroupBy() method call creates a new level in the output.
When grouping a result set, it’s important to realize that LINQ to Entities works against the data source, rather than using an in-memory representation, as would be done when working with the CLR objects. The data source may contain null values that will affect the output in ways that you don’t see when performing the same task using CLR objects.
Performing aggregation
Shaping the result set by combining or aggregating it in certain ways can help a user see the information in a new way. For example, you might obtain the average of a numeric field so that the user knows when a particular entry is either higher or lower than average. The methods you use to aggregate data are Aggregate(), Average(), Count(), LongCount(), Max(), Min(), and Sum().
There are some significant differences in the way that aggregation occurs when using LINQ to Entities, as contrasted to using the CLR. The most important difference is that the calculations occur on the server, so any loss of precision or type conversions will occur on the server as well. When an error occurs, such as an overflow, the exception is raised as a data source or Entity Framework exception, rather than a standard CLR exception. The errors are only raised when they conflict with the data source assumptions about the data. For example, when working with null values, a CLR calculation will raise an error, but SQL Server won’t. Table 6-1 describes how SQL Server handles nulls so that you know what to expect as output.
Table 6-1 Techniques SQL Server uses to handle nulls
Method |
No data |
All nulls |
Some nulls |
No nulls |
Average |
Returns null |
Returns null |
Returns the average of the non-null values in the sequence |
Returns the average of all of the values in the sequence |
Count |
Returns 0 |
Returns the number of null values in the sequence |
Returns the combined number of null and non-null values in the sequence |
Returns the total number of values in the sequence |
Max |
Returns null |
Returns null |
Returns the maximum of the non-null values in the sequence |
Returns the maximum of all of the values in the sequence |
Min |
Returns null |
Returns null |
Returns the minimum of the non-null values in the sequence |
Returns the minimum of all of the values in the sequence |
Sum |
Returns null |
Returns null |
Returns the sum of the non-null values in the sequence |
Returns the sum of all of the values in the sequence |
Interacting with type
Shaping data by converting its type from one form to another lets you perform additional tasks, such as creating specific output views. For example, it’s common to convert data to a string type so that it’s possible to use the string methods to manipulate the appearance of the data in certain ways, such as to make the data more aesthetically pleasing to the viewer.
The only types that you can convert or test are those that map to an Entity Framework type. This functionality works at the conceptual level, rather than at the data source, as does some of the other functionality discussed so far. The two common methods for converting and testing data are Convert() (primitive types) and OfType() (entity types). When working with C#, you can also use the is() and as() methods.
Paging the output
Paging methods sort the data by interacting with the rows out of order or shape the data by removing some rows entirely. The output you receive depends on the way in which you use the paging methods in your code. The paging methods are ElementAt(), First(), FirstOrDefault(), Last(), LastOrDefault(), Single(), Skip(), Take(), and TakeWhile(). If you try to use a paging method on a sequence that doesn’t contain any entries or contains all null values, the result is null.
Summarizing the LINQ operators
LINQ (and by extension LINQ to Entities) supports a number of operators that you access as methods. The following list provides a description of each of these methods; you can use it to determine which to use to perform a specific task:
- Aggregate() Applies an accumulator function over the elements of a sequence. For example, you might choose to concatenate the individual strings of a series of records together. You can read more about this method at http://msdn.microsoft.com/library/bb548651.aspx.
- All() Determines whether all of the elements in a sequence satisfy a particular condition. You can read more about this method at http://msdn.microsoft.com/library/bb548541.aspx.
- Any() Determines whether a sequence contains any elements. You can read more about this method at http://msdn.microsoft.com/library/bb337697.aspx.
- Average() Computes the average of the elements found in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb354760.aspx.
- Concat() Adds (concatenates) one sequence to another, so that you end up with a single sequence. You can read more about this method at http://msdn.microsoft.com/library/bb302894.aspx.
- Contains() Looks for the specified element in the specified sequence using the default equality comparator. You can read more about this method at http://msdn.microsoft.com/library/bb352880.aspx.
- Convert() Changes the base type of an element into another base type. You can read more about this method at http://msdn.microsoft.com/library/system.convert.aspx.
- Count() Obtains the number of elements in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb338038.aspx. (See the LongCount() method when you want to count a large number of elements.)
- DefaultIfEmpty() Returns the sequence when there are elements to return. Otherwise, this method returns the default value for the specified sequence, which will likely be an empty or null value. You can read more about this method at http://msdn.microsoft.com/library/bb360179.aspx.
- Distinct() Returns only the unique elements from a sequence. When two elements have the same value, returns just one of the two elements. You can read more about this method at http://msdn.microsoft.com/library/bb348436.aspx.
- ElementAt() Returns the element found at the specified index. You can read more about this method at http://msdn.microsoft.com/library/bb299233.aspx.
- EqualAll() Determines whether two sequences are precisely equal, which means that they must have the same members appearing in the same order. This operator isn’t documented as a standard LINQ operator, so Microsoft may restrict its use. You can read more about this method at http://msdn.microsoft.com/vstudio/bb737910.aspx.
- Except() Creates a sequence that contains the elements that don’t match between two sequences. The comparison is made using the default comparer. You can read more about this method at http://msdn.microsoft.com/library/bb300779.aspx.
- First() Returns the first element in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb291976.aspx.
- FirstOrDefault() Returns the first element in a sequence or a default element when no elements exist. You can read more about this method at http://msdn.microsoft.com/library/bb340482.aspx.
- GroupBy() Places the elements in a sequence in groups using the specified key. You can read more about this method at http://msdn.microsoft.com/library/bb534501.aspx.
- GroupJoin() Combines and groups two separate sequences into a single sequence using a common attribute or property. The resulting groups are based upon the same type of expression used to group a single sequence using the Group() method. You can read more about this method at http://msdn.microsoft.com/library/bb534675.aspx.
- Intersect() Produces the set intersection of two sequences by using the default comparator. You can read more about this method at http://msdn.microsoft.com/library/bb460136.aspx.
- Join() Combines two separate sequences into a single sequence using a common attribute or property. You can read more about this method at http://msdn.microsoft.com/library/bb534675.aspx.
- Last() Returns the last element in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb358775.aspx.
- LastOrDefault() Returns the last element in a sequence or a default element when no elements exist. You can read more about this method at http://msdn.microsoft.com/library/bb301849.aspx.
- LongCount() Obtains the number of elements in a sequence and returns that value as a 64-bit number. You use this version of Count() when the number of elements is high and you want to avoid a potential overflow condition. You can read more about this method at http://msdn.microsoft.com/library/bb353539.aspx.
- Max() Determines which element contains the maximum value in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb335614.aspx.
- Min() Determines which element contains the minimum value in a sequence. You can read more about this method at http://msdn.microsoft.com/library/bb298087.aspx.
- OfType() Determines whether an element is of a specific type. You can read more about this method at http://msdn.microsoft.com/library/bb360913.aspx.
- OrderBy() Sorts the elements of a sequence in ascending order using the specified key. You can read more about this method at http://msdn.microsoft.com/library/bb534966.aspx.
- OrderByDescending() Sorts the elements of a sequence in descending order using the specified key. You can read more about this method at http://msdn.microsoft.com/library/bb534855.aspx.
- Reverse() Inverts the order of the elements in a sequence. The elements aren’t sorted—merely reversed in order. You can read more about this method at http://msdn.microsoft.com/library/bb358497.aspx.
- Select() Chooses each element of a sequence and optionally modifies its form. You can read more about this method at http://msdn.microsoft.com/library/bb548891.aspx.
- SelectMany() Chooses each element of a sequence, places it in an IEnumerable object, and flattens the entire sequence into a single sequence. You can read more about this method at http://msdn.microsoft.com/library/bb534336.aspx.
- Single() Returns the only element in a sequence that satisfies the specified condition and throws an exception if more than one element that satisfies the condition exists. You can read more about this method at http://msdn.microsoft.com/library/bb155325.aspx.
- Skip() Bypasses (skips) the specified number of elements in a sequence and then returns the elements that remain. You can read more about this method at http://msdn.microsoft.com/library/bb358985.aspx.
- Sum() Adds (sums) the individual values of each element in a sequence to create a total. You can read more about this method at http://msdn.microsoft.com/library/bb298138.aspx.
- Take() Returns the specified number of elements in a sequence and then skips (bypasses) the elements that remain. You can read more about this method at http://msdn.microsoft.com/library/bb503062.aspx.
- TakeWhile() Returns the specified number of elements in a sequence while the specified condition remains true, and then skips (bypasses) the elements that remain. You can read more about this method at http://msdn.microsoft.com/library/bb534804.aspx.
- ThenBy() Performs a subsequent sorting of elements in a sequence in ascending order using the specified key. You must precede this method call with either the OrderBy() or OrderByDecending() method call. You can read more about this method at http://msdn.microsoft.com/library/bb534743.aspx.
- ThenByDescending() Performs a subsequent sorting of elements in a sequence in descending order using the specified key. You must precede this method call with either the OrderBy() or OrderByDecending() method call. You can read more about this method at http://msdn.microsoft.com/library/bb534736.aspx.
- Union() Produces the set union of two sequences by using the default comparator. You can read more about this method at http://msdn.microsoft.com/library/bb341731.aspx.
- Where() Filters a sequence based on the criterion you provide in the form of an expression. You can read more about this method at http://msdn.microsoft.com/library/bb534803.aspx.