LINQ to SQL: Querying Data
- 11/23/2010
- Entities in LINQ to SQL
- Data Modeling
- Data Querying
- Thinking in LINQ to SQL
- Summary
Data Modeling
The set of entity classes that LINQ to SQL requires is a thin abstraction layer over the relational model. Each entity class defines an accessible table of data, which can be queried and modified. Modified entity instances can apply their changes to the data contained in the relational database. In this section, you will learn how to build a data model for LINQ to SQL.
DataContext
The DataContext class handles the communication between LINQ and external relational data sources. Each instance has a single Connection property that refers to a relational database. Its type is IDbConnection; therefore, it should not be specific to a particular database product. However, the LINQ to SQL implementation supports only SQL Server databases. Choosing between specific versions of SQL Server depends only on the connection string passed to the DataContext constructor.
DataContext uses metadata to map the physical structure of the relational data so that LINQ to SQL can generate the appropriate SQL code. You also use DataContext to call a stored procedure and persist data changes in entity class instances in the relational database.
Classes that specialize access for a particular database can be derived from DataContext. Such classes offer an easier way to access relational data, including members that represent available tables. You can define fields that reference existing tables in the database simply by declaring them, without a specific initialization, as in the following code:
public class SampleDb : DataContext { public SampleDb(IDbConnection connection) : base( connection ) {} public SampleDb(string fileOrServerOrConnection) : base( fileOrServerOrConnection ) {} public SampleDb(IDbConnection connection, MappingSource mapping) : base( connection, mapping ) {} public Table<Customer> Customers; }
Entity Classes
An entity class has two roles. The first role is to provide metadata to the LINQ query engine; for this, the class itself suffices—it does not require instantiation of an entity instance. The second role is to provide storage for data read from the relational data source, as well as to track possible updates and support their submission back to the relational data source.
An entity class is any reference type definition decorated with the Table attribute. You cannot use a struct (which is a value type) for this. The Table attribute can have a Name parameter that defines the name of the corresponding table in the database. If Name is omitted, the name of the class is used as the default:
[Table(Name="Products")] public class Product { ... }
An entity class can have any number and type of members. Just remember that only those data members or properties decorated with the Column attribute are significant in defining the mapping between the entity class and the corresponding table in the database:
[Column] public int ProductID;
An entity class should have a unique key. This key is necessary to support unique identity (more on this later), to identify corresponding rows in database tables, and to generate SQL statements that update data. If you do not have a primary key, entity class instances can be created but are not modifiable. The Boolean IsPrimaryKey property of the Column attribute, when set to true, states that the column belongs to the primary key of the table. If the primary key used is a composite key, all the columns that form the primary key will have IsPrimaryKey=true in their parameters:
[Column(IsPrimaryKey=true)] public int ProductID;
By default, a column mapping uses the same name as the member to which the Column attribute is applied. You can specify a different name in the Name parameter. For example, the following Price member corresponds to the UnitPrice field in the database table:
[Column(Name="UnitPrice")] public decimal Price;
If you want to filter data access through member property accessors, you have to specify the underlying storage member using the Storage parameter. If you specify a Storage parameter, LINQ to SQL bypasses the public property accessor and interacts directly with the underlying value. Understanding this is particularly important if you want to track only the modifications made by your code and not the read/write operations made by the LINQ framework. In the following code, the ProductName property is accessed for each read/write operation made by your code:
[Column(Storage="_ProductName")] public string ProductName { get { return this._ProductName; } set { this.OnPropertyChanging("ProductName"); this._ProductName = value; this.OnPropertyChanged("ProductName"); } }
In contrast, LINQ to SQL performs a direct read/write operation on the _ProductName data member when it executes a LINQ operation.
The correspondence between relational type and .NET Framework type assumes a default relational type that corresponds to the .NET Framework type used. Whenever you need to define a different type, you can use the DBType parameter, specifying a valid type by using valid SQL syntax for your relational data source. You need to use this parameter only when you want to create a database schema starting from entity class definitions (a process described in Chapter 6). Here’s an example of the DBType parameter in use:
[Column(DBType="NVARCHAR(20)")] public string QuantityPerUnit;
When the database automatically generates a column value (such as with the IDENTITY keyword in SQL Server), you might want to synchronize the entity class member with the generated value whenever you insert an entity instance into the database. To do that, you need to set the IsDBGenerated parameter for that member to true, and you also need to adapt the DBType accordingly—for example, by adding the IDENTITY modifier for SQL Server tables:
[Column(DBType="INT NOT NULL IDENTITY", IsPrimaryKey=true, IsDBGenerated=true)] public int ProductID;
It is worth mentioning that a specific CanBeNull parameter exists. This parameter is used to specify that the value can contain the null value; however, it is important to note that the NOT NULL clause in DBType is still necessary if you want to create such a condition in a database created by LINQ to SQL:
[Column(DBType="INT NOT NULL IDENTITY", CanBeNull=false, IsPrimaryKey=true, IsDBGenerated=true)] public int ProductID;
Other parameters that are relevant in updating data are AutoSync, Expression, IsVersion, and UpdateCheck.
Entity Inheritance
Sometimes a single table contains many types of entities. For example, imagine a list of contacts—some might be customers, others might be suppliers, and still others might be company employees. From a data point of view, each entity can have specific fields. (For example, a customer can have a discount field, which is not relevant for employees and suppliers.) From a business logic point of view, each entity can implement different business rules. The best way to model this kind of data in an object-oriented environment is by using inheritance to create a hierarchy of specialized classes. LINQ to SQL allows a set of classes derived from the same base class to map to the same relational table.
The InheritanceMapping attribute decorates the base class of a hierarchy, indicating the corresponding derived classes that are based on the value of a special discriminator column. The Code parameter defines a possible value, and the Type parameter defines the corresponding derived type. The discriminator column is defined by setting the IsDiscriminator argument to true in the Column attribute specification.
Listing 5-3 provides an example of a hierarchy based on the Contacts table of the Northwind sample database.
Listing 5-3. Hierarchy of classes based on contacts
[Table(Name="Contacts")] [InheritanceMapping(Code = "Customer", Type = typeof(CustomerContact))] [InheritanceMapping(Code = "Supplier", Type = typeof(SupplierContact))] [InheritanceMapping(Code = "Shipper", Type = typeof(ShipperContact))] [InheritanceMapping(Code = "Employee", Type = typeof(Contact), IsDefault = true)] public class Contact { [Column(IsPrimaryKey=true)] public int ContactID; [Column(Name="ContactName")] public string Name; [Column] public string Phone; [Column(IsDiscriminator = true)] public string ContactType; } public class CompanyContact : Contact { [Column(Name="CompanyName")] public string Company; } public class CustomerContact : CompanyContact { } public class SupplierContact : CompanyContact { } public class ShipperContact : CompanyContact { public string Shipper { get { return Company; } set { Company = value; } } }
Contact is the base class of the hierarchy. If the contact is a Customer, Supplier, or Shipper, the corresponding classes derive from an intermediate CompanyContact type, which defines the Company field corresponding to the CompanyName column in the source table. The CompanyContact intermediate class is necessary because you cannot reference the same column (CompanyName) in more than one field, even if this happens in different classes in the same hierarchy. The ShipperContact class defines a Shipper property that exposes the same value of Company but with a different semantic meaning.
The level of abstraction offered by having different entity classes in the same hierarchy is well described by the sample queries shown in Listing 5-4. The queryTyped query uses the OfType operator, whereas queryFiltered query relies on a standard where condition to filter out contacts that are not customers.
Listing 5-4. Queries using a hierarchy of entity classes
var queryTyped = from c in contacts.OfType<CustomerContact>() select c; var queryFiltered = from c in contacts where c is CustomerContact select c; foreach( var row in queryTyped ) { Console.WriteLine( row.Company ); } // We need an explicit cast to access the CustumerContact members foreach( CustomerContact row in queryFiltered ) { Console.WriteLine( row.Company ); }
The SQL queries produced by these LINQ queries are functionally identical to the following (although the actual query is different because of generalization coding):
SELECT [t0].[ContactType], [t0].[CompanyName] AS [Company], [t0].[ContactID], [t0].[ContactName] AS [Name], [t0].[Phone] FROM [Contacts] AS [t0] WHERE [t0].[ContactType] = 'Customer'
The difference between queryTyped and queryFiltered queries lies in the returned type. A queryTyped query returns a sequence of CustomerContact instances, whereas queryFiltered returns a sequence of the base class Contact. With queryFiltered, you need to explicitly cast the result into a CustomerContact type if you want to access the Company property.
Unique Object Identity
An instance of an entity class stores an in-memory representation of table row data. If you instantiate two different entities containing the same row from the same DataContext, both will reference the same in-memory object. In other words, object identity (same references) maintains data identity (same table row) using the entity unique key. The LINQ to SQL engine ensures that the same object reference is used when an entity instantiated from a query result coming from the same DataContext is already in memory. This check does not happen if you create an instance of an entity by yourself or in a different DataContext (regardless of the real data source). In Listing 5-5, you can see that c1 and c2 reference the same Contact instance, even if they originate from two different queries, whereas c3 is a different object, even if its content is equivalent to the others.
Listing 5-5. Object identity
var queryTyped = from c in contacts.OfType<CustomerContact>() orderby c.ContactID select c; var queryFiltered = from c in contacts where c is CustomerContact orderby c.ContactID select c; Contact c1 = null; Contact c2 = null; foreach( var row in queryTyped.Take(1) ) { c1 = row; } foreach( var row in queryFiltered.Take(1) ) { c2 = row; } Contact c3 = new Contact(); c3.ContactID = c1.ContactID; c3.ContactType = c1.ContactType; c3.Name = c1.Name; c3.Phone = c1.Phone; Debug.Assert( c1 == c2 ); // same instance Debug.Assert( c1 != c3 ); // different objects
Entity Constraints
Entity classes support the maintenance of valid relationships between entities, just like the support offered by foreign keys in a standard relational environment. However, the entity classes cannot represent all possible check constraints of a relational table. No attributes are available to specify the same alternate keys (unique constraint), triggers, and check expressions that can be defined in a relational database. This fact is relevant when you start to manipulate data using entity classes because you cannot guarantee that an updated value will be accepted by the underlying database. (For example, it could have a duplicate unique key.) However, because you can load into entity instances only parts (rows) of the whole table, these kinds of checks are not possible without accessing the relational database anyway.
Associations Between Entities
Relationships between entities in a relational database are modeled on the concept of foreign keys in one table referring to primary keys of another table. Class entities can use the same concept through the Association attribute, which can describe both sides of a one-to-many relationship described by a foreign key.
EntityRef
Let’s start with the concept of lookup, which is the typical operation used to get the customer related to one order. Lookup can be seen as the direct translation into the entity model of the foreign key relationship existing between the CustomerID column of the Orders table and the primary key of the Customers table. In the example entity model, the Order entity class will have a Customer property (of type Customer) that shows the customer data. This property is decorated with the Association attribute and stores its information in an EntityRef<Customer> member (named _Customer), which enables deferred loading of references (as you will see shortly). Listing 5-6 shows the definition of this association.
Listing 5-6. Association EntityRef
[Table(Name="Orders")] public class Order { [Column(IsPrimaryKey=true)] public int OrderID; [Column] private string CustomerID; [Column] public DateTime? OrderDate; [Association(Storage="_Customer", ThisKey="CustomerID", IsForeignKey=true)] public Customer Customer { get { return this._Customer.Entity; } set { this._Customer.Entity = value; } } private EntityRef<Customer> _Customer; }
As you can see, the CustomerID column must be defined in Order; otherwise, it would not be possible to obtain the related Customer. The IsForeignKey argument specifies that Order is the child side of a parent-child relationship. The ThisKey argument of the Association attribute indicates the “foreign key” column (which would be a comma-separated list if more columns were involved for a composite key) that defines the relationship between entities. If you want to hide this detail in the entity properties, you can declare that column as private, just as in the Order class shown earlier.
Using the Order class in a LINQ query, you can specify a Customer property in a filter without writing a join between Customer and Order entities. In the following query, the Country member of the related Customer is used to filter orders that come from customers of a particular Country:
Table<Order> Orders = db.GetTable<Order>(); var query = from o in Orders where o.Customer.Country == "USA" select o.OrderID;
The previous query is translated into a SQL JOIN like the following one:
SELECT [t0].[OrderID] FROM [Orders] AS [t0] LEFT JOIN [Customers] AS [t1] ON [t1].[CustomerID] = [t0].[CustomerID] WHERE [t1].[Country] = "USA"
Until now, we have used entity relationships only for their metadata in building LINQ queries. When an instance of an entity class is created, a reference to another entity (such as the previous Customer property) works with a technique called deferred loading. The related Customer entity is not instantiated and loaded into memory from the database until it is accessed either in read or write mode.
In other words, to generate a SQL query to populate the Customer-related entity when the Country property is accessed, you use the following code:
var query = from o in Orders where o.OrderID == 10528 select o; foreach( var row in query ) { Console.WriteLine( row.Customer.Country ); }
The process of accessing the Customer property involves determining whether the related Customer entity is already in memory for the current DataContext. If it is, that entity is accessed; otherwise, the following SQL query is executed and the corresponding Customer entity is loaded in memory and then accessed:
SELECT [t0].[Country], [t0].[CustomerID], [t0].[CompanyName] FROM [Customers] AS [t0] WHERE [t0].[CustomerID] = "GREAL"
The GREAL string is the CustomerID value for order 10528. As you can see, the SELECT statement queries all columns declared in the Customer entity, even if they are not used in the expression that accessed the Customer entity. (In this case, the executed code never referenced the CompanyName member.)
EntitySet
The other side of an association is a table that is referenced from another table through its primary key. Although this is an implicit consequence of the foreign key constraint in a relational model, you need to explicitly define this association in the entity model. If the Customers table is referenced from the Orders table, you can define an Orders property in the Customer class that represents the set of Order entities related to a given Customer. The relationship is implemented by an instance of EntitySet<Order>, which is a wrapper class over the sequence of related orders. You might want to directly expose this EntitySet<T> type, as in the code shown in Listing 5-7. In that code, the OtherKey argument of the Association attribute specifies the name of the member on the related type (Order) that defines the association between Customer and the set of Order entities.
Listing 5-7. Association EntitySet (visible)
[Table(Name="Customers")] public class Customer { [Column(IsPrimaryKey=true)] public string CustomerID; [Column] public string CompanyName; [Column] public string Country; [Association(OtherKey="CustomerID")] public EntitySet<Order> Orders; }
You might also decide to expose Orders as a property, as in the declaration shown in Listing 5-8. In this case, the Storage argument of the Association attribute specifies the EntitySet<T> for physical storage. You could make only an ICollection<Order> visible outside the Customer class, instead of an EntitySet<Order>, but this is not a common practice.
Listing 5-8. Association EntitySet (hidden)
public class Customer { [Column(IsPrimaryKey=true)] public string CustomerID; [Column] public string CompanyName; [Column] public string Country; private EntitySet<Order> _Orders; [Association(OtherKey="CustomerID", Storage="_Orders")] public EntitySet<Order> Orders { get { return this._Orders; } set { this._Orders.Assign(value); } } public Customer() { this._Orders = new EntitySet<Order>(); } }
With both models of association declaration, you can use the Customer class in a LINQ query, accessing the related Order entities without the need to write a join. You simply specify the Orders property. The next query returns the names of customers who placed more than 20 orders:
Table<Customer> Customers = db.GetTable<Customer>(); var query = from c in Customers where c.Orders.Count > 20 select c.CompanyName;
The previous LINQ query is translated into a SQL query like the following one:
SELECT [t0].[CompanyName] FROM [Customers] AS [t0] WHERE ( SELECT COUNT(*) FROM [Orders] AS [t1] WHERE [t1].[CustomerID] = [t0].[CustomerID] ) > 20
This example creates no Order entity instances. The Orders property serves only as a metadata source to generate the desired SQL query. If you return a Customer entity from a LINQ query, you can access the Orders of a customer on demand:
var query = from c in Customers where c.Orders.Count > 20 select c; foreach( var row in query ) { Console.WriteLine( row.CompanyName ); foreach( var order in row.Orders ) { Console.WriteLine( order.OrderID ); } }
The preceding code uses deferred loading. Each time you access the Orders property of a customer for the first time (as indicated by the bold in the preceding code), a query like the following one (which uses the @p0 parameter to filter CustomerID) is sent to the database:
SELECT [t0].[OrderID], [t0].[CustomerID] FROM [Orders] AS [t0] WHERE [t0].[CustomerID] = @p0
If you want to load all orders for all customers into memory using only one query to the database, you need to request immediate loading instead of deferred loading. To do that, you have two options. The first approach, which is demonstrated in Listing 5-9, is to force the inclusion of an EntitySet using a DataLoadOptions instance and the call to its LoadWith<T> method.
Listing 5-9. Use of DataLoadOptions and LoadWith<T>
DataContext db = new DataContext( ConnectionString ); Table<Customer> Customers = db.GetTable<Customer>(); DataLoadOptions loadOptions = new DataLoadOptions(); loadOptions.LoadWith<Customer>( c => c.Orders ); db.LoadOptions = loadOptions; var query = from c in Customers where c.Orders.Count > 20 select c;
The second option is to return a new entity that explicitly includes the Orders property for the Customer:
var query = from c in Customers where c.Orders.Count > 20 select new { c.CompanyName, c.Orders };
These LINQ queries send a SQL query to the database to get all customers who placed more than 20 orders, including the entire order list for each customer. That SQL query might be similar to the one shown in the following code:
SELECT [t0].[CompanyName], [t1].[OrderID], [t1].[CustomerID], ( SELECT COUNT(*) FROM [Orders] AS [t3] WHERE [t3].[CustomerID] = [t0].[CustomerID] ) AS [value] FROM [Customers] AS [t0] LEFT OUTER JOIN [Orders] AS [t1] ON [t1].[CustomerID] = [t0].[CustomerID] WHERE ( SELECT COUNT(*) FROM [Orders] AS [t2] WHERE [t2].[CustomerID] = [t0].[CustomerID] ) > 20 ORDER BY [t0].[CustomerID], [t1].[OrderID]
You can filter the subquery produced by relationship navigation. Suppose you want to see only customers who placed at least five orders in 1997, and you want to load only these orders. You can use the AssociateWith<T> method of the DataLoadOptions class to do that, as demonstrated in Listing 5-10.
Listing 5-10. Use of DataLoadOptions and AssociateWith<T>
DataLoadOptions loadOptions = new DataLoadOptions(); loadOptions.AssociateWith<Customer>( c => from o in c.Orders where o.OrderDate.Value.Year == 1997 select o); db.LoadOptions = loadOptions; var query = from c in Customers where c.Orders.Count > 5 select c;
The Microsoft Visual C# filter condition (o.OrderDate.Value.Year == 1997) is translated into the following SQL expression:
(DATEPART(Year, [t2].[OrderDate]) = 1997)
AssociateWith<T> can also control the initial ordering of the collection. To do that, you can simply add an order condition to the query passed as an argument to AssociateWith<T>. For example, if you want to get the orders for each customer starting from the newest one, add the orderby line shown in bold in the following code:
loadOptions.AssociateWith<Customer>( c => from o in c.Orders where o.OrderDate.Value.Year == 1997 orderby o.OrderDate descending select o);
Using AssociateWith<T> alone does not apply the immediate loading behavior. If you want both immediate loading and filtering through a relationship, you have to call both the LoadWith<T> and AssociateWith<T> methods. The order of these calls is not relevant. For example, you can write the following code:
DataLoadOptions loadOptions = new DataLoadOptions(); loadOptions.AssociateWith<Customer>( c => from o in c.Orders where o.OrderDate.Value.Year == 1997 select o); loadOptions.LoadWith<Customer>( c => c.Orders ); db.LoadOptions = loadOptions;
Loading all data into memory using a single query might be a better approach if you are sure you will access all data that is loaded, because you will spend less time in round-trip latency. However, this technique will consume more memory and bandwidth when the typical access to a graph of entities is random. Think about these details when you decide how to query your data model.
Graph Consistency
Relationships are bidirectional between entities—when an update is made on one side, the other side should be kept synchronized. LINQ to SQL does not automatically manage this kind of synchronization, which has to be done by the class entity implementation. Instead, LINQ to SQL offers an implementation pattern that is also used by code-generation tools such as SQLMetal, a tool that is part of the Windows Software Development Kit (SDK) (and has been part of the .NET Framework SDK since Microsoft .NET Framework 3.5), or the LINQ to SQL class generator included with Visual Studio. Chapter 7 describes both these tools. This pattern is based on the EntitySet<T> class on one side and on the complex setter accessor on the other side. Take a look at the tools-generated code if you are interested in the implementation details of this pattern.
Relational Model vs. Hierarchical Model
The entity model used by LINQ to SQL defines a set of objects that maps database tables into objects that can be used and manipulated by LINQ queries. The resulting model represents a paradigm shift that has been revealed in descriptions of associations between entities because it moves from a relational model (tables in a database) to a hierarchical or graph model (objects in memory).
A hierarchical/graph model is the natural way to manipulate objects in a program written in C# or Microsoft Visual Basic. When you try to consider how to translate an existing SQL query into a LINQ query, this is the major conceptual obstacle you encounter. In LINQ, you can write a query using joins between separate entities, just as you do in SQL. However, you can also write a query that uses the existing relationships between entities, as we did with EntitySet and EntityRef associations.