Getting Entity Framework to Pick up on the Auto-generated Part of the Composite Primary Key

In my previous post, I wrote about how to get SQL Server to automatically generate a composite primary key for a table when part of that key is also a foreign key to a parent table. That all works nicely in SQL Server and using regular ADO.NET commands from a .NET application. However, as the title of this post suggests, it is a little more of an issue when it comes to Entity Framework.

When you get Entity Framework to generate its model from the database, it picks up on identity columns automatically. In the example from the previous post, however, the child table had no identity column: the value was being generated from within the INSTEAD OF INSERT trigger, and the column was not marked as an identity column.

It is easy enough to go into the entity model and manually set the Id column's StoreGeneratedPattern property to Identity, which indicates to Entity Framework that it has to find out what the value is once the entity has been inserted into the database.
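For reference, that setting ends up in the storage model (SSDL) inside the .edmx file. A hand-edited fragment for the Products table from the previous post would look roughly like this (a sketch, not generated output):

<EntityType Name="Product">
  <Key>
    <PropertyRef Name="TenantId" />
    <PropertyRef Name="Id" />
  </Key>
  <Property Name="TenantId" Type="int" Nullable="false" />
  <!-- StoreGeneratedPattern="Identity" makes EF read the value back after the insert -->
  <Property Name="Id" Type="int" Nullable="false" StoreGeneratedPattern="Identity" />
  <Property Name="Name" Type="nvarchar" MaxLength="50" Nullable="false" />
</EntityType>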

For an integer column this means that the Entity Framework will issue an INSERT statement like this:

exec sp_executesql N'insert [dbo].[Products]([TenantId], [Name])
values (@0, @1)
select [Id]
from [dbo].[Products]
where @@ROWCOUNT > 0 and [TenantId] = @0 and [Id] = scope_identity()',N'@0 int,@1 nvarchar(50)',@0=1,@1=N'Test Product A'

You'll notice that this isn't just a simple insert: it also performs a SELECT immediately afterwards in order to get the value of the newly inserted key back so that it can update the entity.

However, the way it does it will not produce a value. SCOPE_IDENTITY() will always be null because there was no identity column. In fact, even if there was, the fact that we used a trigger would have destroyed the value it contained.

After much searching around on the internet I didn’t find a solution for this issue. I even posted on StackOverflow and didn’t get an answer back (at least, I haven’t at the time of writing). However, I did eventually come across a workaround that could be used in place of SCOPE_IDENTITY(). The workaround involved changing the way the trigger worked to some extent.

The new trigger would capture the keys that it inserted and output them in a select statement at the end. The new trigger looked like this:

ALTER TRIGGER dbo.IOINS_Products 
   ON  dbo.Products 
   INSTEAD OF INSERT
AS 
BEGIN
  SET NOCOUNT ON;
  
  -- Acquire the lock so that no one else can generate a key at the same time.
  -- If the transaction fails then the lock will automatically be released.
  -- If the acquisition takes longer than 15 seconds an error is raised.
  DECLARE @res INT;
  EXEC @res = sp_getapplock @Resource = 'IOINS_Products', 
    @LockMode = 'Exclusive', @LockOwner = 'Transaction', @LockTimeout = '15000',
    @DbPrincipal = 'public'
  IF (@res < 0)
  BEGIN
    RAISERROR('Unable to acquire lock to update Products table.', 16, 1);
  END

  -- Work out what the current maximum Ids are for each tenant that is being
  -- inserted in this operation.
  DECLARE @baseId TABLE(BaseId int, TenantId int);
  INSERT INTO @baseId
  SELECT MAX(ISNULL(p.Id, 0)) AS BaseId, i.TenantId 
  FROM  inserted i
  LEFT OUTER JOIN Products p ON i.TenantId = p.TenantId
  GROUP BY i.TenantId

  -- The replacement insert operation
  DECLARE @keys TABLE (Id INT);
  INSERT INTO Products
  OUTPUT inserted.Id INTO @keys
  SELECT 
    i.TenantId, 
    ROW_NUMBER() OVER(PARTITION BY i.TenantId ORDER BY i.TenantId) + b.BaseId 
      AS Id,
    Name
  FROM inserted i
  INNER JOIN @baseId b ON b.TenantId = i.TenantId

  -- Release the lock.
  EXEC @res = sp_releaseapplock @Resource = 'IOINS_Products', 
    @DbPrincipal = 'public', @LockOwner = 'Transaction'

  SELECT Id FROM @keys
END
GO

As you can see the last line of the trigger performs a SELECT in order to get the keys.

Since the Entity Framework is only expecting one result set back from the SQL it issued, the additional result set produced by the trigger tricks it into thinking it has got the SCOPE_IDENTITY() value it asked for. The SELECT of SCOPE_IDENTITY(), which contained a null value anyway, now becomes the second result set and is ignored by the Entity Framework.

If you are only ever going to use your database with Entity Framework then this solution may work for you. However, the idea that the trigger creates additional (and potentially unexpected) result sets may prove this workaround’s undoing in a more widely used system.

Composite Primary Keys Including an Identity-Like Column

I’ve been thinking of a way to organise some data for a multi-tenanted system. As such the database that would be used would have to mark pretty much every piece of data with the id of the tenant in some way. This got me thinking about using composite primary keys on tables.

Using a composite primary key with the TenantId as the first part has the advantage that all data relating to a single tenant is stored adjacently in the database, which should speed up reads because any one query has less to scan. Reads will almost always be confined to a single tenant; I can’t see lookups independent of the tenant being frequent, and if they do occur it will most likely not be the application making them but a DBA sitting down at SSMS.

The second part of the composite primary key behaves like an identity column, but isn’t one. While an actual identity column could be used, it would produce gaps in the sequence for an individual tenant, and I didn’t want that. I wanted behaviour similar to an identity column, but with a separate counter for each tenant.

Pitfalls on the way

When I was looking for a solution much of what I found talked about doing the process manually, often from the application side. I really don’t want the application having to know that much about the database.

Other solutions talked about only ever using stored procedures, which is a little better, but if you have many stored procedures that insert into the table with the composite key, you have to remember to call the key-generation code from all of those places.

In the end I went for an INSTEAD OF INSERT trigger. But implementing the ideas I’d run across in a trigger proved problematic.

I tried a few different ideas that worked well when a single user was accessing the database, but when I stress tested the design with multiple simultaneous inserts I got either primary key violations (insufficient locking) or deadlocks (too much locking). In the end I discovered that SQL Server has a mutex-like concept that locks sections of code rather than tables. That may not sound terribly useful, but if there is only ever one piece of code that generates the identity-like column, such as an INSTEAD OF INSERT trigger, it works well.

The Solution

Here is the solution, which I’ll walk through shortly. For now, just be aware that there are two tables: Tenants and Products. A product can be owned by a single tenant only, so there is a parent/child relationship.

Tables:

CREATE TABLE [dbo].[Tenants](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Tenants] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Products](
	[TenantId] [int] NOT NULL,
	[Id] [int] NOT NULL,
	[Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED 
(
	[TenantId] ASC,
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Products]  WITH CHECK ADD  CONSTRAINT [FK_Products_Tenants] FOREIGN KEY([TenantId])
REFERENCES [dbo].[Tenants] ([Id])
GO

ALTER TABLE [dbo].[Products] CHECK CONSTRAINT [FK_Products_Tenants]
GO

Trigger:

CREATE TRIGGER dbo.IOINS_Products 
   ON  dbo.Products 
   INSTEAD OF INSERT
AS 
BEGIN
  SET NOCOUNT ON;

  -- Acquire the lock so that no one else can generate a key at the same time.
  -- If the transaction fails then the lock will automatically be released.
  -- If the acquisition takes longer than 15 seconds an error is raised.
  DECLARE @res INT;
  EXEC @res = sp_getapplock @Resource = 'IOINS_Products', 
    @LockMode = 'Exclusive', @LockOwner = 'Transaction', @LockTimeout = '15000',
    @DbPrincipal = 'public'
  IF (@res < 0)
  BEGIN
    RAISERROR('Unable to acquire lock to update Products table.', 16, 1);
  END

  -- Work out what the current maximum Ids are for each tenant that is being
  -- inserted in this operation.
  DECLARE @baseId TABLE(BaseId int, TenantId int);
  INSERT INTO @baseId
  SELECT MAX(ISNULL(p.Id, 0)) AS BaseId, i.TenantId 
  FROM  inserted i
  LEFT OUTER JOIN Products p ON i.TenantId = p.TenantId
  GROUP BY i.TenantId

  -- The replacement insert operation
  INSERT INTO Products
  SELECT 
    i.TenantId, 
    ROW_NUMBER() OVER(PARTITION BY i.TenantId ORDER BY i.TenantId) + b.BaseId 
      AS Id,
    Name
  FROM inserted i
  INNER JOIN @baseId b ON b.TenantId = i.TenantId

  -- Release the lock.
  EXEC @res = sp_releaseapplock @Resource = 'IOINS_Products', 
    @DbPrincipal = 'public', @LockOwner = 'Transaction'
END
GO

The Tenants table does use a traditional IDENTITY column for its primary key, but the Products table does not.

The INSTEAD OF INSERT trigger does just what its name says: instead of running the INSERT statement that was issued, it replaces it with its own code. This means the database has the opportunity to perform additional actions before the insert takes place, potentially replacing the INSERT itself.

The first thing the trigger does is attempt to acquire a lock, to ensure that no one else can calculate the next available Id value at the same time. Rather than locking a table, it uses sp_getapplock (available from SQL Server 2005 onwards), which locks a named resource rather than data; since the trigger is the only place that calculates the next Id value, locking around that code is enough.

A table variable is created to hold all the current max values of Id for each TenantId in the current insert operation. These will act as the base Ids for the subsequent INSERT operation. This ensures that if the INSERT is processing multiple TenantIds in one operation they are all accounted for. In reality, for the tenant based application I’m thinking about, I don’t see this happening all that frequently if at all, but fortune favours the prepared, so they say.

Next is the actual INSERT statement that puts the data into the database. I’m using the ROW_NUMBER() function (also available from SQL Server 2005) to work out a sequential value for each row being inserted, partitioned by the TenantId. This means each TenantId gets row numbers starting at 1, which can then be added to the base Id worked out in the previous step. The Id column is the only substitution in the INSERT; the remaining columns are passed through unaltered.
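To see the per-tenant sequences in action, here is a hypothetical multi-row insert (assuming tenants 1 and 2 exist and have no products yet) together with the keys the trigger would generate:

-- Hypothetical example: two tenants, no existing products.
INSERT INTO dbo.Products (TenantId, Name)
VALUES (1, N'Widget'),
       (1, N'Gadget'),
       (2, N'Sprocket');

-- The trigger assigns the Ids per tenant, so the table ends up containing:
-- TenantId | Id | Name
-- ---------+----+----------
--        1 |  1 | Widget
--        1 |  2 | Gadget
--        2 |  1 | Sprocket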

Finally, the lock is released using sp_releaseapplock. This ensures that any other process waiting to INSERT data into this table can proceed. In the event that the transaction fails, the lock is released automatically because the lock owner was set to the transaction.

Issues

Composite keys can give you more natural keys for your data than a surrogate key (such as a single identity column or a GUID/UUID), because they follow the model of the business more closely. However, they also mean that if the business changes you may need to find a way to change a primary key, something that should remain immutable. For example, if you have two tenants in your system and they merge into a single tenant, you are likely to end up with duplicate keys if you simply change the tenant Id on one.

If the shape of the table changes then the INSTEAD OF INSERT trigger has to be updated as well, since its replacement insert statement must account for new columns or drop references to columns that no longer exist.

There are also issues with Entity Framework not being able to get the newly generated value back. I’ll discuss that in more detail in a future blog post.

Creating a data dictionary with SQL Server and MediaWiki – Part three: inbound references

In my previous posts (parts one and two), I showed how to generate MediaWiki mark-up for lists of tables and columns that can be used as the building blocks of a data dictionary. In this post I continue by showing how to generate a list of inbound links (i.e. a list of other tables that reference this one).

The script

At the top of the script are three parameters that you need to set to suit your needs. The @catalogue is the name of the database, the @schema and @table are the names of the destination schema and table.

-- Edit these parameters to suit your needs.
DECLARE @catalogue SYSNAME = 'AdventureWorks';
DECLARE @schema SYSNAME = 'Production'
DECLARE @table SYSNAME = 'Product'

DECLARE @fk_catalogue SYSNAME;
DECLARE @fk_schema SYSNAME;
DECLARE @fk_table_name SYSNAME;
DECLARE @fk_column_name SYSNAME;

SET NOCOUNT ON

PRINT '{| class="wikitable sortable"
|-
! scope="col" | Schema
! scope="col" | Table
! scope="col" | Column
! scope="col" class="unsortable" | Notes
'

DECLARE fk_cursor CURSOR FOR
SELECT FK.TABLE_CATALOG, FK.TABLE_SCHEMA, FK.TABLE_NAME, CU.COLUMN_NAME
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
	ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
    ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
    ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
WHERE PK.TABLE_SCHEMA = @schema
AND PK.TABLE_NAME = @table
AND PK.TABLE_CATALOG = @catalogue
ORDER BY FK.TABLE_SCHEMA, FK.TABLE_NAME, CU.COLUMN_NAME

OPEN fk_cursor

FETCH NEXT FROM fk_cursor
INTO @fk_catalogue, @fk_schema, @fk_table_name, @fk_column_name

WHILE @@FETCH_STATUS = 0
BEGIN

PRINT '|-
| '+ @fk_schema+'
| [[' + @fk_catalogue + '.'+@fk_schema+'.'+@fk_table_name + '|'+ @fk_table_name+']]
| '+ @fk_column_name + '
|
'

FETCH NEXT FROM fk_cursor
INTO @fk_catalogue, @fk_schema, @fk_table_name, @fk_column_name
END
PRINT '|}'

CLOSE fk_cursor;
DEALLOCATE fk_cursor;
SET NOCOUNT OFF

How it renders

This is how the resulting mediawiki code renders in the browser.

Rendering of inbound references

Creating a data dictionary with SQL Server and MediaWiki – Part two: a list of columns

In my previous post, I showed how to create a SQL Script that generates Mediawiki mark-up to create a list of tables (or views) in your database. In this post, I’ll continue with generating a list of columns.

Script walk through

At the top of the script are three variables that define which table (or view) we want the columns from. The @catalogue is the name of the database; @schema and @table are the schema name (often simply ‘dbo’) and the table or view name, respectively.

After a little housekeeping (defining all the variables we’ll be using later), we output the head of the wiki table. This includes the instruction that the table is sortable and marks which columns the sort is available on.

Then we open our first cursor. The table_cursor walks through all the columns in the table.

Inside the table_cursor loop, we create a constraint_cursor to get the constraints for each column, building up the @constraints variable with details that will be put in the wiki table later.

After that, a new cursor (pk_cursor) is set up to find the primary key endpoints of the foreign key column being rendered.

Finally, all the information that has been collected is output to the screen.

Once the table_cursor loop is complete, the mark-up that closes off the wiki table is output.
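The full script is in the gist linked below, but as a rough outline (a sketch, not the gist verbatim, assuming the same @catalogue, @schema and @table variables described above) the outer loop has this shape, reading column details from INFORMATION_SCHEMA.COLUMNS:

-- Outline of the outer loop only; the real script also runs the
-- constraint_cursor and pk_cursor inside the loop.
DECLARE @column_name SYSNAME, @ordinal INT, @nullable VARCHAR(3), @datatype SYSNAME;

DECLARE table_cursor CURSOR FOR
SELECT COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG = @catalogue AND TABLE_SCHEMA = @schema AND TABLE_NAME = @table
ORDER BY ORDINAL_POSITION

OPEN table_cursor
FETCH NEXT FROM table_cursor
INTO @column_name, @ordinal, @nullable, @datatype

WHILE @@FETCH_STATUS = 0
BEGIN
  -- The constraint_cursor and pk_cursor work would go here.

  PRINT '|-
| ' + @column_name + '
| ' + CAST(@ordinal AS VARCHAR(10)) + '
| ' + @nullable + '
| ' + @datatype + '
|
|
|
'
  FETCH NEXT FROM table_cursor
  INTO @column_name, @ordinal, @nullable, @datatype
END

CLOSE table_cursor;
DEALLOCATE table_cursor;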

How it renders

The wiki mark-up looks like this:

{| class="wikitable sortable"
|-
! scope="col" | Column Name
! scope="col" | Ordinal Position
! scope="col" | Is Nullable
! scope="col" | Data type
! scope="col" | Constraints
! scope="col" | FK Endpoint
! scope="col" class="unsortable" | Notes
|-
| ProductID
| 1
| NO
| int
| PRIMARY KEY
|
|
|-

And it renders to the screen like this:

Mediawiki representation of the columns in a table

The script

The script can be found in full as a gist on github: https://gist.github.com/3035964.

Coming up

Next, I’ll show how to create a list of inbound references to the table.

Creating a data dictionary with SQL Server and MediaWiki – Part one: a list of tables

Background

We’ve just installed MediaWiki as a documentation tool where I work. To get things up and running as quickly as possible, I created a few SQL scripts that pull schema information out of SQL Server in order to create a data dictionary. The scripts produce MediaWiki mark-up that allows various things to be cross-referenced in a way that makes navigating around the wiki fairly easy.

Getting a list of tables

The first thing to do is to get a list of tables. In the code below, the first line is the only one you’ll need to modify if you’re happy to accept the output in its current format; it determines whether you get back a list of tables or a list of views.

SET NOCOUNT ON ensures that we don’t get any messages midway through saying things like “(504 row(s) affected)”.

The first print statement simply outputs the header for the table in wiki mark-up. We also mark certain columns of the table as sortable, so that the user can sort the table the way they want rather than the way it was initially rendered.

Then we head into the cursor, as we will be looping over each of the tables in turn, outputting wiki mark-up for each of them.

In each iteration we output various details about the table. The mark-up includes a link to a wiki page that will contain more detail about the specific table. Although the rendered wiki table only shows the table name, the underlying link uses a fully qualified name so that you don’t run into clashes between databases or schemas.

Finally, when the loop created by the cursor is complete we put in the closing markup for the wiki table.

There are a few more things going on, but I thought I’d concentrate on the essentials rather than turn this into a tutorial on how cursors work.
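For a flavour of what the gist contains, the core of the script looks something like this rough sketch (an outline only, not the gist verbatim), with the first line switching between tables and views:

-- Outline sketch; see the gist below for the full script.
DECLARE @table_type VARCHAR(10) = 'BASE TABLE';  -- or 'VIEW'
DECLARE @catalogue SYSNAME, @schema SYSNAME, @name SYSNAME;

SET NOCOUNT ON

PRINT '{| class="wikitable sortable"
|-
! scope="col" | Schema
! scope="col" | Table
! scope="col" class="unsortable" | Notes
'

DECLARE table_cursor CURSOR FOR
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = @table_type
ORDER BY TABLE_SCHEMA, TABLE_NAME

OPEN table_cursor
FETCH NEXT FROM table_cursor INTO @catalogue, @schema, @name

WHILE @@FETCH_STATUS = 0
BEGIN
  PRINT '|-
| ' + @schema + '
| [[' + @catalogue + '.' + @schema + '.' + @name + '|' + @name + ']]
|
'
  FETCH NEXT FROM table_cursor INTO @catalogue, @schema, @name
END

PRINT '|}'
CLOSE table_cursor;
DEALLOCATE table_cursor;
SET NOCOUNT OFF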

How it renders

The script renders Wiki Markup, which the MediaWiki engine renders as a table. Using the Adventure Works database as an example, this is what the end result looks like.

Data Dictionary showing a list of tables

The script

The script is now located in a github gist: https://gist.github.com/3035858

Coming up

In the next post, I’ll be showing the script for getting the column information out for each of the individual tables (or views).

Updates

  • This article was updated by adding in the code to generate sortable tables on Friday, 15/June/2012 at 22:00

Running Queries on Excel Spreadsheets using SQL Server 2008

I’m more a database person than a spreadsheet person. I’m more used to using SQL to bend data to my will than all the fancy gubbins that you’ll find in Excel. With some chunky (for a spreadsheet) ad hoc data in hand I set about connecting it up to SQL Server so I could run a few choice SELECT statements on the data.

The details in this post work with the 64-bit editions of Excel 2010 and SQL Server 2008 R2.

The basic ad hoc connection looks something like this:

SELECT *
FROM OPENDATASOURCE( 'Microsoft.ACE.OLEDB.12.0', 'Data Source="<full file path to excel file>"; Extended properties=Excel 12.0')...[<data sheet name>$]

However, if you try that in SQL Server Management Studio on a raw SQL Server installation you’ll get this error message:

Msg 15281, Level 16, State 1, Line 7
SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online.

Enabling Ad Hoc Remote Queries

I’ve linked to the Books On-Line entry in the above, but it is only part of the story. Once you’ve followed its instructions on opening the View Facets dialog, you have to hunt around a little to find where you turn on and off the ad hoc remote queries. To save you the time, they’re in the “Server Configuration” facet.

The alternative, also mentioned there, is to issue a SQL command:

sp_configure 'Ad Hoc Distributed Queries', 1;

However, that still won’t work directly. You’ll get the following error message:

Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51

The configuration option 'Ad Hoc Distributed Queries' does not exist, or it may be an advanced option.

The full SQL you need is:

sp_configure 'show advanced options', 1;
RECONFIGURE
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE
GO

And you’ll get output that looks like this:

Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install.

Configuration option 'Ad Hoc Distributed Queries' changed from 0 to 1. Run the RECONFIGURE statement to install.

Now, we can try running the SELECT statement again, but this time the following error appears:

Msg 7302, Level 16, State 1, Line 2

Cannot create an instance of OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)".

ODBC Configuration

This is because the ODBC driver is not configured correctly. Go to Control Panel –> System and Security –> Administrative Tools –> Data Sources (ODBC); alternatively, you can just type “ODBC” into the Windows Start menu.

Either way, you get to this dialog:

And as you can see, the ODBC driver needed for reading Excel files is not installed. A pretty big configuration failure. But it is easy enough to get the correct drivers. You can download them from Microsoft:

However, there is a problem if you have 32bit Office installed and 64bit SQL Server. The 32bit installer for the ODBC Drivers won’t work with 64bit SQL Server, and the 64bit drivers won’t install if it finds an existing 32bit installation of Office on the machine. For my desktop machine that was a problem, but luckily my laptop is running both 64bit versions of Office and SQL Server.

Finally

I eventually found this code snippet that works:

SELECT *
FROM OPENROWSET('MSDASQL',
'DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};
DBQ=c:\dev\Performance-results.xlsx',
'SELECT * FROM [results$]')

The only issue I have with this is that it uses MSDASQL, which is surrounded by uncertainty. In one blog post it was said to be deprecated and that 64-bit versions would not be made available; yet there is a 64-bit version available for download. But for the ad hoc work I’m doing at the moment, it works.

Getting just the columns you want from Entity Framework

I’ve been looking at trying to optimise the data access in the project I’m working on, and the major stuff (like getting a piece of code that generated 6000 queries down to just 7) is now done. The next step is to look at smaller things that can still make savings.

At the moment, the queries return complete entities. However, that may not always be desirable. For example, I have an auto-complete feature for which I just need an Id and some text data. The data comes from more than one table, so at the moment I’m getting a rather large object graph back and throwing most of it away. It would be better to retrieve just the data we’re interested in and not waste time collating, transmitting, and mapping data we are then going to ignore.

So, using AdventureWorks as an example, here is some EF code to get the list of products we can use in an auto-complete feature.

private IEnumerable<AutoCompleteData> GetAutoCompleteData(string searchTerm)
{
    using (var context = new AdventureWorksEntities())
    {
        var results = context.Products
            .Include("ProductSubcategory")
            .Where(p => p.Name.Contains(searchTerm)
                        && p.DiscontinuedDate == null)
            .AsEnumerable()
            .Select(p => new AutoCompleteData
                                {
                                    Id = p.ProductID,
                                    Text = BuildAutoCompleteText(p)
                                })
            .ToArray();
        return results;
    }
}

private static string BuildAutoCompleteText(Product p)
{
    string subcategoryName = string.Empty;
    if (p.ProductSubcategory != null)
        subcategoryName = p.ProductSubcategory.Name;

    return string.Format("{0} ({1}) @ ${2:0.00}", p.Name,
        subcategoryName, p.StandardCost);
}

This produces a call to SQL Server that looks like this:

exec sp_executesql N'SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[Name] AS [Name],
[Extent1].[ProductNumber] AS [ProductNumber],
[Extent1].[MakeFlag] AS [MakeFlag],
[Extent1].[FinishedGoodsFlag] AS [FinishedGoodsFlag],
[Extent1].[Color] AS [Color],
[Extent1].[SafetyStockLevel] AS [SafetyStockLevel],
[Extent1].[ReorderPoint] AS [ReorderPoint],
[Extent1].[StandardCost] AS [StandardCost],
[Extent1].[ListPrice] AS [ListPrice],
[Extent1].[Size] AS [Size],
[Extent1].[SizeUnitMeasureCode] AS [SizeUnitMeasureCode],
[Extent1].[WeightUnitMeasureCode] AS [WeightUnitMeasureCode],
[Extent1].[Weight] AS [Weight],
[Extent1].[DaysToManufacture] AS [DaysToManufacture],
[Extent1].[ProductLine] AS [ProductLine],
[Extent1].[Class] AS [Class],
[Extent1].[Style] AS [Style],
[Extent1].[ProductSubcategoryID] AS [ProductSubcategoryID],
[Extent1].[ProductModelID] AS [ProductModelID],
[Extent1].[SellStartDate] AS [SellStartDate],
[Extent1].[SellEndDate] AS [SellEndDate],
[Extent1].[DiscontinuedDate] AS [DiscontinuedDate],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate],
[Extent2].[ProductSubcategoryID] AS [ProductSubcategoryID1],
[Extent2].[ProductCategoryID] AS [ProductCategoryID],
[Extent2].[Name] AS [Name1],
[Extent2].[rowguid] AS [rowguid1],
[Extent2].[ModifiedDate] AS [ModifiedDate1]
FROM  [Production].[Product] AS [Extent1]
LEFT OUTER JOIN [Production].[ProductSubcategory] AS [Extent2] ON [Extent1].[ProductSubcategoryID] = [Extent2].[ProductSubcategoryID]
WHERE ([Extent1].[Name] LIKE @p__linq__0 ESCAPE N''~'') AND ([Extent1].[DiscontinuedDate] IS NULL)',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'%silver%'

But as you can see from the C# code above, most of this is not needed. We are pulling back much more data than we need. It is even pulling back DiscontinuedDate which we already know must always be null.

What we can do is chain in a Select call that Entity Framework is happy with, that will give it the information it needs about the columns in the database that are actually being used.

So, why can’t it get the information it needs from the existing Select method?

Well, if I take away the AsEnumerable() call I get an exception with a message that says “LINQ to Entities does not recognize the method ‘System.String BuildAutoCompleteText(DataAccess.EntityModel.Product)’ method, and this method cannot be translated into a store expression.”

LINQ to Entities cannot understand this, so it has no idea that we are only using a fraction of the information it is bringing back. This brings us back to using a Select statement that LINQ to Entities is happy with. I’m going to use an anonymous type for that. The code then changes to this:

private IEnumerable<AutoCompleteData> GetAutoCompleteData(string searchTerm)
{
    using (var context = new AdventureWorksEntities())
    {
        var results = context.Products
            .Include("ProductSubcategory")
            .Where(p => p.Name.Contains(searchTerm)
                        && p.DiscontinuedDate == null)
            .Select(p => new
                            {
                                p.ProductID,
                                ProductSubcategoryName = p.ProductSubcategory.Name,
                                p.Name,
                                p.StandardCost
                            })
            .AsEnumerable()
            .Select(p => new AutoCompleteData
                                {
                                    Id = p.ProductID,
                                    Text = BuildAutoCompleteText(p.Name,
                                        p.ProductSubcategoryName, p.StandardCost)
                                })
            .ToArray();
        return results;
    }
}

private static string BuildAutoCompleteText(string name, string subcategoryName, decimal standardCost)
{
    return string.Format("{0} ({1}) @ ${2:0.00}", name, subcategoryName, standardCost);
}

[The changes are the new Select that projects into an anonymous type, and the reworked BuildAutoCompleteText that now takes simple values instead of a Product entity.]

This tells Entity Framework that we are only interested in four columns, so when it generates the SQL, that’s all it brings back:

exec sp_executesql N'SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent2].[Name] AS [Name],
[Extent1].[Name] AS [Name1],
[Extent1].[StandardCost] AS [StandardCost]
FROM  [Production].[Product] AS [Extent1]
LEFT OUTER JOIN [Production].[ProductSubcategory] AS [Extent2] ON [Extent1].[ProductSubcategoryID] = [Extent2].[ProductSubcategoryID]
WHERE ([Extent1].[Name] LIKE @p__linq__0 ESCAPE N''~'') AND ([Extent1].[DiscontinuedDate] IS NULL)',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'%silver%'

Entity Framework query that never brings back data

I was recently optimising some data access code that uses the Entity Framework (EF) and saw the following in SQL Server Profiler, emanating from the application:

SELECT
CAST(NULL AS varchar(1)) AS [C1],
CAST(NULL AS varchar(1)) AS [C2],
CAST(NULL AS varchar(1)) AS [C3],
CAST(NULL AS bit) AS [C4],
CAST(NULL AS varchar(1)) AS [C5],
CAST(NULL AS bit) AS [C6],
CAST(NULL AS varchar(1)) AS [C7],
CAST(NULL AS bit) AS [C8],
CAST(NULL AS bit) AS [C9]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
WHERE 1 = 0

At a glance it looks a little odd, but then the final line sealed it… WHERE 1 = 0!!! That will never return any rows whatsoever!

So what code caused this?

Here is an example using the AdventureWorks database:

private static IEnumerable<ProductCostHistory> GetPriceHistory(IEnumerable<int> productIDs)
{
    using (var products = new AdventureWorksEntities())
    {
        var result = products.ProductCostHistories
            .Where(pch => productIDs.Contains(pch.ProductID))
            .ToArray();
        return result;
    }
}

Called with code something like this:

int[] productIDs = new []{707, 708, 710, 711};
var history = GetPriceHistory(productIDs);

This will produce some SQL that looks like this:

SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[StartDate] AS [StartDate],
[Extent1].[EndDate] AS [EndDate],
[Extent1].[StandardCost] AS [StandardCost],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Production].[ProductCostHistory] AS [Extent1]
WHERE [Extent1].[ProductID] IN (707,708,710,711)

So far, so good. The WHERE clause contains a reasonable filter. But look what happens if the productIDs array is empty.

SELECT
CAST(NULL AS int) AS [C1],
CAST(NULL AS datetime2) AS [C2],
CAST(NULL AS datetime2) AS [C3],
CAST(NULL AS decimal(19,4)) AS [C4],
CAST(NULL AS datetime2) AS [C5]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
WHERE 1 = 0

What a completely wasted roundtrip to the database.

Since an empty productIDs array can never produce any results, we can short circuit this and avoid calling the database at all when the input array is empty. For example:

private static IEnumerable<ProductCostHistory> GetPriceHistory(IEnumerable<int> productIDs)
{
    // Check to see if anything will be returned
    if (!productIDs.Any())
        return new ProductCostHistory[0];

    using (var products = new AdventureWorksEntities())
    {
        var result = products.ProductCostHistories
            .Where(pch => productIDs.Contains(pch.ProductID))
            .ToArray();
        return result;
    }
}

If you really must do dynamic SQL…

I may have mentioned in previous posts and articles about SQL injection attacks that dynamic SQL (building SQL commands by concatenating strings together) is a source of security failures in a data driven application. It is easy to inject malicious text and cause the system to return incorrect responses. Generally the solution is to use parameterised queries.

However, there are times when you may have no choice. For example, if you want to reference tables or columns dynamically, you cannot use ordinary parameters: a table name or column name cannot be replaced with a parameter. You then have to use dynamic SQL and inject these names into a SQL command.
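To illustrate (a hypothetical snippet, not from the original post): if you try to pass a table name as a parameter, SQL Server treats the parameter in the FROM clause as a table variable and the batch fails.

-- Hypothetical illustration: a table name cannot be a parameter.
DECLARE @TableName sysname = 'MainTable';
EXEC sp_executesql N'SELECT * FROM @tbl', N'@tbl sysname', @tbl = @TableName;
-- Fails with: Must declare the table variable "@tbl".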

The problem

It is possible for SQL Server to do that concatenation for you. For example:

CREATE PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @sql nvarchar(max) =
		'SELECT ' + @ColumnName +
		' FROM ' + @TableName +
		' WHERE Id = '+cast(@Id as nvarchar(20));
	EXEC(@sql)
END
GO

This is a simple stored procedure that gets some data dynamically. However, even though everything is neatly parameterised, that is no protection. All that has happened is that the vulnerable location (i.e. where the SQL is constructed) has moved from the application into the database. The application is now parameterising everything, which is good, but there is more to consider than just that.

Validating the input

The next line of defence should be verifying that the table and column names passed are actually valid. In SQL Server you can query the INFORMATION_SCHEMA views to determine whether the column and tables exist.

If, for example, there is a table called MainTable in the database you can check it with a query like this:

SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'MainTable'

And it will return:

INFORMATION_SCHEMA.TABLES

There is a similar view for checking columns. For example:

INFORMATION_SCHEMA.COLUMNS
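A check along the same lines, using a hypothetical column name, might look like this:

-- Hypothetical example: checking a column and its table in one query.
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'MainTable'
AND COLUMN_NAME = 'SomeColumn'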

As you can see, the INFORMATION_SCHEMA.COLUMNS view also contains sufficient detail on the table so that when we implement it we only have to make one check:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + @ColumnName +
            ' FROM ' + @TableName +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Formatting the input

The above is only part of the solution. It is perfectly possible for a table name to contain characters that mean it needs to be escaped (e.g. a space character), or for the table to share a name with a SQL keyword. To escape a table or column name it is enclosed in square brackets, so a table name of My Table becomes [My Table] and a table called select becomes [select].

You can also escape table and column names that wouldn’t ordinarily require escaping; it makes no difference to them.

The code now becomes:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT [' + @ColumnName + '] ' +
            'FROM [' + @TableName + '] ' +
            'WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

But that’s not quite the full story.

Really formatting the input

What if you have a table called Cra]zee Table? Okay – Why on earth would you have a table with such a stupid name? It happens, and it is a perfectly legitimate table name in SQL Server. People do weird stuff and you have to deal with it.

At the moment the stored procedure simply falls apart when presented with such input. The call would look like this:

EXEC GetData 1, 'Cra]zee Table', 'MadStuff'

And it gets past the validation stage because it is a table in the system. The result is a message:

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'Table'.

The SQL produced looks like this:

SELECT [MadStuff] FROM [Cra]zee Table] WHERE Id = 1

By this point it should be obvious why it failed: the SQL parser interpreted the first closing square bracket as the terminator for the escaped section.

There are other special characters in SQL that require special consideration, and you could write code to process them before adding the name to the SQL string. In fact, I’ve seen many people do that, and more often than not they get it wrong.

The better way to deal with this sort of thing is to use a built-in SQL Server function called QUOTENAME, which ensures the column or table name is properly escaped. The stored procedure we are building now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Things that can be parameterised

There is still something more that can be done here. The Id value is being injected into the SQL string, yet it is something that can quite easily be parameterised.

The issue at the moment is that the SQL string is being executed using the EXECUTE command, and you cannot pass parameters into SQL executed that way. Instead, you need to use a stored procedure called sp_executesql, which allows parameters to be defined and passed into the dynamically created SQL.

The stored procedure now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = @Identifier';
        EXEC sp_executesql @sql, N'@Identifier int',
                           @Identifier = @Id
    END
END
GO

This is not quite the end of the story. There are performance improvements to be had when using sp_executesql, such as reuse of the cached query plan for the parameterised statement. You can find out about these in SQL Server Books Online.

And finally…

If you must use dynamic SQL in stored procedures do take care to ensure that all the data is validated and cannot harm your database. This is an area in which I tread very carefully if I have no other choice.

Try to consider every conceivable input, especially inputs outside the bounds of your application. Remember also that defending your database is a multi-layered strategy. Even if you have the best firewalls and security procedures elsewhere in your system, a determined hacker may find a way through your other defences and be communicating with the database in a way you didn’t anticipate. Assume that an attacker has got through your other defences: how do you then provide data services to your application(s) yet protect the database?

How to get a value from a text box into the database

This question was asked on a forum and I took some time to construct a reasonably lengthy reply so I’m copying it to my blog for a bit of permanence.

I suspect that many of my regular readers will be dismayed at the lack of proper architecture (e.g. layering) but we all had to start somewhere and I suspect that your first programs were not properly layered or structured either. I know mine certainly weren’t. My aim with this was to show how a simple goal can be achieved, what basic things are needed and how to fit it all together by doing the simplest thing that would work (a mantra from the agile world).

Here’s the post (slightly edited to put back some of the original context):

Okay – Let’s step back and show the whole thing from text box to database. NOTE: that this example shows everything in one place. This is generally considered poor practice, but as you are only just starting I’ll not burden you with the principles of layered architecture and the single responsibility principle and so on. (Just be aware they exist and one day you’ll have to learn about them)

So, let's say you have a form with two text boxes, one for a name and one for an age. Let's call them NameTB and AgeTB. The user can enter information in these text boxes and press a button that adds the values to the database.

First, we need to get the data from the text boxes into a form we can use.

string name = NameTB.Text;
int age = Convert.ToInt32(AgeTB.Text);

Since text boxes only deal with strings, we have to convert the string into a number (an Int32 – a 32-bit integer) for the age value.

Now we need to set up the database connection and command in order to insert this. I'll assume you already have a connection string to your database; I've called it myConnectionString for this example.

SqlConnection myConnection = new SqlConnection(myConnectionString);
SqlCommand myCommand = new SqlCommand("INSERT Person(NameField, AgeField) "+
    "VALUES (@nameParam, @ageParam)", myConnection);

I’ve now set up the SQL Command with an insert statement. I’ve assumed there is a table called Person and it has two columns called NameField and AgeField. I’m also going to insert the values via parameters, which I’ve indicated with @nameParam and @ageParam. SQL Server requires that all parameter names start with an @ symbol. Other databases may vary.

myCommand.Parameters.AddWithValue("@nameParam", name);
myCommand.Parameters.AddWithValue("@ageParam", age);

We’ve now added the parameters into the SQL command and we’ve given each parameter the value we got earlier. Finally:

myConnection.Open();
myCommand.ExecuteNonQuery();
myConnection.Close();

This opens the connection, runs the INSERT statement and closes the connection again. We’re using ExecuteNonQuery because we don’t expect any results back from SQL Server. If we were expecting data back (e.g. because we were using a SELECT statement) we could use ExecuteReader (for many rows/columns) or ExecuteScalar (for a single value).
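As a minimal sketch (not part of the original forum answer), reading a single value back with ExecuteScalar could look something like this:

// Hypothetical example: count the people already in the table.
SqlCommand countCommand = new SqlCommand(
    "SELECT COUNT(*) FROM Person", myConnection);

myConnection.Open();
int personCount = (int)countCommand.ExecuteScalar();
myConnection.Close();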

This is a very basic example. I’ve not shown any error checking or exception handling. There is also the implicit assumption that all this code resides inside a button click event, which is considered poor practice for anything but a small or throwaway application.
