Optimising clustered indexes in SQL Server 2008

I’ve just found a script on another blog to go through all the clustered indexes in a SQL Server database and rebuild them in order to reduce fragmentation and improve the disk IO needed to get the data. The original script didn’t take into account tables in different schemas so I updated it. I also added some metrics to it so I could get a sense of how long the operation takes on each table.

The script can take quite a while to run. On my database it took over 6 minutes just to initially run the query to work out what needed rebuilding, and each index can take several seconds (or possibly more if you have a lot of data) on its own.

The new script is here:

SET NOCOUNT ON
DECLARE @Schema SYSNAME;
DECLARE @Table SYSNAME;
DECLARE @Index SYSNAME;
DECLARE @Rebuild NVARCHAR(4000)
DECLARE @StartTime DATETIME = GETUTCDATE();

PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : Rebuild all indexes with over 10% fragmentation.')

DECLARE DB CURSOR FOR 
SELECT SS.name [schema], SO.Name [table], SI.Name [index]
FROM SYS.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED')
INNER JOIN SYS.objects SO ON SO.object_id=SYS.dm_db_index_physical_stats.object_id
INNER JOIN SYS.schemas SS ON SO.schema_id=SS.schema_id
INNER JOIN SYS.indexes SI ON SI.index_id=SYS.dm_db_index_physical_stats.index_id AND 
SI.object_id=sys.dm_db_index_physical_stats.object_id 
-- Begin select only clustered indexes Index_id = 1
AND SI.Index_id = 1
-- End select only clustered indexes Index_id = 1
WHERE avg_fragmentation_in_percent > 10.0 AND SYS.dm_db_index_physical_stats.index_id > 0
ORDER BY SO.Name 

OPEN DB
FETCH NEXT FROM DB INTO @Schema, @Table, @Index
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @Rebuild = 'ALTER INDEX ' + @Index + ' ON ' + @Schema + '.' + @Table + ' REBUILD'

  PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : ' + @Rebuild)

  -- Comment out the following line to see what tables would be affected without rebuilding the indexes
  EXEC SP_EXECUTESQL @Rebuild

  FETCH NEXT FROM DB INTO @Schema, @Table, @Index
END
CLOSE DB
DEALLOCATE DB

DECLARE @Duration DATETIME = GETUTCDATE() - @StartTime;
PRINT (CONVERT(NVARCHAR(100), GETUTCDATE(), 113) + ' : Finished. Duration = '+CONVERT(NVARCHAR(100), @Duration, 114))

SET NOCOUNT OFF

Rewiring the users and logins in SQL Server

As a developer I find that I’m frequently backing up and restoring SQL Server databases between servers for development and testing purposes. However, each time I do the link between the login (a server concept) and the user (a per database concept) gets broken.

There is a stored procedure in SQL Server to wire it all back up again and I keep forgetting what it is. So here it is (my aide memoir):

sp_change_users_login: It maps an a database user to a SQL Server login.

The quick and easy way is as follows:

sp_change_users_login 'Update_One', 'myUserName', 'myLoginName'

 

If you really must do dynamic SQL…

I may have mentioned in previous posts and articles about SQL Injection Attacks that dynamic SQL (building SQL commands by concatenating strings together) is a source of failure in the security of a data driven application. It becomes easy to inject malicious text in there to cause the system to return incorrect responses. Generally the solution is to use parameterised queries

However, there are times where you may have no choice. For example, if you want to dynamically reference tables or columns. You can’t do that as the table name or column name cannot be replaced with a parameter. You then have to use dynamic SQL and inject these into a SQL command.

The problem

It is possible for SQL Server to do that concatenation for you. For example:

CREATE PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @sql nvarchar(max) =
		'SELECT ' + @ColumnName +
		' FROM ' + @TableName +
		' WHERE Id = '+cast(@Id as nvarchar(20));
	EXEC(@sql)
END
GO

This is a simple stored procedure that gets some data dynamically. However, even although everything is neatly parameterised it is no protection. All that has happened is that the location for vulnerability (i.e. the location of the construction of the SQL) has moved from the application into the database. The application is now parameterising everything, which is good. But there is more to consider than just that.

Validating the input

The next line of defence should be verifying that the table and column names passed are actually valid. In SQL Server you can query the INFORMATION_SCHEMA views to determine whether the column and tables exist.

If, for example, there is a table called MainTable in the database you can check it with a query like this:

SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'MainTable'

And it will return:

INFORMATION_SCHEMA.TABLES

There is a similar view for checking columns. For example:

INFORMATION_SCHEMA.COLUMNS

As you can see, the INFORMATION_SCHEMA.COLUMNS view also contains sufficient detail on the table so that when we implement it we only have to make one check:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + @ColumnName +
            ' FROM ' + @TableName +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Formatting the input

The above is only part of the solution, it is perfectly possible for a table name to contain characters that mean it needs to be escaped. (e.g. a space character or the table may share a name with a SQL keyword). To escape a table or column name it is enclosed in square brackets, so a table name of My Table becomes [My Table] or a table called select becomes [select].

You can escape table and column names that wouldn’t ordinarily require escaping also. It makes no difference to them.

The code now becomes:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT [' + @ColumnName + '] ' +
            'FROM [' + @TableName + '] ' +
            'WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

But that’s not quite the full story.

Really formatting the input

What if you have a table called Cra]zee Table? Okay – Why on earth would you have a table with such a stupid name? It happens, and it is a perfectly legitimate table name in SQL Server. People do weird stuff and you have to deal with it.

At the moment the current stored procedure will simply fall apart when presented with such input. The call to the stored procedure would look like this:

EXEC GetData 1, 'Cra]zee Table', 'MadStuff'

And it gets past the validation stage because it is a table in the system. The result is a message:

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'Table'.

The SQL produced looks like this:

SELECT [MadStuff] FROM [Cra]zee Table] WHERE Id = 1

By this point is should be obvious why it failed. The SQL Parser interpreted the first closing square bracket as the terminator for the escaped section.

There are other special characters in SQL that require special consideration and you could write code to process them before adding it to the SQL string. In fact, I’ve seen many people do that. And more often than not they get it wrong.

The better way to deal with that sort of thing is to use a built in function in SQL Server called QUOTENAME. This will ensure the column or table name is properly escaped. The stored procedure we are now building now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = '+cast(@Id as nvarchar(20));
        EXEC(@sql)
    END
END
GO

Things that can be parameterised

There is still something that can be done to this. The Id value is being injected in to the SQL string, yet it is something that can quite easily be parameterised.

The issue at the moment is that the SQL String is being executed by using the EXECUTE command. However, you cannot pass parameters into this sort of executed SQL. You need to use a stored procedure called sp_executesql. This allows parameters to be defined and passed into the dynamically created SQL.

The stored procedure now looks like this:

ALTER PROCEDURE GetData
	@Id INT,
	@TableName sysname,
	@ColumnName sysname
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = @TableName AND COLUMN_NAME = @ColumnName)
    BEGIN
        DECLARE @sql nvarchar(max) =
            'SELECT ' + QUOTENAME(@ColumnName) +
            ' FROM ' + QUOTENAME(@TableName) +
            ' WHERE Id = @Identifier';
        EXEC sp_executesql @sql, N'@Identifier int',
                           @Identifier = @Id
    END
END
GO

This is not quite the end of the story. There are performance improvements that can be made when using sp_executesql. You can find out about these in the SQL Server books-online.

And finally…

If you must use dynamic SQL in stored procedures do take care to ensure that all the data is validated and cannot harm your database. This is an area in which I tread very carefully if I have no other choice.

Try and consider every conceivable input, especially inputs outside of the bounds of your application. Remember also, that defending your database is a multi-layered strategy. Even if you have the best firewalls and security procedures elsewhere in your system a determined hacker may find a way though your other defences and be communicating with the database in a way in which you didn’t anticipate. Assume that an attacker has got through your other defences, how do you provide the data services to your application(s) yet protect the database?

VBUG Manchester – Spatial Queries

Just a reminder to anyone that is interested that I’m speaking tomorrow (Thursday July 24th) Evening at VBUG in Manchester on the topic of SQL Server 2008 Spatial Queries, slide decks are available on my website.

 

Tip of the Day #5 (SQL Server memory usage)

You can limit the amount of memory that SQL Server uses by using the sp_configure stored procedure. By limiting the amount of memory that SQL Server is permitted to use it means that more memory is available to other applications or other instances of SQL Server. In fact books on-line recommends setting the minimum and maximum memory used on each instance of SQL Server running on the same machine as SQL Server does not make any attempts to balance memory usage across instances.

In order to use this you must be in an advanced mode. To set this up use:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE 

 

Next, to make the actual change you need the following:

EXEC sp_configure 'max server memory (MB)', 512
RECONFIGURE WITH OVERRIDE 

 

The above example will set the maximum amount of memory the server will use to 512MB. The RECONFIGURE WITH OVERRIDE is necessary in order for the change to take effect immediately. If it is missed out then the change won’t take place until the SQL Server is restarted.

If you want to check that the change has taken place you can use the following:

EXEC sp_configure 'max server memory (MB)' 

 

This will just display the current setting. You will get a result set that looks something like this:

SQL Server 2005 memory options result set

The congif_value is the value that the SQL Server is currently configured with. However, it may not be what is currently in force. The run_value shows you what is currently in force.

SQL Server 2005 memory options dialogIf you don’t want to type so much SQL yourself, then you can do the same in the SQL Server Management Studio. Right-click the server in the object explorer and select “properties” from the context menu. This will bring you up a dialog with all the server level properties in it. Go to the “memory” page and you can set the values that you want there. There are a couple of radio buttons that will allow you to switch between the currently configured value and the running value. By pressing Okay the updated value is applied to the server immediately.

For more information:

Technorati Tags: ,

Spatial Operations in SQL Server 2008 (Katmai) – Union and Convex Hull

CODE EXAMPLES IN THIS POST WORK WITH THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008.

Say you would like to create a polygon out of a group of points. One way of doing this is to union the points together then create a convex hull from those points. A convex hull is a polygon that contains all the points of the geometries that it is made from. “The convex hull may be easily visualized by imagining an elastic band stretched open to encompass the given object; when released, it will assume the shape of the required convex hull.” [Wikipedia:Convex Hull]

It is possible to create a convex hull from just two points, however in this case you will end up with a linestring rather than a polygon because a polygon requires a minimum of 3 points.

DECLARE @a geometry
DECLARE @b geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0);

SELECT @a.STUnion(@b).STConvexHull().ToString();

Results in: LINESTRING (10 10, 0 0)

With an additional point a polygon can be created.

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText('POINT(0 0)',0),
       @b = geometry::STGeomFromText('POINT(10 10)', 0),
       @c = geometry::STGeomFromText('POINT(20 0)', 0);

SELECT @a.STUnion(@b).STUnion(@c).STConvexHull().ToString();

Results in: POLYGON ((20 0, 10 10, 0 0, 20 0))

What you’ll notice is that the polygon has 4 points, but we only gave 3 to start with. That is because the first and last point in the polygon are the same.

If you were to look at the geometry that had been created with just the union operations before the convex hull was made then you’ll see it is a MultiPoint: MULTIPOINT ((10 10), (20 0), (0 0))

graph1Unioning different types of geometry together, such as a point, linestring and polygon (see figure on the right) will, if the geometries don’t overlap, result in a GeometryCollection. For instance the code:

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(5 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(20 20, 30 5)', 0);

SELECT  @a.STUnion(@b).STUnion(@c).ToString();

 

Will result in the following: GEOMETRYCOLLECTION (POINT (5 10), LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

Moving the point to a position within the polygon, such as POINT(15 10) will result in a geometry collection that does not contain a separate point. As the point is within the boundary of the polygon it does not need to be separately listed in the geometry collection. The actual geometry looks like this: GEOMETRYCOLLECTION (LINESTRING (20 20, 30 5), POLYGON ((5 5, 25 5, 15 15, 5 5)))

graph2Moving the linestring to travel from 5,10 to 30,10 (through the polygon) results in a geometry collection with two linestrings (see figure on the left). One that runs from 5,10 to the boundary of the polygon at 10,10 and the second that runs from the  boundary of the polygon at 20,10 to the original end point at 30,10. The resulting MultiGeometry looks like this: GEOMETRYCOLLECTION (LINESTRING (30 10, 20 10), LINESTRING (10 10, 5 10), POLYGON ((5 5, 25 5, 20 10, 15 15, 10 10, 5 5)))

DECLARE @a geometry
DECLARE @b geometry
DECLARE @c geometry

SELECT @a = geometry::STGeomFromText(
            'POLYGON ((25 5, 15 15, 5 5, 25 5))',0),
       @b = geometry::STGeomFromText(
            'POINT(15 10)', 0),
       @c = geometry::STGeomFromText(
            'LINESTRING(5 10, 30 10)', 0);

SELECT @a.STUnion(@b).STUnion(@c).ToString();

Other posts in this series:

Contradictory messages

While attempting to create a database project in Visual Studio 2008 against a SQL Server 2008 database I got a rather odd error message. The dialog used to create the project requests information about the SQL Server database. It clearly states “The server version must be 2005 or later”. No problem, I thought. So I put in the information about my SQL Server and database in the dialog and tested the connection. So far so good. But as soon as I hit the “Okay” button I got a new message. Apparently, “Only servers up to Microsoft SQL Server 2005 are supported.”

Contradictions

Inserting geometry through a .NET Application

THIS POST REFERS TO THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008

Following from my previous posts (Getting started with Spatial Data in SQL Server 2008, Spatial Data in a .NET application) on the new spatial features of SQL Server 2008 I’ve been looking at how to get spatial data into SQL Server from a .NET application. This has not been as easy as expected.

I suspect I just have not found the right piece of documentation because my eventual solution isn’t one I’m entirely happy with.

I was unable to add a SqlGeometry as a parameter to the collection of parameters on the SqlCommand object in my .NET application. The SqlGeometry does not appear in the enumeration for SQL Server data types. My first thought was to put the data through as binary as I vaguely remember reading something about using binary in something I read recently, but I couldn’t quite remember where. However, when I created the geometry object in .NET then used .STAsBinary() on it so the parameter object would accept it the INSERT failed. The exception message was:

A .NET Framework error occurred during execution of user defined routine or aggregate 'geometry':
System.FormatException: One of the identified items was in an invalid format.
System.FormatException:
   at Microsoft.SqlServer.Types.GeometryData.Read(BinaryReader r)
   at Microsoft.SqlServer.Types.SqlGeometry.Read(BinaryReader r)
   at SqlGeometry::.DeserializeValidate(IntPtr , Int32 )
.
The statement has been terminated.

 

Yes, that was the MESSAGE from the exception. The stack trace above comes from within SQL Server itself. There is a separate stack track for my application. (I’m guessing if you are familiar with CLR stored procedures you may have seen this sort of thing before. I’ve not used the CLR code in SQL Server before, so it is a new experience)

The solution I found was to mark the parameter as an NVarChar, skip the creation of an SqlGeometry object and use a string containing the WKT (Well Known Text) representation. For example:

cmd.Parameters.Add("@Point", SqlDbType.NVarChar) = "POINT (312500 791500)";

I mentioned earlier that I’m not all that happy with this solution. That is because if I already have an SqlGeometry object I don’t want to have to convert it to a human readable format and have SQL Server parse it back again. Human Readable formats tend not to be the most efficient way of representing data and the geometry type already has a more efficient (as I understand it) format. Although in this case I bypassed the creation of an SqlGeometry object in my code, I can foresee cases where I might have created an SqlGeometry object through various operations and would have produced a fairly complex object. In those cases I wouldn’t want the additional overhead formatting it into WKT, passing it over the network and parsing it on the other side.

If I find a better solution I will post it.

Getting started with Spatial Data in SQL Server 2008

THIS POST REFERS TO THE NOVEMBER 2007 CTP (CTP 5) OF SQL SERVER 2008

This post is probably going to be a wee bit random. After the running around over the last couple of weeks with the MSDN event (Sharepoint for Developers) in Edinburgh, trying to get the Developer Day Scotland website up, an invite to a VBUG event in Reading, the Community Leaders Day at the IoD, DDD6 and the TechNet event of Andrew Fryer‘s 8 Reasons to migrate to SQL Server 2008 I’ve finally got round to trying CTP5 of SQL Server 2008. I actually installed it in a virtual machine within hours of it becoming available for download, but it is only now I’m getting round to trying it out.

First off lets start with the way spatial data is held in SQL Server. There are two spatial types, geometry and geography. Although they sounds very close there is a fair difference between then and it is probably best not to confuse them. However, both types are very well named.

Geometry is a simple two-dimensional grid with X,Y coordinates. The British National Grid is an example of this. I would guess geometry would be most useful in systems where data comes in a specific planar/flat-earth projection, or where mapping of small areas (such as the internals of a building) are needed. Lengths and areas for geometry are easy to work out. The coordinates will have a unit of measurement attached, for instance the National Grid in the UK is in metres, so the distance between any two points can be worked out by simple Pythagorean maths and the result returned in the same unit of measurement as the coordinates.

Geography fits the spatial data on a sphere with lat/long coordinates. This is a better choice for international data or for countries where the land mass is simply too big to fit in one planar projection. However, it is important to realise that lat/long is still projected. There are various schemes for fitting a lat/long position to a place on the earth. It is important to know which is being used otherwise data from different sources may not match up. It is not so simple to calculate distances and areas on a geography type as the distance between two coordinates changes depending on where those coordinates are. For example, a line that is 5º from east to west is smaller the closer to the pole it gets with the largest distance at the equator.

According to the documentation geography also has some other limitations. No geography object may be greater in size that a hemisphere. When loading data into the database it will generate an ArgumentException if tried, and if the result of an spatial operation results in a geography greater than a single hemisphere then the result will be null.

Finally, before getting on with some code, a note on SRID (Spatial Reference Identifier). Each piece of spatial data must be tagged with an SRID. Geometry types can have a SRID of 0 (which means undefined) but geographies must have a defined SRID. By default geography types use an SRID of 4326 which equates to WGS84. Spatial operations can only occur between spatial types with the same SRID. The result of spatial operations between two pieces of data with different SRIDs is null.

With that brief introduction to geometry and geography how do you create data in the database.

CREATE TABLE Town
(
    TownId int NOT NULL,
    Name nvarchar(256),
    TownGeom geometry)

To populate the column there are a number of ways of getting the data in. Currently SQL Server supports WKT (Well Known Text), WKB (Well Known Binary) and GML (Geographic Markup Language). For other data types converters will need to be written. The following example shows WKT:

INSERT Town
VALUES(1, 'Pitcardine',
       geometry::STGeomFromText(
       'POLYGON ((0 0, 370 0, 370 160, 200 250, 0 250, 0 0))', 0));

It is also possible to use the more precise method STPolyFromText. Naturally the parser will be more strict about what WKT it accepts when using the more specialised methods. For example if the WKT for a line string is sent to the STPolyFromText method the error would look like this:

Msg 6522, Level 16, State 1, Line 2
A .NET Framework error occurred during execution of user defined routine or aggregate 'geometry':
System.FormatException: 24142: Expected POLYGON at position 1. The input has LINESTR.
System.FormatException:
   at Microsoft.SqlServer.Types.OpenGisWktReader.RecognizeToken(String token)
   at Microsoft.SqlServer.Types.OpenGisWktReader.ParsePolygonTaggedText()
   at Microsoft.SqlServer.Types.OpenGisWktReader.ReadPolygon()
   at Microsoft.SqlServer.Types.SqlGeometry.STPolyFromText(SqlChars polygonTaggedText, Int32 srid)
.

At present it doesn’t seem to be possible to return the column as GML as the method isn’t found. The documentation for the method doesn’t work either so I suspect it is a feature that isn’t ready yet.

SQL Server 2008 (Katmai): I spoke to soon

Looks like I spoke too soon. I was all resigned to go through and install Katmai (SQL Server 2008) in a new virtual machine, when it occurred to me that I could just uninstall SQL Server 2005 Express Edition as I wasn’t using it.

I now have access to the SQL Server Management Studio!

Tags:

Follow

Get every new post delivered to your Inbox.