Table names – Singular or Plural

Earlier this morning I tweeted asking for quick opinions on whether to go with singular or plural table names, i.e. the difference between having a table called “Country” vs. “Countries”.

Here are the very close results:

  • Singular: 8
  • Plural: 6
  • Either: 1

Why Singular:

  • That’s how they start off life on my ER diagram.
  • You don’t need to use a plural name to know a table will hold many of an item.
  • A table consists of rows of items that are singular.

Why Plural:

  • It is the only choice unless you are only ever storing one row in each table.
  • Because they contain multiple items
  • It contains Users
  • I think of it as a collection rather than a type/class
  • SELECT TOP 1 * FROM Customers

Why either:

  • Either works, so long as it is consistent across the entire db/app

Creating Many-to-Many joins

A topic that comes up from time to time in forums is how to join two tables together when there is a many-to-many relationship. Typical examples include teachers-to-students or articles-to-tags (to create a “tag cloud”).

If you have made any relationships in a database you will see that it is very easy to create a one-to-many join. For example, a web forum may have many posts, but a post only belongs in one forum.
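As a refresher, a one-to-many relationship needs nothing more than a foreign key on the “many” side. A minimal sketch (the table and column names here are my own, for illustration only):

CREATE TABLE Forum (
    ForumId INT NOT NULL PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

CREATE TABLE Post (
    PostId INT NOT NULL PRIMARY KEY,
    -- each post belongs to exactly one forum
    ForumId INT NOT NULL REFERENCES Forum(ForumId),
    Title NVARCHAR(200) NOT NULL
);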

To create a many-to-many relationship you need to create an intermediate table. This is a table that each side of the many-to-many can have a one-to-many relationship with. The following diagram shows the many-to-many relationship between a blog post and the tags on it. (This is not a full model, just enough to show the relationship.)

[Diagram: the many-to-many relationship between BlogPost and Tag]

The BlogPost has its primary key (BlogPostId), as does the Tag (TagId). Normally you would see that key being used as the foreign key in the other table; however, that wouldn’t work with a many-to-many relationship.

In order to join the two tables together an “intermediate table” needs to be created that just contains the two primary keys from either side of the relationship. Those two foreign keys make up a compound* primary key in the intermediate table.

It is normal to name the intermediate table after the tables that form the relationship. In this case it would be “BlogPostTag”, after BlogPost and Tag.
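As a sketch, the intermediate table could be created like this (INT keys are assumed here; the types should match the parent tables):

CREATE TABLE BlogPostTag (
    BlogPostId INT NOT NULL REFERENCES BlogPost(BlogPostId),
    TagId INT NOT NULL REFERENCES Tag(TagId),
    -- the two foreign keys together form the compound primary key
    PRIMARY KEY (BlogPostId, TagId)
);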

In order to join a row in the BlogPost table to a row in the Tag table you only need to insert a new row in the BlogPostTag table with the keys from each side, e.g.

INSERT INTO BlogPostTag (BlogPostId, TagId) VALUES (@blogPostId, @tagId);

In order to remove the relationship between a blog post and a tag you only need to delete the row from the intermediate table, e.g.

DELETE FROM BlogPostTag WHERE BlogPostId = @blogPostId AND TagId = @tagId;


* a “compound key” is one which is made up of more than one column.


Normalising the data model

Sometimes I see on forums someone who is trying to get a SQL statement to yield data in a particular way, but the data model is just thwarting their attempts, or if they do get something to work the SQL statement that does the job is horrendously complex. This tends to happen because the data is not normalised (or “normalized” if you are using the American spelling) to the third normal form. Denormalised data models tend to happen for two reasons: first, because the modeller is inexperienced and does not realise the faux pas they have made in the model; and second, because the modeller has found that a properly normalised data model just doesn’t have the performance needed to do the job required.

The Scenario

In the example scenario that I am going to present, a private education company is looking to build a system that helps track their tutors and students. So as not to be overwhelming, I am only going to concentrate on one aspect of the system – the tutor. A tutor may be multilingual and able to teach in a variety of languages, and may also be able to teach a number of subjects. The Tutors table has joins to a table for languages and a table for subjects. The model looks like this:

[Diagram: the partially denormalised data model]
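To make the diagram concrete, the denormalised Tutors table carries one column per language and subject slot, something along these lines (the column names are taken from the queries below; the types are assumed):

CREATE TABLE Tutors (
    TutorID INT NOT NULL PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    -- three “slots” for languages...
    Language1 INT NULL REFERENCES Languages(LanguageID),
    Language2 INT NULL REFERENCES Languages(LanguageID),
    Language3 INT NULL REFERENCES Languages(LanguageID),
    -- ...and four for subjects
    Subject1 INT NULL REFERENCES Subjects(SubjectID),
    Subject2 INT NULL REFERENCES Subjects(SubjectID),
    Subject3 INT NULL REFERENCES Subjects(SubjectID),
    Subject4 INT NULL REFERENCES Subjects(SubjectID)
);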

As you can see there are 3 joins from Tutors to Languages and 4 joins from Tutors to Subjects. This makes queries between these tables particularly complex. For example, to find out the languages that a tutor speaks, a query like this has to be formed:

SELECT  l1.Name AS LanguageName1,
        l2.Name AS LanguageName2,
        l3.Name AS LanguageName3
FROM Tutors AS t
LEFT OUTER JOIN Languages AS l1 ON l1.LanguageID = t.Language1
LEFT OUTER JOIN Languages AS l2 ON l2.LanguageID = t.Language2
LEFT OUTER JOIN Languages AS l3 ON l3.LanguageID = t.Language3
WHERE t.TutorID = @TutorID

So, what happens if the tutor is fluent in more than three languages? Either the system cannot accept the fourth language, or it will have to be changed to accommodate it. If the latter option is chosen, imagine the amount of work needed to make that change.

A similar situation occurs with the join to the Subjects table.

Solution

A better way to handle this sort of situation is with a many-to-many join. Many database systems cannot directly create a many-to-many join between two tables and must use an intermediate table. For those database systems that appear to be able to model a many-to-many join directly (GE-Smallworld comes to mind), what is actually happening is that an intermediate table is created in the background; it isn’t normally visible, and the database takes care of it automatically.

The resulting data model will look like this:

[Diagram: the normalised data model]
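A sketch of the two intermediate tables (again, INT keys are assumed):

CREATE TABLE TutorLanguage (
    TutorID INT NOT NULL REFERENCES Tutors(TutorID),
    LanguageID INT NOT NULL REFERENCES Languages(LanguageID),
    PRIMARY KEY (TutorID, LanguageID)
);

CREATE TABLE TutorSubject (
    TutorID INT NOT NULL REFERENCES Tutors(TutorID),
    SubjectID INT NOT NULL REFERENCES Subjects(SubjectID),
    PRIMARY KEY (TutorID, SubjectID)
);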

This allows a tutor to register any number of languages or subjects. It also means that any joins on the data are simpler, as there are no duplicate joins for each Language or Subject. The above SELECT statement can be rewritten as:

SELECT  l.Name AS LanguageName
FROM Tutors AS t
INNER JOIN TutorLanguage AS tl ON tl.TutorID = t.TutorID
INNER JOIN Languages AS l ON tl.LanguageID = l.LanguageID
WHERE t.TutorID = @TutorID

This will result in one row being returned for each language rather than all the languages being returned in one row. It is possible to pivot the results back to one row, but in SQL Server 2000 that would add more complexity to the query than I am willing to discuss in this article. If you want to know how to pivot results in SQL Server 2000, see the page on Cross-Tab Reports in the SQL Server Books Online. SQL Server 2005 will allow PIVOTed results directly. For more information on the difference between the SQL Server 2000 and 2005 ways of doing things, see: Pivot (or Unpivot) Your Data – Windows IT Pro
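For a brief flavour of the SQL Server 2000 cross-tab technique, known values can be turned into columns with CASE expressions. A minimal sketch (the language names are assumed for illustration; a real query would list each value of interest):

SELECT  t.TutorID,
        MAX(CASE WHEN l.Name = 'English' THEN 'Yes' ELSE 'No' END) AS SpeaksEnglish,
        MAX(CASE WHEN l.Name = 'French'  THEN 'Yes' ELSE 'No' END) AS SpeaksFrench
FROM Tutors AS t
INNER JOIN TutorLanguage AS tl ON tl.TutorID = t.TutorID
INNER JOIN Languages AS l ON l.LanguageID = tl.LanguageID
GROUP BY t.TutorID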

Migrating existing data

Naturally, if you have existing data using the denormalised schema and you want to migrate it to the normalised schema then you will need to be careful about the order in which changes are made lest you lose your data.

  1. Create the intermediate table.
  2. Change any stored procedures using the denormalised schema to the normalised schema.
    • You may also need to change code outside the database. If you find yourself needing to do this then I strongly recommend that you read about the benefits of stored procedures.
  3. Perform an insert for each of the denormalised joins into the intermediate table.
  4. Remove the old joins (a sketch of this step appears after the migration SQL below).

If possible the above should be scripted so that the database changes occur as quickly as possible since, depending on your situation, you may have to take your production system off-line while making the change. Testing the changes in a development environment first should ensure that the scripts are written well and don’t fall over when being run on the production database.

To move the denormalised Language joins to the normalised schema, SQL like this can be used:

INSERT INTO TutorLanguage (TutorID, LanguageID)
    SELECT TutorID, Language1 AS LanguageID
    FROM Tutors
    WHERE Language1 IS NOT NULL
UNION
    SELECT TutorID, Language2 AS LanguageID
    FROM Tutors
    WHERE Language2 IS NOT NULL
UNION
    SELECT TutorID, Language3 AS LanguageID
    FROM Tutors
    WHERE Language3 IS NOT NULL

It can, of course, be written as a series of individual INSERT INTO…SELECT statements rather than one large UNIONed SELECT.
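Once the migrated data has been verified, the final step of removing the old joins amounts to dropping the now-redundant columns. A sketch for the Language side (any foreign key constraints on these columns must be dropped first):

ALTER TABLE Tutors DROP COLUMN Language1;
ALTER TABLE Tutors DROP COLUMN Language2;
ALTER TABLE Tutors DROP COLUMN Language3;
-- and likewise for the Subject1 to Subject4 columns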

NOTE: This was rescued from the Google Cache. The original date was Sunday 3rd April 2005.


The Normal Forms of Database Normalization

I’m doing a clear-out at the moment as I’m getting the house ready to sell – I’m planning to move to West Lothian – and I came across some old notes about database normalisation. They summarised the first five normal forms, so I thought I’d share them.

First Normal Form: Every column contains just one value. That means that if you have a person’s name it should be split into forename and surname, an address is split up so that the city and postcode are in separate columns, and so on.

Second Normal Form: The First Normal Form plus every table has a primary key. That means that everything can be uniquely referenced through the primary key.

Third Normal Form: The Second Normal Form plus every non-key column depends on the primary key. So, for example, if you have an Orders table then columns such as DateOfOrder and DateOfDispatch are valid, but listing the client’s details such as name, address and so on is not valid.

Fourth Normal Form: Remove repeating columns to a new table. Repeating columns are indicative of a missing relationship. So, if you have an Orders table it will not have item1, item2, and item3 columns. They will be removed to a separate table and form part of a join to the orders table.

Fifth Normal Form: Remove repetition within columns. Simply, this is enumerated values. For example, a company may accept Visa, Mastercard and American Express for payment. Rather than repeat those strings in the order payment table, they can be placed in a small lookup table and the order payment table can refer to the lookup table’s smaller primary key.
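Whatever its proper name, the lookup-table arrangement described above is a common technique. A sketch, with all names assumed for illustration:

CREATE TABLE PaymentMethod (
    PaymentMethodId TINYINT NOT NULL PRIMARY KEY,
    -- 'Visa', 'Mastercard', 'American Express'
    Name VARCHAR(20) NOT NULL
);

CREATE TABLE OrderPayment (
    OrderPaymentId INT NOT NULL PRIMARY KEY,
    -- refer to the small key rather than repeating the string
    PaymentMethodId TINYINT NOT NULL REFERENCES PaymentMethod(PaymentMethodId)
);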

However, I should point out that my notes are at odds with the other references I found. It would seem that the correct Fifth Normal Form is to do with ternary many-to-many relationships. In fact, a quick search on the internet found that whatever I had written down as the Fifth Normal Form isn’t to be found anywhere. More research is in order to find out where my idea of the Fifth Normal Form came from.

NOTE: This was rescued from the Google Cache. The original date was Saturday, 22nd April 2006.

Delete the row, or set a deleted flag?

For auditing purposes some database systems run without deleting rows from the database – they just flag the row as deleted and ensure that queries running on the table add a line to the WHERE clause to filter out the “deleted” rows.

However, as Daniel Turini points out on Code Project, this can hurt the database performance.

Specifically he says:

Bear in mind that having a flag leads to bad index performance. In my (rather old) SQL Server DO’s and DONT’s, I suggest that you do not index on the “sex” or “gender” field:

First, let’s understand how indexes speed up table access. You can see indexes as a way of quickly partitioning a table based on a criterion. If you create an index with a column like “Sex”, you’ll have only two partitions: Male and Female. What optimization will you have on a table with 1,000,000 rows? Remember, maintaining an index is slow. Always design your indexes with the most sparse columns first and the least sparse columns last, e.g., Name + Province + Sex.

Creating a “deleted” flag will lead to bad performance, on several common cases. What I’d suggest is to move those rows to a “Deleted” table, and have a view using UNION ALL so you can easily access all the data for auditing purposes.

Not to mention that it’s easy to forget an “AND NOT IsDeleted” in some of your queries.

However, others reckon the performance hit is negligible, or that moving the data to a new table could damage the data. It is an interesting debate and you can read it here, from the start.
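As a rough sketch of the approach Turini suggests, assuming a simple Customers table whose columns are mirrored in the deleted-rows table:

-- rows are moved here instead of being flagged in place
CREATE TABLE CustomersDeleted (
    CustomerId INT NOT NULL PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
GO

-- auditing queries can still see all the data through a view
CREATE VIEW AllCustomers AS
    SELECT CustomerId, Name FROM Customers
    UNION ALL
    SELECT CustomerId, Name FROM CustomersDeleted;
GO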

NOTE: This was rescued from the Google Cache. The original date was Thursday, 29th June 2006.

