By SQL Server MVP Ami Levin
Recently, a user on the Hebrew SQL Server MSDN Forum posted a design question. Here it is, literally translated from Hebrew:
Figure 1 - Original Design Schema
My question is whether this can be enforced only when inserting a value, or is it possible to enforce this rule via the schema definition.
A few people replied and suggested various solutions, among those you can find (again literally translated from Hebrew):
I’ve decided to use this opportunity to try and shed a little light on some very interesting issues this question and the suggested solutions have brought up.
First, I would like to say that all solutions for enforcing the requirement to have only one default answer per question that are based on adding business logic rules as DML in the form of triggers, UDF in check constraints, filtered indexes, indexed views and the likes may work very well. However, in my humble opinion, using such solutions is essentially covering up the root issue here which is the data model. Therefore, I’m not going to discuss those here. You can use similar logic in the application or your DAL. Instead, I want to explore the option of enforcing this rule via the data model. Of course, one might argue that this is a business rule and should be imposed as one. I can’t say it’s a wrong claim. I think we simply don’t have enough information in this case to decide one way or the other but the theoretical aspects here are fascinating, so let’s take the ride.
First, let’s look at the original design (only relevant columns included):
Figure 2 - Original Design Code
This might initially seem to be the natural design choice. Intuitively, every question has a few answers so a one-to-many, parent-child relationship seems to be in place. The choice of using ID columns as primary keys and the BIT column (instead of a BOOLEAN which does not exist in SQL Server - yet…) with the values 1 / 0 to indicate true or false is also a common practice. However, there is no obvious way to enforce the “Every question can have only one default answer” rule. Or is there?
To start at the end – there is a very simple, straight forward solution to enforce this rule with the existing schema design. This solution is fully ANSI standard compliant but unfortunately, it is not supported by SQL Server. In ANSI SQL, a UNIQUE column may contain multiple NULL values since these are (rightfully) not considered equal. However, SQL Server allows only one NULL. In Oracle for example, you could add a unique constraint to QuestionID and IsDefault, and use NULL values to indicate false default. This way, only 1 row per QuestionID may have a non-null value. You could also just use BOOLEAN type.
Figure 3 - Simple Solution, But Not for SQL Server
Is this the correct solution if it worked for us? Perhaps it is, but today I would like to explore some alternative paths. Let’s start with the solutions suggested by the form contributors.
One solution suggested representing the default answer as a property of the question. This means adding a DefaultAnswer column to the Question table, pointing to the ID of the default answer. This solution also enables us to enforce the extension to the rule “There must be one and only one default answer to each question” by adding a NOT NULL constraint to the DefaultAnswer column. Let’s try this and see what we come up with:
Figure 4 - Cyclic Solution
Try to execute the above script, and you will receive the following errors:
Msg 1767, Level 16, State 0, Line 1Foreign key 'FK__Question__Defaul__47DBAE45' references invalid table 'Answer'.Msg 1750, Level 16, State 0, Line 1Could not create constraint. See previous errors.Msg 1767, Level 16, State 0, Line 2Foreign key 'FK__Answer__Question__4AB81AF0' references invalid table 'Question'.Msg 1750, Level 16, State 0, Line 2Could not create constraint. See previous errors.
What’s going on here? We have a cyclic reference on our hands… A points to B but B points right back at A. Well, we can always defer the creation of one of these conflicting constraints until after the tables are created, right? Let’s try and see.
Figure 5 - Cyclic Solution Revised
Yep. This time it works. But now try to insert some rows and see how far you get… Good luck J.
Cyclic references are (in most cases) evil and very tricky to handle. They prevent you from using cascading referential constraints, they make it a challenge to maintain the schema and even a simple data modification can become a real challenge. Try to avoid them whenever you can. In many cases, they are an indication of a design error and you should at least look for some better design alternatives. I believe this is one of those cases.
Two of the other proposed solutions involve adding an additional table to the schema. This means representing DefaultAnswer as an entity by its own right instead of being a property of an Answer. To my great surprise, the ‘tricky’ part which most repliers didn’t suggest is using a composite foreign key to enforce the right combination of ‘question – answer’ as the default one. Two such (wrong) designs were suggested:
Figure 6 - Defualt Answer as an Entity Solution
The second suggested solution above does not work of course. In order to be able to point to the Answer table, the QuestionID needs to be unique which is logically impossible (unless you have only one answer per question). Funnily enough, the error message it returns complains about an invalid table and not the column:
Msg 1767, Level 16, State 0, Line 2Foreign key 'FK__DefaultAn__Quest__619B8048' references invalid table 'Answer'.Msg 1750, Level 16, State 0, Line 2Could not create constraint. See previous errors.
In both suggested solutions, nothing prevents the application from associating a default answer to a question which is not even one of the question’s possible answers. For example, the default answer for question 1 can be made to point to one of the answers that belong to question 2. This is actually very easy to solve and I was surprised that no one thought of making the constraint of the second solution above a composite foreign key on (Question, Answer) pointing to the Answer table which would have prevented this. I believe it has to do with the fact that using ID’s as primary keys make “a one-column foreign key” so deeply engrained in people’s minds that they overlook such simple solutions.
I would now like to cover some potential solutions based on the above suggestions, but I would also like to take the opportunity to improve on the original design. First, the naming convention I like to use for naming tables is the plural form of the entity it represents. It’s Questions and not Question unless you plan to hold only one row in that table. There is some controversy associated with this issue. Although I advocate using the plural form, the technical editor of this article, Mr. Louis Davidson, is an advocate of using the singular form. With your permission, I will use the plural form in my suggested solution (expect for one place, for a very particular reason, see if you can spot where and why…). Second, I would like to suggest an alternative approach to the common practice of using artificial keys, or “IDs” as they are called, for every table. Even if in this case using the actual question and answer as the keys – yes, long string keys – seems odd, try to see how it makes everything fit together nicely and also reveals some ‘shocking’ details about the data model itself. Bear with me on this. In any case, all the claims I’m going to make are valid whether you use surrogate keys (IDs) or natural ones. Watch carefully how the key selection actually affects the design.
First, let’s see the corrected version of above suggested solution which uses a separate table to hold the default answers.
Figure 7 - Default Answer as an Entity Solution (Corrected)
There are two things that come to mind when looking at the above design. The unique constraint on the Answers table seems to be (and is…) redundant. Since Answer is our primary key, why should I declare that the combination of (Answer, Question) be unique too? Of course it is. Here it’s needed to allow creating the foreign key from the Default_Answers table. The first alarm bell should start ringing in your head - something may be wrong with the design here…
Moreover, using the actual Answer column as the primary key of the Answers table means you will not be able to use the same answer for several questions. For all your questions, there will be only one “Yes” answer, one “All of the above”, one”32”… you get the idea. Is this a design flaw? Surely I want to be able to have more than one question with an answer of “Yes” or “No”. Well, that is a question you should ask yourself when modeling your data. The Answers table holds (surprise, surprise) answers. If that is the entity it represents, your answer should be identifiable by its nature as an answer. Adding an ID column which allows for inserting multiple rows with the same AnswerText is just avoiding the question: “What does this table model?” If it’s answers, there can be only one row per answer as is the case above. If you are thinking of changing the primary key to (Answer, Question), you are no longer modeling your Answers. You are modeling the relationship between Answers and Questions which is… your Questionnaire. The solution below is an improvement to the previous model, realizing that Questions and Answers are distinct entities, with a many-to-many relationship.
Figure 8 - Questionnaire (Many to Many) Solution
Now it seems a bit better. I can assign the same answer to multiple questions. I no longer have redundant constraints as the column pair (Question, Answer) are naturally used as primary keys and the referencing entity Default_Answers naturally points to it without any need for redundant objects. It’s nice when things start falling into place…
However, note that above design assumes there is only one questionnaire. This was the original design limitation as well. If you plan ahead and see that in the future you may need to have multiple questionnaires, you would then do the following (see suggested DDL code for this solution in the attached script file):
Here is a suggested solution that addresses these potential requirements:
Figure 9 - Multiple Questionnaires Solution
Of course, there is no right or wrong design without a full specification to validate the data model. I tried to use common sense to come up with a working, flexible and solid solution. In many cases, I challenge the design specs when I spot a potential issue and many times I managed to make the ones who wrote it realize their potential flaws.
Do you ever consider such alternatives when you approach your design tasks?
Article was written by Ami Levin and technically edited by Louis Davidson.
Ami Levin is a Microsoft SQL Server MVP, with over 20 years of experience in the IT industry. For the past 12 years he has been consulting, teaching and speaking on SQL Server worldwide. He manages the Israeli SQL Server user group, leads the local support forum, and is a regular speaker at Microsoft conferences.
Louis Davidson (Dr. SQL) has over 15 years in information technology, nearly all of it spent designing, building, implementing, and coding using database technologies, primarily SQL Server. He has written 3 books on the subject of database design, including Pro SQL Server 2005 Database Design and Optimization and is working on the fourth generation of that very same title today.