Good Habits To Adopt: Enforcing the natural primary key

I’ve been reading Aaron Bertrand’s great series of blog posts on bad habits to kick, and have been thinking to myself: what are some good habits that SQL Server developers should implement?    I spend most of my day griping about bad design from vendors, yet I hardly ever take the time to document what should be done instead.  This post is my first attempt to do so, and it’s based on the following assumptions:

  • Good habits are going to be a lot more controversial than bad habits, and
  • SQL Server doesn’t enforce many of these good habits for you.

The first point refers to the fact that some of the choices that I make are not necessarily the best way to do things, and they may not satisfy the needs of every application.  I’m a firm believer that there is an exception to every rule, but my goal is to at least define what the rules are (and again, these rules are my own creation, and someone may have better ones).  The second point refers to the fact that SQL Server enforces the rules of SQL, but leaves some of that enforcement up to you.  For example, the relational model behind SQL assumes that tables are related, but SQL Server doesn’t require that you define a FOREIGN KEY (or even a PRIMARY KEY).

So here’s my first good habit:

When defining a surrogate primary key for a table, you should enforce the natural primary key with the use of a UNIQUE constraint.

To really understand this, you have to start with defining what a surrogate primary key is versus a natural primary key.  You can search for a variety of definitions, but I’ll use the following:

  • Primary Key: a non-nullable attribute (or combination of attributes) that can be used to uniquely identify a specific instance of an entity.  When used within SQL, a primary key can be mapped to a column (or columns) in a table, and the value of the key uniquely identifies a row.
  • Natural Primary Key: a primary key that is not auto-generated by the database or application.  The key is composed of attributes that are associated with an entity, and the value of those attributes is defined by some authority beyond the scope of the database or application.  For example, a Social Security number is an “arbitrarily” assigned number that belongs to a specific person in the United States; most databases that use the Social Security number do not create the number, but rather use it as a reference to that particular person.
  • Surrogate Primary Key: a primary key that is auto-generated by the database or application to specifically identify the row in the table representing the collection of entities.  Surrogate keys have no meaning outside of the database and have no relationship to the other attributes in the table.  An ID of 1 simply identifies a row in a table; a row representing a person, a squid, or an automobile may all have an ID of 1, depending on which table the row lives in.
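
To make the distinction concrete, here is a minimal T-SQL sketch of the same entity keyed both ways.  The table names, column names, and column sizes are hypothetical and exist only for illustration; they are not taken from any real schema.

```sql
-- Hypothetical tables for illustration; all names and sizes are assumptions.

-- Natural primary key: the value is assigned by an outside authority.
CREATE TABLE dbo.PersonNatural
(
    SocialSecurityNumber CHAR(9)       NOT NULL,
    FullName             NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_PersonNatural PRIMARY KEY (SocialSecurityNumber)
);

-- Surrogate primary key: the value is generated by the database
-- and has no meaning outside of it.
CREATE TABLE dbo.PersonSurrogate
(
    PersonID             INT IDENTITY(1,1) NOT NULL,
    SocialSecurityNumber CHAR(9)           NOT NULL,
    FullName             NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_PersonSurrogate PRIMARY KEY (PersonID)
);
```

Note that in the second table nothing yet stops two rows from carrying the same Social Security number.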

Sidebar: as I was writing this, Pinal Dave posted the following to his blog: http://blog.sqlauthority.com/2009/10/22/sql-server-difference-between-candidate-keys-and-primary-key-2/

Most novices recognize that every table needs a primary key, and surrogate keys offer some benefits that natural keys do not, including:

  • Immutability: the ability of a key to stay constant over time.  A natural primary key (such as a person’s name) may change, but a surrogate key does not.
  • Simplicity of relational JOINs: surrogate keys can remain as a single column for each table they represent.  For example, a complete invoice may need to be represented by a ClientID, an InvoiceID, and the LineIDs for the lines on that invoice.  Joining on the natural keys may require the Client Name and Address, the Invoice Number, and the Line Number.
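
As a rough sketch of the JOIN point, assume a hypothetical three-table invoice schema where every table has a single-column surrogate key.  The table and column names below are my own invention for illustration.

```sql
-- Hypothetical schema; all names and sizes are assumptions for illustration.
CREATE TABLE dbo.Client
(
    ClientID   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Client PRIMARY KEY,
    ClientName NVARCHAR(100) NOT NULL,
    Address    NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.Invoice
(
    InvoiceID     INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Invoice PRIMARY KEY,
    ClientID      INT NOT NULL
        CONSTRAINT FK_Invoice_Client FOREIGN KEY REFERENCES dbo.Client (ClientID),
    InvoiceNumber VARCHAR(20) NOT NULL
);

CREATE TABLE dbo.InvoiceLine
(
    LineID     INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_InvoiceLine PRIMARY KEY,
    InvoiceID  INT NOT NULL
        CONSTRAINT FK_InvoiceLine_Invoice FOREIGN KEY REFERENCES dbo.Invoice (InvoiceID),
    LineNumber INT NOT NULL,
    Amount     DECIMAL(10,2) NOT NULL
);

-- Each join compares one integer column instead of a name, an address,
-- an invoice number, and a line number.
SELECT c.ClientName, i.InvoiceNumber, l.LineNumber, l.Amount
FROM dbo.Client AS c
JOIN dbo.Invoice AS i ON i.ClientID = c.ClientID
JOIN dbo.InvoiceLine AS l ON l.InvoiceID = i.InvoiceID;
```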

However, surrogate keys have one major weakness: they do NOT enforce the uniqueness of the entity that each row represents.  If you use the IDENTITY property in SQL Server to auto-generate your surrogate PRIMARY KEY, insert Stuart Ainsworth into your table of Employees, and then accidentally run your INSERT script again, you’ve just double-inserted Stuart Ainsworth.  While there are certainly multiple people with my name, I’m the only one at my company.  However, neither my application nor the database noticed the duplicate, because each row received its own surrogate key value.
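
Here is a minimal sketch of that scenario.  The table definition is hypothetical (the column names and sizes are my assumptions), and it deliberately has nothing but the surrogate key protecting it.

```sql
-- Hypothetical table; names and sizes are assumptions for illustration.
IF OBJECT_ID(N'dbo.Employees', N'U') IS NOT NULL
    DROP TABLE dbo.Employees;

CREATE TABLE dbo.Employees
(
    EmployeeID INT IDENTITY(1,1) NOT NULL,
    FirstName  NVARCHAR(50) NOT NULL,
    LastName   NVARCHAR(50) NOT NULL,
    CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID)
);

INSERT INTO dbo.Employees (FirstName, LastName) VALUES (N'Stuart', N'Ainsworth');

-- The same statement run a second time by accident; it also succeeds.
INSERT INTO dbo.Employees (FirstName, LastName) VALUES (N'Stuart', N'Ainsworth');

-- Two rows come back, identical except for EmployeeID.
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employees;
```

The PRIMARY KEY constraint is satisfied both times, because the only thing it checks is the generated EmployeeID.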

Using a UNIQUE constraint on the columns holding the natural key information avoids this problem; you get the benefits of a surrogate key AND the uniqueness validation of a natural primary key.  The hard part is, of course, identifying the appropriate natural primary key to enforce.  However, this exercise should NOT be overlooked when designing a database.
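
A minimal sketch of the habit, using a hypothetical Employees table (the names and sizes are assumptions).  For the sake of the example I’m treating first name plus last name as the natural key, which only works because of my “only one at my company” assumption; a real design would need a more carefully chosen natural key.

```sql
-- Hypothetical table; names and sizes are assumptions for illustration.
IF OBJECT_ID(N'dbo.Employees', N'U') IS NOT NULL
    DROP TABLE dbo.Employees;

CREATE TABLE dbo.Employees
(
    EmployeeID INT IDENTITY(1,1) NOT NULL,
    FirstName  NVARCHAR(50) NOT NULL,
    LastName   NVARCHAR(50) NOT NULL,
    -- Surrogate key: used for JOINs and FOREIGN KEY references.
    CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID),
    -- Natural key: enforced so the same employee cannot be inserted twice.
    CONSTRAINT UQ_Employees_Name UNIQUE (FirstName, LastName)
);

INSERT INTO dbo.Employees (FirstName, LastName) VALUES (N'Stuart', N'Ainsworth');

-- Running the same INSERT again now fails with a violation of
-- UQ_Employees_Name instead of silently creating a duplicate.
INSERT INTO dbo.Employees (FirstName, LastName) VALUES (N'Stuart', N'Ainsworth');

-- Only the first row exists.
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employees;
```

Other tables can still reference EmployeeID, so you keep the simple single-column JOINs while the database rejects the accidental rerun.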
