One of the first decisions you need to make when installing SQL Server is to select a collation for the instance. Most of the time we simply choose the default and breeze on by. But do you really know all the effects that choice has on your database and the applications that run against it?
What is the default?
By default, SQL Server setup chooses the collation that corresponds to the Windows system locale on your database server. You can see your current locale in the Regional and Language Options section of Control Panel. The default for servers set to English (United States) is SQL_Latin1_General_CP1_CI_AS.
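If you inherited an instance and aren’t sure what was picked at install time, you can simply ask it. Here is a quick sketch using the built-in SERVERPROPERTY function and the sys.fn_helpcollations function; the LIKE filter is only my example, so adjust it to whatever family of collations you care about:

```sql
-- Collation that was chosen for the instance at install time
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

-- Browse the available collations along with a plain-English description.
-- The [_] escapes the underscore, which is otherwise a LIKE wildcard.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE 'SQL[_]Latin1[_]General%';
```

The description column spells out what the suffixes in the name mean (case sensitivity, accent sensitivity, code page), which is handy when comparing two similar-looking collations.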
What does it affect?
The collation you choose will have three effects on your system.
- the code page – This is the mapping between the hex values that get stored in the database and the character they represent. It’s what makes 0x43 the letter C for the Latin alphabet.
- case sensitivity – Your collation determines whether C = c. This obviously has an impact on your applications and the data they’ll bring back during searches and joins. But an important thing to keep in mind is that it also affects object names, that is, whether SELECT * FROM MyTab is the same as SELECT * FROM mytab. Object names are compared using the collation of the database they live in, so if you choose a case sensitive collation on installation, every database that inherits it will treat MyTab and mytab as two different names.
- sort order – The collation you select will also affect the sort order of query results. In a case sensitive dictionary sort (the kind used by Latin1_General_CS_AS), “cat” will come before “Cat”. However, in a binary sort order (Latin1_General_BIN), “Cat” comes before “cat”. In fact, “Toad” will come before “cat”, because a binary sort compares the numeric code of each character, and every uppercase letter (0x41 through 0x5A) has a lower code than any lowercase letter (0x61 through 0x7A).
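You can see all three effects for yourself without creating any tables. The following is a minimal sketch that assumes SQL Server 2008 or later (for the VALUES row constructor) and a server whose code page is a Latin one; the derived table name t and column name Word are just placeholders I made up:

```sql
-- Code page: the byte 0x43 is interpreted as a character
SELECT CAST(0x43 AS char(1)) AS Letter;

-- Case sensitivity: the same comparison under two collations
SELECT
    CASE WHEN 'C' = 'c' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS CaseInsensitive,
    CASE WHEN 'C' = 'c' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'not equal' END AS CaseSensitive;

-- Sort order: case sensitive dictionary sort
SELECT Word
FROM (VALUES ('cat'), ('Cat'), ('Toad')) AS t(Word)
ORDER BY Word COLLATE Latin1_General_CS_AS;

-- Sort order: binary sort, driven purely by character codes
SELECT Word
FROM (VALUES ('cat'), ('Cat'), ('Toad')) AS t(Word)
ORDER BY Word COLLATE Latin1_General_BIN;
```

Going by the rules described above, the dictionary sort should list cat, Cat, Toad, while the binary sort should list Cat, Toad, cat.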
Where is it set?
The collation of the server is selected at install time. This becomes the collation for all the system databases and any user databases that are created without specifying a collation. However, an individual user database can have a different collation from the SQL Server instance. And a column inside a table can have a different collation than the database or the instance. The database and column collation can be changed relatively easily. The only “easy” way to change the server collation is to rebuild the instance, so it’s worth taking a moment to think about your collation before installation.
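To make the three levels concrete, here is a rough sketch that sets a collation on a database and on a column, inspects both, and then changes them. CollationDemo, dbo.Demo, ColA and ColB are hypothetical names of my own, and the sketch assumes you have permission to create a scratch database and are the only connection using it:

```sql
-- Database level: override the instance default
CREATE DATABASE CollationDemo COLLATE Latin1_General_CS_AS;
GO
USE CollationDemo;
GO
-- Column level: ColA overrides the database default, ColB inherits it
CREATE TABLE dbo.Demo (
    ColA varchar(50) COLLATE Latin1_General_CI_AS NOT NULL,
    ColB varchar(50) NOT NULL
);
GO
-- Inspect what each level ended up with
SELECT DATABASEPROPERTYEX('CollationDemo', 'Collation') AS DatabaseCollation;
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.Demo');
GO
-- Change a column's collation (restate the data type and nullability)
ALTER TABLE dbo.Demo
    ALTER COLUMN ColA varchar(50) COLLATE Latin1_General_CS_AS NOT NULL;
GO
-- Change the database default; existing columns keep their collation,
-- only columns created afterwards pick up the new default
USE master;
GO
ALTER DATABASE CollationDemo COLLATE Latin1_General_CI_AS;
```

On a real database, expect these changes to be blocked by things that depend on the old collation, such as indexes or constraints on the column, so “relatively easily” still means some planning.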
What about Unicode?
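Collations still govern comparisons and sorting for the Unicode datatypes (nchar, nvarchar, etc.), but the code page portion of a collation applies only to the non-Unicode types (char, varchar, text). SQL Server collations are the ones that start with “SQL_”. These are provided to offer backward compatibility with earlier versions of SQL Server: their legacy sort rules apply to non-Unicode data, while Unicode data is sorted according to Windows collation rules. Collations whose names don’t start with “SQL_” are Windows collations, which apply the same rules to both kinds of data.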
Cannot resolve collation conflict … for equal to operation
This, in my mind, is the biggest headache associated with collations. If you try to join or compare two columns with conflicting collations, a case sensitive one with a case insensitive one for example, you’ll get a variation on the above error. SQL Server doesn’t know which collation’s rules to follow. To get around this, you’ll need to include the COLLATE clause in your query, like this:
USE AdventureWorks;
GO
SELECT *
FROM MyTabCI ci
INNER JOIN MyTabCS cs
    ON ci.ColA COLLATE Latin1_General_CS_AS = cs.ColA;
This will tell SQL Server which collation to use for the join.
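When the error shows up, it helps to confirm exactly which collations are in conflict before picking one. The sketch below reuses the MyTabCI and MyTabCS tables from the example and assumes they live in the dbo schema, which the original query doesn’t state. It also shows COLLATE DATABASE_DEFAULT as one alternative to hard-coding a collation name:

```sql
USE AdventureWorks;
GO
-- Which collation does each character column carry?
-- collation_name is NULL for non-character columns, so filter those out.
SELECT OBJECT_NAME(c.object_id) AS TableName,
       c.name                   AS ColumnName,
       c.collation_name
FROM sys.columns AS c
WHERE c.object_id IN (OBJECT_ID('dbo.MyTabCI'), OBJECT_ID('dbo.MyTabCS'))
  AND c.collation_name IS NOT NULL;

-- Alternative: compare both sides under the current database's default
-- collation instead of naming one explicitly
SELECT *
FROM dbo.MyTabCI AS ci
INNER JOIN dbo.MyTabCS AS cs
    ON ci.ColA COLLATE DATABASE_DEFAULT = cs.ColA COLLATE DATABASE_DEFAULT;
```

Keep in mind that whichever collation you pick decides the result: joining under a case sensitive collation will match fewer rows than joining under a case insensitive one. Applying COLLATE to a column may also keep SQL Server from using an index on it efficiently, so for a join that runs often it can be worth aligning the column collations instead.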
That’s about it. The important thing to remember with collations is that a little thought up front can save you a lot of hassle down the road.