Friday, April 26, 2013

SQL Server Collation

Question: I was putting together a small bit of SQL code the other day, and I was running into a bug. While debugging it, I was completely shocked to discover:

IF 'a' < 'A'
SELECT 'a < A'

IF 'a' = 'A'
SELECT 'a = A'

IF 'a' > 'A'
SELECT 'a > A'

IF 'a' <> 'A'
SELECT 'a <> A'

displays only 'a = A'.

I investigated and found out that the reason for this result is that the database's collation sequence is set to one of the "CI" values (Case Insensitive). I took a look at a couple of other databases, and they were also "CI". Oddly enough, they were slightly different collation sequence values, but definitely "CI". This includes the [master] database.

Yes, I know it could have been configured the other way. That's not the point of this note. Is "CI" the expected standard configuration?

In addition to the crazy logic implications, I've got to believe that it's at least a bit slower to execute, since all text-based columns needs to be compared in a case-insensitive way. That character set translation can't come for free.


Answer:
The collation setting is specified during the installation of SQL Server. As seen in Figure 1, the default settings used during the installation of SQL Server 2012 are:

Database Engine: SQL_Latin1_General_CP1_CI_AS
Analysis Services: Latin1_General_CI_AS


Figure 1: SQL Server 2012 Default Collation Settings

Technically, there is no "expected CI standard". It all depends on the collation setting of the database. This is not to say you're stuck with that setting forever. If you visit http://msdn.microsoft.com/en-us/library/ms143726.aspx, you will see the following excerpt:

Setting collations are supported at the following levels of an instance of SQL Server:
Server-level collations
The default server collation is set during SQL Server setup, and also becomes the default collation of the system databases and all user databases. Note that Unicode-only collations cannot be selected during SQL Server setup because they are not supported as server-level collations. After a collation has been assigned to the server, you cannot change the collation except by exporting all database objects and data, rebuilding the master database, and importing all database objects and data. Instead of changing the default collation of an instance of SQL Server, you can specify the desired collation at the time that you create a new database or database column.

Database-level collations
When a database is created or modified, you can use the COLLATE clause of the CREATE DATABASE or ALTER DATABASE statement to specify the default database collation. If no collation is specified, the database is assigned the server collation.
You cannot change the collation of system databases except by changing the collation for the server. The database collation is used for all metadata in the database, and is the default for all string columns, temporary objects, variable names, and any other strings used in the database. When you change the collation of a user database, know that there is the potential for collation conflicts when queries in the database access temporary tables. Temporary tables are always stored in the tempdb system database, which will use the collation for the instance. Queries that compare character data between the user database and tempdb may fail if the collations cause a conflict in evaluating the character data. You can resolve this by specifying the COLLATE clause in the query. For more information, see COLLATE (Transact-SQL).

Column-level collations
When you create or alter a table, you can specify collations for each character-string column by using the COLLATE clause. If no collation is specified, the column is assigned the default collation of the database.

Expression-level collations
Expression-level collations are set when a statement is run, and they affect the way a result set is returned. This enables ORDER BY sort results to be locale-specific. Use a COLLATE clause such as the following to implement expression-level collations:

SELECT name FROM customer ORDER BY name COLLATE Latin1_General_CS_AI;

For more information on changing collation, please visit http://msdn.microsoft.com/en-us/library/ms179254.aspx.