Monday 15 December 2014

DISTINCT in case insensitive SQL Server

Solution
To illustrate this behavior, we are going to look at a couple of ways this works using a case sensitive database and a case insensitive database.
The first set of queries uses the AdventureWorks database which is configured as case sensitive.  To determine the collation for your databases you can run this query:
SELECT name, collation_name
FROM master.sys.databases
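If you only care about the current database, or want to confirm the collation on the individual character columns (which is what DISTINCT actually compares), a sketch along these lines may help. It assumes you are connected to the AdventureWorks database used in this tip; the column aliases are illustrative only.

```sql
-- Collation of the database you are currently connected to
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;

-- Collation of each character column in Person.Contact
-- (collation_name is NULL for non-character columns, so those are filtered out)
SELECT name AS column_name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('Person.Contact')
  AND collation_name IS NOT NULL;
```

A collation name containing _CS_ is case sensitive and one containing _CI_ is case insensitive.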
We are querying the data from Person.Contact in the AdventureWorks database.  All data is set up as mixed case, so we have no duplicates when we run this query.
SELECT DISTINCT TOP 10 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
If we update one of the records and change the FirstName from "Adam" to "ADAM", we should get two different values when we run the query.
UPDATE Person.Contact
SET FirstName = 'ADAM'
WHERE ContactID = 62
GO
SELECT DISTINCT TOP 10 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
As you can see we now show both "Adam" and "ADAM" as two different values.
The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.
CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact
SELECT FirstName FROM Person.Contact
GO
SELECT DISTINCT TOP 10 FirstName
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
GO
When we run the SELECT query, you can see that the output combines "Adam" and "ADAM" into a single value since case is ignored.
To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.
SELECT DISTINCT TOP 10 FirstName COLLATE sql_latin1_general_cp1_cs_as
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
When this is run we now have the values of "Adam" and "ADAM".
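If you would rather not repeat COLLATE in every query, another option is to set the collation on the column itself when the table is created. The following is a minimal sketch under that assumption; the table name contact_cs is hypothetical and not part of the original example.

```sql
-- Hypothetical table: the column-level collation overrides the
-- case insensitive default of the Test database for this column only.
CREATE TABLE Test.dbo.contact_cs (
    FirstName nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS
);
GO
INSERT INTO Test.dbo.contact_cs (FirstName)
SELECT FirstName FROM Person.Contact;
GO
-- No COLLATE clause needed; DISTINCT uses the column's collation.
SELECT DISTINCT TOP 10 FirstName
FROM Test.dbo.contact_cs
WHERE FirstName LIKE 'A%'
ORDER BY 1;
GO
```

Keep in mind that with this approach every comparison on the column becomes case sensitive, including the LIKE 'A%' filter, so names starting with a lowercase "a" would no longer match.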
So depending on how your database is set up, you may or may not see the differences.
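To find out which values in a case insensitive table are hiding more than one case variant, you could group with the default collation and count the distinct case sensitive spellings inside each group. This is a sketch that assumes the Test.dbo.contact table from above; the case_variants alias is illustrative.

```sql
-- Each group is formed case insensitively (the Test database default),
-- then COUNT(DISTINCT ...) counts the case sensitive spellings in that group.
SELECT FirstName,
       COUNT(DISTINCT FirstName COLLATE SQL_Latin1_General_CP1_CS_AS) AS case_variants
FROM Test.dbo.contact
GROUP BY FirstName
HAVING COUNT(DISTINCT FirstName COLLATE SQL_Latin1_General_CP1_CS_AS) > 1;
```

After the earlier UPDATE, this should report one group for "Adam"/"ADAM" with two variants. Which spelling is displayed for the group is not guaranteed.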

To show you another example, here is a quick way of selecting the case sensitive or case insensitive option.
The first query uses a case sensitive collation, so all four rows should show up.
select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
All that is different in the next query is the name of the collation. When this query is run with a case insensitive collation, we only get two rows.
select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
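The two results can also be compared side by side in a single statement. This sketch uses the same four sample values, but supplies them through a VALUES table constructor, which assumes SQL Server 2008 or later; the column aliases are illustrative.

```sql
-- Count the distinct values under each collation in one pass
SELECT
    COUNT(DISTINCT item COLLATE SQL_Latin1_General_CP1_CS_AS) AS case_sensitive_count,
    COUNT(DISTINCT item COLLATE SQL_Latin1_General_CP1_CI_AI) AS case_insensitive_count
FROM (VALUES ('abcd'), ('ABCD'), ('defg'), ('deFg')) AS items (item);
```

Consistent with the two queries above, this should return 4 for the case sensitive count and 2 for the case insensitive count.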
Next Steps
  • You can see how the behavior of the database can impact the output, so next time you are looking for distinct values make sure you understand your database settings or use the COLLATE option
  • Here is another tip that shows you how you can use COLLATE in your WHERE clause: Case Sensitive Search on a Case Insensitive SQL Server
  • Special thanks to Andy Novick at Novick Software for this tip idea
