Comparing varchar to nvarchar in SQL
A few days ago I listened to a new episode of the .NET Rocks podcast, with Kimberly Tripp on the show. While she was going through the list of things every developer needs to know, one point in particular caught my interest: use data types consistently in the database. Mixing types within your database can cause a real performance hit, because when SQL Server compares two different types it implicitly converts the lower-precedence type to the higher one (for example, varchar to nvarchar). As 'a real performance hit' is a bit vague, I wanted to know exactly how large this loss is and whether it's worth rewriting or redesigning your database.
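The direction of that conversion is fixed by SQL Server's data type precedence rules: nvarchar ranks higher than varchar, so whenever the two meet in a comparison, the varchar side is implicitly converted to nvarchar. A minimal sketch of what that looks like (the Customers table and Name column are made up for illustration):

```sql
-- [Name] is assumed to be varchar here. The N prefix makes the literal
-- nvarchar, so the varchar side -- the column -- is converted to nvarchar
-- for every row examined; this shows up as CONVERT_IMPLICIT in the plan.
SELECT * FROM [Customers] WHERE [Name] = N'Smith';

-- Same query with a varchar literal: the types already match,
-- so no conversion is needed.
SELECT * FROM [Customers] WHERE [Name] = 'Smith';
```

When the converted side is an indexed column, that implicit conversion can also prevent an index seek, depending on the collation.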
To test this I created a small database, named Performance, and filled it with a lot of records. The database has three tables: AsciiTable, MainTable and UnicodeTable. All three contain two columns, an ID and a text column. The text column of AsciiTable has the type varchar(30); the text columns of MainTable and UnicodeTable have the type nvarchar(30). I created these tables to time a few specific queries. At the end of this post I've included the script to create the database, if you want to try it for yourself.
Queries on small tables usually perform fine, so I needed to fill these three tables with a lot of data. Thankfully it's 2012 and we developers can use fine features like LINQ to SQL and the Entity Framework nowadays. The code I needed to write to fill the tables was therefore fairly compact:
public void InsertALotOfItemsInTheDatabase()
{
    var db = new PerformanceEntities();
    for (uint i = 0; i < uint.MaxValue; i++)
    {
        var textToInsert = string.Format("This is the {0} text!", i);
        db.AddToMainTables(new MainTable { Sometext = textToInsert });
        db.AddToAsciiTables(new AsciiTable { SomeAsciiText = textToInsert });
        db.AddToUnicodeTables(new UnicodeTable { SomeUnicodeText = textToInsert });
        // Save in small batches; note the context is never disposed or
        // recreated, so tracked entities keep piling up in memory.
        if (i % 10 == 0)
        {
            db.SaveChanges();
        }
        Console.WriteLine(textToInsert);
    }
    db.SaveChanges();
}
You don't really need the Console.WriteLine in the middle of the loop, but I like to have some feedback on the command line to see that everything is still working. After running this code for a few hours I was greeted by an OutOfMemoryException; the context is never disposed, so this code could use some fine-tuning and connection closing. By then there were 1,999,838 items in each table, which I decided was good enough. I would have liked more, but about 2 million rows per table should do the trick too, and inserting them takes so long that I wanted to start testing already.
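As an aside, if you just need a lot of rows and don't care about going through the ORM, a set-based insert in T-SQL fills a table in a single statement. This is only a sketch against the schema at the end of this post; the row-number trick over sys.all_objects is a common way to generate a numbers table:

```sql
-- Generate 2 million sequential numbers and insert one row per number.
-- Cross-joining sys.all_objects with itself yields far more than enough rows.
;WITH Numbers AS
(
    SELECT TOP (2000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO [dbo].[MainTable] ([Sometext])
SELECT 'This is the ' + CAST(n AS varchar(10)) + ' text!'
FROM Numbers;
```

The longest generated string ("This is the 2000000 text!") is 25 characters, so it fits comfortably in the nvarchar(30) column.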
With the tables filled, I wrote some tests to see whether performance is affected when comparing Unicode and ASCII strings with each other.
The first tests use a LIKE in the WHERE clause, with ASCII and Unicode string literals against the different columns. The methods look like this:
public void TestThePerformanceOfQueries()
{
    const int MAX_NUMBER_OF_LOOPS = 30;
    const string connectionString = "Server=localhost;Database=Performance;Trusted_Connection=True;";
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        var stopWatch = new Stopwatch();

        stopWatch.Start();
        for (int i = 0; i < MAX_NUMBER_OF_LOOPS; i++)
        {
            QueryUnicodeTableWithUnicodeWhereClause(connection, i);
        }
        stopWatch.Stop();
        Console.WriteLine(string.Format("The QueryUnicodeTableWithUnicodeWhereClause queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
        stopWatch.Reset();

        stopWatch.Start();
        for (int j = 0; j < MAX_NUMBER_OF_LOOPS; j++)
        {
            QueryUnicodeTableWithAsciiWhereClause(connection, j);
        }
        stopWatch.Stop();
        Console.WriteLine(string.Format("The QueryUnicodeTableWithAsciiWhereClause queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
        stopWatch.Reset();

        stopWatch.Start();
        for (int k = 0; k < MAX_NUMBER_OF_LOOPS; k++)
        {
            QueryAsciiTableWithAsciiWhereClause(connection, k);
        }
        stopWatch.Stop();
        Console.WriteLine(string.Format("The QueryAsciiTableWithAsciiWhereClause queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
        stopWatch.Reset();

        stopWatch.Start();
        for (int l = 0; l < MAX_NUMBER_OF_LOOPS; l++)
        {
            QueryAsciiTableWithUnicodeWhereClause(connection, l);
        }
        stopWatch.Stop();
        Console.WriteLine(string.Format("The QueryAsciiTableWithUnicodeWhereClause queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
        stopWatch.Reset();
    }
}
private void QueryUnicodeTableWithUnicodeWhereClause(SqlConnection connection, int id)
{
    // The N prefix makes the search string nvarchar, the same type as the
    // column. Formatting id into the query is safe here because it is an
    // int; for anything user-supplied, use SqlParameter instead.
    string query = string.Format(@"SELECT [ID], [SomeUnicodeText]
        FROM [Performance].[dbo].[UnicodeTable]
        WHERE [SomeUnicodeText] LIKE N'%{0}%'", id);
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        // ExecuteReader never returns null, so we can read straight away.
        dataReader.Read();
    }
}

private void QueryUnicodeTableWithAsciiWhereClause(SqlConnection connection, int id)
{
    // No N prefix: the varchar literal is converted to nvarchar once,
    // which is cheap compared to the scan itself.
    string query = string.Format(@"SELECT [ID], [SomeUnicodeText]
        FROM [Performance].[dbo].[UnicodeTable]
        WHERE [SomeUnicodeText] LIKE '%{0}%'", id);
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}

private void QueryAsciiTableWithAsciiWhereClause(SqlConnection connection, int id)
{
    // A varchar literal against a varchar column: no conversion at all.
    string query = string.Format(@"SELECT [ID], [SomeAsciiText]
        FROM [Performance].[dbo].[AsciiTable]
        WHERE [SomeAsciiText] LIKE '%{0}%'", id);
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}

private void QueryAsciiTableWithUnicodeWhereClause(SqlConnection connection, int id)
{
    // An nvarchar literal against a varchar column: the column side has to
    // be converted, since nvarchar has the higher type precedence.
    string query = string.Format(@"SELECT [ID], [SomeAsciiText]
        FROM [Performance].[dbo].[AsciiTable]
        WHERE [SomeAsciiText] LIKE N'%{0}%'", id);
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}
The results of these queries aren’t that exciting. All tests take about the same time to execute:
The QueryUnicodeTableWithUnicodeWhereClause queries have taken 35210 milliseconds!
The QueryUnicodeTableWithAsciiWhereClause queries have taken 34335 milliseconds!
The QueryAsciiTableWithAsciiWhereClause queries have taken 32946 milliseconds!
The QueryAsciiTableWithUnicodeWhereClause queries have taken 33810 milliseconds!
It appears the type of the string literal in the WHERE clause doesn't matter much. In hindsight that makes sense: a LIKE pattern with a leading wildcard can't use an index anyway (and these text columns aren't indexed at all), so each query scans all two million rows, and the cost of that scan dwarfs whatever conversion work is needed.
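If you want to see whether a conversion is happening at all, the execution plan will tell you. A quick way to check (in SSMS you can also simply enable 'Include Actual Execution Plan'):

```sql
SET STATISTICS XML ON;

-- nvarchar pattern against the varchar column: the plan's predicate
-- should contain something like CONVERT_IMPLICIT(nvarchar(30), ...).
SELECT [ID]
FROM [Performance].[dbo].[AsciiTable]
WHERE [SomeAsciiText] LIKE N'%42%';

SET STATISTICS XML OFF;
```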
I wasn't satisfied with these results, as it's unlikely Kimberly was wrong on the podcast, so I created a few more test methods. In these, the columns are compared with each other inside SQL, rather than with a string supplied from C#. The new methods look like this:
private void QueryUnicodeTableInnerJoinAsciiTableWithWhereOnId(SqlConnection connection)
{
    // An nvarchar column compared to a varchar column: every varchar value
    // has to be converted to nvarchar before it can be compared.
    const string query = @"SELECT TOP 10000 *
        FROM [UnicodeTable] INNER JOIN [AsciiTable]
        ON [UnicodeTable].[ID] = [AsciiTable].[ID]
        WHERE [UnicodeTable].[SomeUnicodeText] = [AsciiTable].[SomeAsciiText]";
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}

private void QueryUnicodeTableInnerJoinMainTableWithWhereOnId(SqlConnection connection)
{
    // Both columns are nvarchar, so no conversion is needed.
    const string query = @"SELECT TOP 10000 *
        FROM [UnicodeTable] INNER JOIN [MainTable]
        ON [UnicodeTable].[ID] = [MainTable].[ID]
        WHERE [UnicodeTable].[SomeUnicodeText] = [MainTable].[SomeText]";
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}

private void QueryMainTableInnerJoinAsciiTableWithWhereOnId(SqlConnection connection)
{
    // nvarchar versus varchar again, so the conversion cost is back.
    const string query = @"SELECT TOP 10000 *
        FROM [MainTable] INNER JOIN [AsciiTable]
        ON [MainTable].[ID] = [AsciiTable].[ID]
        WHERE [MainTable].[SomeText] = [AsciiTable].[SomeAsciiText]";
    using (var command = new SqlCommand(query, connection))
    using (var dataReader = command.ExecuteReader())
    {
        dataReader.Read();
    }
}
And they are called the same way as before:
stopWatch.Start();
for (int m = 0; m < MAX_NUMBER_OF_LOOPS; m++)
{
    QueryUnicodeTableInnerJoinAsciiTableWithWhereOnId(connection);
}
stopWatch.Stop();
Console.WriteLine(string.Format("The QueryUnicodeTableInnerJoinAsciiTableWithWhereOnId queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

stopWatch.Start();
for (int n = 0; n < MAX_NUMBER_OF_LOOPS; n++)
{
    QueryUnicodeTableInnerJoinMainTableWithWhereOnId(connection);
}
stopWatch.Stop();
Console.WriteLine(string.Format("The QueryUnicodeTableInnerJoinMainTableWithWhereOnId queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();

stopWatch.Start();
for (int o = 0; o < MAX_NUMBER_OF_LOOPS; o++)
{
    QueryMainTableInnerJoinAsciiTableWithWhereOnId(connection);
}
stopWatch.Stop();
Console.WriteLine(string.Format("The QueryMainTableInnerJoinAsciiTableWithWhereOnId queries have taken {0} milliseconds!", stopWatch.ElapsedMilliseconds));
Results of these tests are stunning!
The QueryUnicodeTableInnerJoinAsciiTableWithWhereOnId queries have taken 23997 milliseconds!
The QueryUnicodeTableInnerJoinMainTableWithWhereOnId queries have taken 312 milliseconds!
The QueryMainTableInnerJoinAsciiTableWithWhereOnId queries have taken 24119 milliseconds!
Comparing columns of different types in SQL really does cause a big performance loss. Comparing an nvarchar column with another nvarchar column takes only 312 milliseconds in this scenario, while comparing a varchar column with an nvarchar column takes about 24 seconds, because SQL Server has to implicitly convert every varchar value to nvarchar before it can compare. In the scenario above, comparing columns of the same type is roughly 75 times faster!
I was really shocked by these results; they show we should design databases with care and consistency, as Kimberly suggested. So, to everyone reading this post: check your databases for type inconsistencies and fix them where you can.
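Tracking down inconsistencies doesn't have to be manual work. A starting point is to list all string columns from the catalog views and spot the outliers, then align them with ALTER TABLE. Check the existing data first, though: narrowing nvarchar to varchar can lose characters the column's code page can't represent. A sketch against the schema below:

```sql
-- List every varchar/nvarchar column in the current database:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('varchar', 'nvarchar')
ORDER BY TABLE_NAME, COLUMN_NAME;

-- Example fix: change the odd column out to match the rest of the schema.
ALTER TABLE [dbo].[AsciiTable]
ALTER COLUMN [SomeAsciiText] nvarchar(30) NULL;
```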
Database script to create the database:
USE [Performance]
GO
/****** Object: Table [dbo].[UnicodeTable] Script Date: 11/05/2012 20:34:39 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[UnicodeTable](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[SomeUnicodeText] [nvarchar](30) NULL,
CONSTRAINT [PK_UnicodeTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[MainTable] Script Date: 11/05/2012 20:34:39 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[MainTable](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[Sometext] [nvarchar](30) NULL,
CONSTRAINT [PK_MainTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[AsciiTable] Script Date: 11/05/2012 20:34:39 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AsciiTable](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[SomeAsciiText] [varchar](30) NULL,
CONSTRAINT [PK_AsciiTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO