Varchar(x) where x <= 2 does not make sense... or does it?

While preparing my “Inheriting a database for Developers” session for SQLSaturday#162 in Cambridge, I have had a very interesting finding on the matter of the Varchar datatype usage. There is a more or less common knowledge of the SQL Server internals, that the Varchar datatype internally adds 2 additional bytes to the content of the column. Those 2 bytes are used to control the length of the text content internally, so SQL Server knows and understands where the text actually ends. From this point of view, even when the content of the column is empty, then it takes 2 bytes of space anyway.

With this knowledge, it was for me extremely easy and logical to believe that creating any columns with Varchar(2) or even Varchar(1) datatypes does not make any sense at all.
I know that I did believe in it until a couple of days ago, until I actually started playing with it. :)

This is a first blog in the series of exploring a little bit deeper this datatype usage.
I have found that the following situations are very interesting to investigate:
– Tables with Varchar(x) NULL;
– Tables with Varchar(x) NOT NULL, having different amounts of data;
– Tables with Varchar(x) NULL, using Sparse columns (SQL Server 2008+);
– Tables with Varchar(x) using different Data Compression methods (SQL Server 2008+);

All of the findings will be compared between CHAR(x) and VARCHAR(x) datatypes on SQL Server Box and Azure SQL Database editions.

Some basic information:
First of all, lets consider tha we should bare in mind that both datatypes Char and Varchar while being similar in purpose of holding a some short text content, are very different in their internal implementation nature. The Char datatype size is always well defined and is placed on the datapage at the location and at the sequence order defined by the table creation, while Varchar, like other variable length datatypes are being moved from their position to the end of the table definition, right after the last fixed size column datatype. This gives a very interesting twist to the final results of this experiment. :)

1. We shall create 2 similar tables using different datatypes (Char and Varchar) compare the respective occupied space. Both tables will have 7 columns, clustered primary key and the differences between them shall be marked in bold.
2. We shall insert 1.000.000 rows into each of the tables, so that we could have some information to play with. The final results will naturally be much bigger on bigger tables or in more extreme internal layouts, with much wider rows.

I won’t spent more time keeping you waiting, and lets dive into SQL Server 2012 and experiment the following code:

CREATE TABLE dbo.CharTestTable
(
	id INT IDENTITY NOT NULL PRIMARY KEY,
	client_name CHAR(20) NOT NULL DEFAULT 'SuperClient',
	sex CHAR(1) NULL,
	confirmation CHAR(1) NULL,
	state_code CHAR(2) NULL,
	some_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
	some_amount DECIMAL(9,2) NOT NULL DEFAULT 128.34,
);

CREATE TABLE dbo.VarcharTestTable
(
	id INT IDENTITY NOT NULL PRIMARY KEY,
	client_name CHAR(20) NOT NULL DEFAULT 'SuperClient',
	sex VARCHAR(1) NULL,
	confirmation VARCHAR(1) NULL,
	state_code VARCHAR(2) NULL,
	some_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
	some_amount DECIMAL(9,2) NOT NULL DEFAULT 128.34,
);

-- Insert 1.000.000 rows into both of the tables
SET NOCOUNT ON;
BEGIN TRAN
	DECLARE @i INT;
	SET @i = 0;

	WHILE @i < 1000000
	BEGIN 
		INSERT INTO dbo.CharTestTable
			DEFAULT VALUES;

		INSERT INTO dbo.VarcharTestTable
			DEFAULT VALUES;
			
		SET @i = @i + 1;
	END
COMMIT;

-- Lets compare the space occupied 
EXEC sp_spaceused 'dbo.CharTestTable';
EXEC sp_spaceused 'dbo.VarcharTestTable';

These are results of what those queries, with Varchar clearly occupying much less space. The difference is around 4MB which represents in this case around 10% of the total space, which is pretty amazing to say at least. What has just happened here? Why on planet Earth do Varchar is occupying much less space, when it was clearly stated, that it has 2 extra bytes, so it should be at least equal to the table with Char datatypes.

I was lucky enough to remember a great article on the matters of NULLable records when using variable datatypes, written by Kimberly Tripp. I don’t want to spoil all the fun of reading that article, but I shall just refer that the Varchar Nullable columns shall not occupy a lot of space when having a NULL value inserted. This actually means that the records of the second table (VarcharTestTable) are actually smaller. We can confirm this by consulting the internal information of the both tables:

-- Consult the internal data of our tables
select OBJECT_NAME(object_id) as TableName, min_record_size_in_bytes, max_record_size_in_bytes, alloc_unit_type_desc, index_level, record_count, page_count, avg_page_space_used_in_percent
	from sys.dm_db_index_physical_stats (db_id(), object_id('dbo.CharTestTable'), null, null, 'detailed');

select OBJECT_NAME(object_id) as TableName, min_record_size_in_bytes, max_record_size_in_bytes, alloc_unit_type_desc, index_level, record_count, page_count, avg_page_space_used_in_percent
	from sys.dm_db_index_physical_stats (db_id(), object_id('dbo.VarcharTestTable'), null, null, 'detailed');

Consulting the results brings some very interesting information: the minimum and the maximum row length of the CharTestTable is 48 bytes, while the rows at the VarcharTestTable occupy just 44 bytes. Also, at the last index level the number of pages (8K) differs between 6212 for CharTestTable and 5862 for VarcharTestTable.

This actually means that in the cases where almost every row of our table is NULL, the Varchar(1) or Varchar(2) datatype actually can be quite useful in the terms of space. In the second part of this post, we shall take this matters a little bit deeper and from more different angles. :)

-- Drop the tables if not needed anymore
DROP TABLE dbo.CharTestTable;
DROP TABLE dbo.VarcharTestTable;

Update for tests on Windows Azure SQL Database on 27th of August of 2012:

I have successfully tested almost the same script on Windows Azure with a couple of exceptions, which shall be revealed in the next post. I just reveal that after inserting the required registers I have rebuild both clustered indexes with this commands:

Alter Table dbo.CharTestTable
rebuild;

GO

Alter Table dbo.VarcharTestTable
rebuild;

Since at the moment of the writing, there is no support for the so much beloved sp_spaceused procedure, I have used the following script for consulting the created table sizes

select    
      obj.name as TableName, sum(reserved_page_count) * 8.0 / 1024 as SizeInMb
from sys.dm_db_partition_stats part
   join sys.objects obj
   on part.object_id = obj.object_id
group by obj.name

My results were 48,76 MB for CharTestTable and 44,64 MB for the VarcharTestTable, which are in par with the results obtained on the SQL Server Sunshine edition.

I have used then the very same script to consult the internal values of the row length for both tables and they were perfectly the same as on the Box version of SQL Server.

select OBJECT_NAME(object_id) as TableName, min_record_size_in_bytes, max_record_size_in_bytes, alloc_unit_type_desc, index_level, record_count, page_count, avg_page_space_used_in_percent
 from sys.dm_db_index_physical_stats (db_id(), object_id('dbo.CharTestTable'), null, null, 'detailed')
union all
select OBJECT_NAME(object_id) as TableName, min_record_size_in_bytes, max_record_size_in_bytes, alloc_unit_type_desc, index_level, record_count, page_count, avg_page_space_used_in_percent
 from sys.dm_db_index_physical_stats (db_id(), object_id('dbo.VarcharTestTable'), null, null, 'detailed');

One thought on “Varchar(x) where x <= 2 does not make sense... or does it?

  1. Pingback: Char(1-2) vs Varchar(1-2) to be NULL or to be NOT NULL | Nikoport

Leave a Reply

Your email address will not be published. Required fields are marked *