Sunday, January 17, 2010

String concatenation issues

I had talked before about a string concatenation issues in Sql Server. It's not about issues Sql Server has, it's about issues that a developer can have if he doesn't know how string concatenation and datatype precedence work.
I thought of re-analyzing the scenarios (and correct some minor mistakes) using SQL Server 2008.

Most issues occur when concatenationg strings used for generating dynamic sql.
In my stored procedures I use it a lot because of the way the app is designed: processes generate new tables so our sps get table name, column names as parameters. The final string to execute can get quite large. The examples in this post us nvarchar datatype (same applies to varchar which has the character limit double - 8000 instead of 4000). Of course all examples are discussed in the context where the final string variable (the result) is nvarchar(max). At some point truncation occurs.

Scenario 1 - when concatenating string constants, the result will be truncated to 4000 (nvarchar), if all the constants are less than or equal to 4000.

declare @sql nvarchar(max)
select @sql = replicate(N'a',3000) + replicate(N'b', 3000)
select len(@sql) --4000

The result is 4000, even though @sql is nvarchar(max). The truncation occurred because we are concatenating constants less than the limit of 4000.
If one constant/variable is > 4000 then truncation doesn't occur:

declare @a nvarchar(max),@sql nvarchar(max)
set @a = N'a'
select @sql = replicate(@a,4000) + replicate('a',1)
select LEN(@sql) --4001

Scenario 2 - when concatenating varchar to nvarchar, datatype precedence and implicit
conversion occur which can lead to string truncation. To avoid implicit conversion (which can also lead to poor performance in some cases like filtering on a column), same datatype should be used:

declare @sql varchar(max)
select @sql = replicate('a',3000) + replicate('b', 3000)
select len(@sql)
--result is 6000

declare @sql varchar(max), @sql1 nvarchar(max)
set @sql1 =N''
select @sql = replicate('a',3000) + replicate('b', 3000) + @sql1
select len(@sql)
--result is 4000

Here implicit conversion from varchar(6000) to nvarchar(4000) occurred even though @sql1 is nvarchar(max). A flavour of this example is in the next section.
See, one might wonder what would be the above result if I do one small change:

declare @sql varchar(max), @sql1 nvarchar(max)
set @sql1 =N'a'
select @sql = replicate('a',3000) + replicate('b', 3000) + @sql1
select len(@sql)

The result is 4001 (more than the limit) but truncation still occurred and it's more difficult to detect than in the other cases. See below why.

Scenario 3 - concatenating multiple string constants will lead to string truncation if the intermediary results (starting from left to right in the expression) exceed 4000, but the final result can be more than 4000 of one of the variables is of type nvarchar(max)

To illustrate:
declare @sql nvarchar(max), @str1 nvarchar(128), @str2 nvarchar(max)
SET @str1 = N'x'
set @str2 = N'y'
set @sql = REPLICATE(N'a',4000) + @str1 + @str2

select len(@sql)
select @sql

The result is 4001. Oh, it’s greater than the 4000 limit so you would think everything is ok right? Not really. The @sql result ends up with 'ay', and in fact the length should be 4002 right? So, where did the 'x' character go?
Well, sql server evaluates the expressionfrom left to right, one step at a time: first replicate(N'a', 4000) + @str1. @str is nvarchar(128) so it assumes the result should be a constant of maximum 4000, thus it truncates the ‘x’. Next it adds that to @str2 which is a nvarchar(max) so it assumes the result is a nvarchar(max), thus it adds the 'y'.

Conclusion and fix

The basic rule here is:
@sql = @str1 + @str2 + @str3 + ........+ @strn
Assuming they are all of same datatype (nvarchar), Sql Server will truncate everything starting with @str1 if any of the intermediary results of concatenation is above 4000 until it reaches a nvarchar(max) type variable. Starting from there truncation will not occur, regardless of the legth of the variables/intermediary results. This is why you can end up with results greater than 4000 limit, and still have truncation!

The solution to avoid truncation is pretty obvious: when concatenation start all the time with a nvarchar(max) variable. I prefer initializing the final string to execute like this:

DECLARE @sql nvarchar(max)
SELECT @sql = N''
SELECT @sql = @sql + @str1 + @str2 + ......@strn

It might help sometimes to work only with nvarchar(max) variables, but this will not work if you have mixes of constants and variables (and your concatenations starts with constants). As far as SQL Server 2008 goes, no difference between it and 2005 version in this matter.

Also, pay attention to strings-in-strings, meaning dynamic sql inside another dynamic sql which can lead to pretty ugly scenarios and should be avoided. More in a future post.

No comments:

Post a Comment