Category: CLR

T-SQL Vs. CLR

By , May 16, 2010 17:07

A colleague of mine asked me if I knew how to convert a character string to a decimal(18,3). Our System was receiving flat files, from a 3rd party system, containing master data which is also needed by our application. No point in double entries, so we import the data; one of the methods of doing so is to load the file content into temporary tables in the database and then take over using T-SQL procedures.

Instincts Reign

“No big deal” I thought and my immediate response was to go with a cast or convert, because I do like to solve problems in T-SQL whenever possible. Then my colleague told me that the character values contain a digit grouping symbol (,) and a decimal symbol (.) and there were 3 digits after the decimal symbol eg. 1,684.582 (in words: one thousand six hundred and eighty four point five hundred and eighty two).

I knew that my idea wasn’t going to work, but I tried it anyway:

declare @val varchar(35) --this is the input format
set @val = '1,684.582';
select CAST(@val as decimal(18,3))

And got this error message (as had my colleague before me):

Msg 291, Level 16, State 1, Line 3
CAST or CONVERT: invalid attributes specified for type ‘varchar’
 

I told him that I could write a CLR user function to do the conversion in .NET and would get back to him.

public partial class UserDefinedFunctions
{
  [Microsoft.SqlServer.Server.SqlFunction]
  public static SqlDecimal ufn_StringToDecimal(SqlString inputString)
  {
    try
    {
      return Convert.ToDecimal(inputString.Value);
    }
    #region error handling
    catch (FormatException fex)
    {
      throw new FormatException(fex.Message);
    }
    catch (InvalidCastException icex)
    {
      throw new InvalidCastException(icex.Message);
    }
    catch (OverflowException oex)
    {
      throw new OverflowException(oex.Message);
    }
    catch (Exception ex)
    {
      throw new Exception(ex.Message);
    }
    #endregion error handling
  }
};

I am by no means a C# crack, so if there is anybody out there that has any ideas as to how to better go about this, feel free to let me know. In particular, I would like to know if there is a better way to pass exception information back to the SQL-Client.

Once I had completed the coding and had deployed the assembly and user function to the Dev-Database, I went to my colleague’s office to let him know he could do some testing with his import procedure. This was when I found out that I had gone to the effort for nothing (or so I thought).

Another one of my colleagues had had the same problem in a project he had dealt with. This is the code he provided the first colleague with:

declare @val varchar(35)
set @val = '1.123,456';
select CAST(REPLACE(REPLACE(@val, '.', ''), ',', '.') as decimal(18,3))

Not very pretty, but it works (well that depends – see Regional Problems) and to be honest it was one of the things I did think of before deciding to resort to CLR.

Regional Problems

One of the reasons I decided to go for CLR was because I thought I was going to have to access the Operating System’s regional settings to get the various symbols in order to replace them in the string being passed into the function. And that is not possible using T-SQL, but is easily done in .NET. These values vary depending on the regional settings of the server running the MSSQL service. This was something my two colleagues didn’t think of 🙂 Besides, I was looking for a reason to do a bit of coding in my freshly installed Visual Studio 2010 Ultimate, which I don’t get to do too often. As it turns out, .NET does all of the work for you in finding the correct symbols based on the system’s regional settings. So all I had to do was a simple Convert.ToDecimal() and .NET did the rest for me.

Was my effort all for nothing?
I hate doing something for nothing and I was sure that my solution was going to perform better than two replaces and a cast! In particular, when large amounts of data were going to be imported in one session.

So I got on to my ThinkTank database and did a bit of testing.

I ran the code that the second colleague provided and the result was 1123.456, which is what I expected. Then I had a closer look at the data that my first colleague provided me with and ran the select with one of his values. Notice the different grouping symbols.

declare @val varchar(35)
set @val = '1,684.582';
select CAST(REPLACE(REPLACE(@val, '.', ''), ',', '.') as decimal(18,3))

The result of the select was: 1.685. So we had lost the digits after the decimal symbol and the value was 1000 times smaller that the actual input (talk about data compression)! The reason for this was that here in Germany (where we work and have our development systems) the digit grouping symbol and the decimal symbol are exactly the other way around as is the case in the English speaking world.
So to a German system 1.684,582 means what 1,684.582 would mean to an “English” system. (The database that my colleague was importing data into, is situated in the USA). The second colleague had just assumed that he was going to be dealing with input that had a certain format and wasn’t allowing for his routine to run with different regional settings.

This was the first knock-out for the replace routine. I would hate to have to go through all the import routines when deploying a database to a foreign country to have to replace ‘,’ with ‘.’ and vice versa.

Next I compared the T-SQL routine (using REPLACE()) with my user function using 10000, 100000 and one million rows respectively in a temp table. I started off doing simple selects but found the overall run times between the two to be very similar, although my solution was the clear winner as far as CPU time was concerned. Most of the time was being used up by the client to dislplay the results (see performance graphs below). Then I decided to be a little bit more realistic and rather than just select, I inserted the results into physical tables. After all, I don’t know too many people who want to visualise 1 million rows at any time.

This is a summary of what I observed:
(I ran each query 5 times and these are the averages)

Comparison Results

The performance of the CLR was much better that the replace in T-SQL. Not only in elapsed time, but also in CPU usage.

The performance graphs for the CLR solution

These are the performance graphs that I was monitoring while running my tests on the CLR solution for one million rows. I used Process Explorer from Sysinternals to monitor CPU and Memory usage as well as IO. Sysinternal’s website (if you don’t already know) can be found under this link. I am not including these graphs to show the difference between the two solutions as I have already done so above, but rather to show how performance indicators should be monitored closely depending on what and how you are testing.

 System Information
System Performance
SQL Server Performance
SQL Server Performance

Management Studio performance

The first batch (the first set of performance peaks) was the insert and we can see the IO activity to testify that data was being written to disk. We can also see that the CPU was used by the SQL-Service and not the client application (which in this test, were on one and the same machine).

The second batch (the second set of performance peaks) was just the select in which the results were displayed in Management Studio. Here we can clearly see a shift in CPU usage from the SQL-Service to Management Studio.

Displaying a million rows in a client application is a fairly unlikely scenario, while building a set of data that contains 1 million rows to work with isn’t. Keep this in mind when monitoring CPU usage while testing your code. Depending on how your solution is built, you may need to consider that the client-app will be hogging the lion’s share of the CPU time. I find this to be a valuable piece of information.

The performance graphs for the replace solution were very similar so it doesn’t serve any purpose to include the screenshots here. The major differences were that CPU usage was higher than for CLR and the runtime, as indicated in the table above, was almost double that of the CLR version.

What does all this mean?

I can only conclude that the CLR solution wins across the board. It is faster, uses less CPU and is environment-setting safe.

What do you think?

Panorama Theme by Themocracy