The previous two articles have been devoted to implementations of the Rot13 cipher. This is a character substitution cipher that is used to obscure text strings in the Window registry and in many web cookies. Both implementations were contributed by Luke Schollmeyer. You’ll find the first article here and the second one here.
The first version, ufn_Rot13, uses a loop to translate each character. The second version, ufn_Rot13Redux, puts the loop into a SELECT statement so that it’s executed in the SQL data engine as part of the SELECT processing rather than in an explicit loop. Take a look back at those articles for the details and code.
This issue is devoted to comparing the performance of the two versions of the function. As I write this I have no idea how it’s going to come out but I thought it would be interesting to see.
This test of the performance of the two functions uses a methodology similar to the one I’ve used before. The methodology is also used in Chapter 10 of Transact-SQL User-Defined functions, my book that you can download for free. We’re going to construct a table with 8192 rows of random character data. Then we’re going to pin the table into memory to eliminate I/O related differences in the trials. Finally, we’re going to execute the two functions on the 8192 rows and use the MAX aggregation function to minimize the amount of data sent to the client program. I’m pretty confident that this test gives a fair comparison of the functions. If you think there’s a flaw in my reasoning, please let me know at [email protected]
In order to create the data, we’re going to need another UDF, udf_Txt_XlateChars, which modifies a string by translating one character to another. Here’s the CREATE FUNCTION script:
SET ANSI_NULLS ON
GOCREATE FUNCTION dbo.udf_Txt_XlateChars (
@sInput varchar(8000) — string to remove chars from
, @sFromChars varchar(255) — Characters to remove
, @sToChars varchar(255) — Characters to replace with
) RETURNS varchar(8000) — @sInput with characters in
–@sFromChars replaced with the character in @sToChars
* Modifies @sInput by replacing any character in @sFromChars
* with the corresponding character in @sToChars.
* Works from left to right in @sFromChars so chars may be
* translated twice if a character appears in @sToChars
* to the left of where it appears in @sFromChars.
, ‘-,\/.;:<>()#[email protected]!%^&*+=””’, ‘………………….’)
PRINT ‘Test 1 ‘ + CASE WHEN ‘.ABC.D.EFGHIJ..XKL.M.NOP.’ =
, ‘-\/.;:<>()#[email protected]!%^&*+=””’, ‘………………….’)
THEN ‘Worked’ ELSE ‘ERROR’ END
* © Copyright 2003 Andrew Novick https://novicksoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* UDF either in print or electronically.
* Published in the T-SQL UDF of the Week Vol 2 #44 11/9/04
DECLARE @sReplaced varchar(8000) — Working result
, @nNum2Rep int — how many characters to replace
, @nPos int — postion in @sFromChars
, @RemoveChar CHAR(1) — Char to replace this time
SELECT @nNum2Rep = LEN(@sFromChars)
, @nPos = 1
, @sReplaced = @sInput
WHILE @nPos <= @nNum2Rep BEGIN
SET @sReplaced = REPLACE(@sReplaced
, SUBSTRING(@sFromChars, @nPos,1)
, SUBSTRING(@sToChars, @nPos, 1)
SET @nPos = @nPos + 1
END — WHILE
SET QUOTED_IDENTIFIER OFF
SET ANSI_NULLS ON
GRANT EXEC ON dbo.udf_Txt_XlateChars to PUBLIC
Like ufn_Rot13, udf_Txt_XlateChars has a WHILE loop that is executed once for each character in a string. Instead of being executed once for each character in the input, it’s executed once for each character in the translation strings, and the REPLACE function is used the change all the characters of one type to characters of another.
This leads to a limitation of udf_Txt_XlateChars that prevents its use for replacing ufn_Rot13. Because the result of replacing each character is used as the input of the next replacement if one of the characters in the @sFromChars parameter is also in the @sToChars that it could get translated twice.
I use this function mostly for translating special characters into a more acceptable alternative and so I’ve been able to use this function with that limitation.
Now let’s build our test data so we can compare the performance of these UDFs. The next stored procedure was used in Volume 1 #10 to build a table of number strings. Here’s the CREATE PROC script:
— 20 Creates 1,048,576 rows
DECLARE @LC int — Loop counter
— delete an existing ExampleNumberString table
if exists (select * from dbo.sysobjects
where id = object_id(N’dbo.ExampleNumberString’)
and OBJECTPROPERTY(id, N’IsUserTable’) = 1)
DROP TABLE dbo.ExampleNumberString
CREATE TABLE ExampleNumberString (
ID int identity (1,1)
, BigNum Numeric (38, 0)
, NumberString varchar(128) NULL
INSERT INTO ExampleNumberString (BigNum, NumberString)
VALUES( CONVERT (numeric(38,0), rand() * 9999999999999)
, ‘ ‘) — preallocate
SELECT @LC = 0
WHILE @LC < @Loops BEGIN
INSERT INTO ExampleNumberString (BigNum, NumberString)
SELECT BigNum * RAND(@LC + 1) * 2
, ‘ ‘
SELECT @LC = @LC + 1
END — WHILE
SET NumberString = convert(varchar(128)
, convert(numeric(38,0), 9834311) * bignum)
Once you’ve created the procedure, run it to create as many rows as you like. The parameter to usp_CreateExampleNumberString is use as the exponent of 2 to calculate the number of rows desired. So 3 creates 8 rows and 20 creates 1,048,576 rows. I suggest that you use at least 8192 rows (@Loops = 13) to get meaningful results as I do in the script that follows:
select count(*) from ExampleNumberString — verify the count
select top 10 * from ExampleNumberString — check out a few rows
Now since we’re translating letters instead of numbers, we need a different table that has letters. That’s where udf_Txt_XlateChars comes in. Here we translate the numbers to letters and make a new table out of it:
SELECT dbo.udf_Txt_XlateChars (Numberstring, ‘0123456789’
, ‘ABCDEVWXYZ’) Letters
Since Rot13 divides the alphabet at the 13’th letter, the statement above translates half the numbers into letters below the split and half to letters above the split.
Since we’re going to run two queries against the Letters column to test the time it takes to use a UDF on the 8192 rows, we should try and make the circumstances for the two queries as close as possible. One way to do that is to force all the rows that we’ll query into memory and keep them there.
If we don’t have all the desired rows in memory then we might be comparing one query that reads a 8192 rows from disk with another query that reads the same 8192 rows from SQL Server’s page cache.
SQL Server provides the DBCC PINTABLE to force all pages from a table to remain in the cache once they’ve been read the first time. Use it with caution. It can fill SQL Server’s cache and cause the query engine to lock up to the point where you have to shut it down and restart it. So only do this if you have adequate RAM. There’s no point using virtual memory as a substitute RAM. That just substitutes one form of disk I/O (paging) for the one we’re trying to eliminate (reading pages from disk).
This next script pins ExampleLetterString in memory. On my test system with a 8192 row table it forced SQL Server to consume 32 megabytes of RAM. The test system has 1000 megabytes of RAM and I can use the task manager to see that memory isn’t full. If your system is short on RAM, reduce the number of rows in the test to eliminate paging.
— Be sure that you have enough memory available before you do
— this. My SQL Server Process grew to 32 Megabytes when I ran
— this script.
DECLARE @db_id int, @tbl_id int
SET @db_id = DB_ID(”)
SET @tbl_id = OBJECT_ID(‘ExampleLetterString’)
DBCC PINTABLE (@db_id, @tbl_id)
— Now read all the rows to force the pages into the cache
SELECT * from ExampleLetterString
The stage is set for comparing the two queries using the two Rot13 functions. Before we run the queries, I want to throw in one more wrinkle. Any query that returns a million rows to SQL Query Analyzer’s results window is going to do a lot of work on sending, receiving, and displaying the results. To eliminate most of that work, I’ve used the MAX function to just return the biggest result. You’ll see this function in the queries below.
This took 26 seconds on my system.
This took 11 seconds on my system.
So the explicit WHILE loop takes about 2 and a half times the version using the implicit loop in the select statement. That’s a significant but not overwhelming difference. It might make this technique worth exploring in your queries.
However, if you step back for a second and see that it takes on average 1.3 milliseconds to execute ufn_Rot13Redux, you must approach using a UDF this way with caution. 1.3 milliseconds is fine if you’re executing the UDF once to prepare a cookie for a web page. If you’re processing a million rows with a UDF, watch out. That 1.3 milliseconds would add up to 1300 seconds or about 22 minutes.