Proposed Solution to the Performance Problem with SQL Server Scalar UDFs

This article is a copy if a enhancement request that I have submitted on the feedback section of the Microsoft Connect web site.  Here’s a link to the item:

You may have to log in with a Microsoft Live ID and then register in order to vote.

The PDF and example script can be downloaded here

SQL Server scalar User-Defined Functions (UDFs) have a performance problem that could be solved with an enhancement to the SQL Server database engine. Because SQL Server must execute each function on every row, using any function incurs a cursor like performance penalty.  However, since many UDFs are simple, they can often be converted to a single expression.  This suggestion proposes that a new type of UDF, the Scalar Expression UDF, be added to SQL Server.  Doing so would often eliminate the performance penalty paid when using scalar UDFs and allow them to be used more widely.  Using UDFs has the advantage of abstraction that makes code easier to maintain and so making them easier to use has substantial benefits.

No matter how simple a UDF there’s a large performance penalty paid when they’re used.  This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more.  The penalty is incurred because the SQL Server database engine must create its own internal cursor like processing.  It must invoke each UDF on each row.  If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows.  If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing. It’s the row by row processing that seems to slow SQL Server the most. I’ve documented the performance penalty in chapter 9 of my book, Transact-SQL User-Defined Functions.  I’ve posted a PDF with the relevant section of the book and a complete example script on this page.  Click here to download.

Many functions already are or could be converted to a single expression.  I’ve done this for my clients to solve performance problems associated with functions. My typical solution is to evaluate and test an equivalent expression and then substitute the expression anywhere the function is causing a performance problem.  This is a tedious and error prone process that requires a great deal of testing.  While it can achieve order of magnitude performance improvements, it negates the advantage of abstraction that scalar functions provide.

Here is my proposed syntax for the scalar expression user-defined function using a Books-On-Line style:

CREATE FUNCTION [ schema_name. ] function_name ( [ { @parameter_name [ AS ][ type_schema_name.parameter_data_type[ = default ] }

[ ,n ]



RETURNS return_data_type

[ WITH <function_option> [ ,n ] ]

[ AS ]

RETURN scalar_expression

[ ; ]

Because we’re talking about an expression, including the return_data_type isn’t absolutely necessary; SQL Server could just return the result type of the expression.  However, requiring the return type improves function documentation, removes the necessity of adding a CAST as part of the expression, and keeps the expression UDF similar to the existing scalar UDF.

This suggestion is made with an implicit assumption that the SQL Server parser and database engine can do the same job of substitution that a human does.  It’s the kind of problem that might be solved with the right macro processor.

I have found that the implementation of scalar UDFs is causing poor query performance and therefore inhibiting the use of scalar functions.  This suggestion would eliminate the performance penalty from many functions while retaining benefit that functions provide in code clarity and reuse.