This UDFs week solves a problem posed to me last week by a friend Gary. Gary wants to select a random row from one of his tables and was hoping that SQL Server would have some support for this task.
The idea of selecting a random row runs counter to the ideas that are the basis of the relational database model. SQL Server doesn’t have any support for picking a random row.
That doesn’t mean that there’s no way to do what Gary wants. In fact, there are several ways to pick random rows. The technique used in this page is based on having a identity column or a similar sequential numeric key.
For an example, we’ll use the Orders table in the Northwind sample database, which has an OrderID column that fits the bill. OrderIDs are nearly sequential. If we can live with less than perfect randomness, a nearly sequential ID will suffice.
The first UDF in this issue is udf_Num_RandInt, which returns a random integer from a numeric range. It depends on the built-in RAND function. Since RAND is non-deterministic, it can’t be called directly from a UDF and we’ll have to embed it in a view in order to write the function. This technique was used in the Volume 1 #2 to write udf_DT_CurrTime and it’s explained in the T-SQL UDFS book, which you can find on this site for free.
Here’s our view that returns a random number:
GO
SET ANSI_NULLS ON
GOCREATE VIEW Function_Assist_RAND
/*
* A view to return one row, with one column, a random number from
* the built-in function RAND(). This allows a user defined
* function to bypass the restriction on access to this
* non-deterministic function. However, since no seed is
* provided, it will return the same value for all invocations
* within a query as seen in this query:select r1.[RAND], r2.[RAND] from Function_Assist_RAND as r1
cross join Function_Assist_RAND as r2
* Related Functions: udf_Num_RandInt
*
* Example:
select r1.[RAND] from Function_Assist_RAND
*
* © Copyright 2003 Andrew Novick https://novicksoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* View either in print or electronically.
* Originally Published in T-SQL UDF of the Week Vol 1 #21 4/8/03
https://novicksoftware.com/UDFofWeek/UDFofWeek.com
****************************************************************/
AS
SELECT RAND() as [RAND]
GO
GRANT SELECT ON [dbo].[Function_Assist_RAND] TO [PUBLIC]
GO
Once the view is in place a UDF can now use the built-in RAND function. Here’s the CREATE FUNCTION script for udf_Num_RandInt. It returns a random integer within a range:
GO
SET ANSI_NULLS ON
GOCREATE FUNCTION dbo.udf_Num_RandInt (@MinNum int = 0 — lowest value
,@MaxNum int = 100 — highest value in range
) RETURNS INT — int between @MinNum and @MaxNum inclusive
/*
* Returns a random integer between @MinNum and @MaxNum inclusive.
* Note that while the query “SELECT RAND(), RAND()” always
* returns that same two values,
select dbo.udf_NumRandIntFromRange (1, 100)
, dbo.udf_NumRandIntFromRange (1, 100)
* returns two different values. That’s because the view is
* executed separately for each function invocation.
*
* Example:
select dbo.udf_NumRandIntFromRange (1, 100)
*
* © Copyright 2003 Andrew Novick https://novicksoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* UDF either in print or electronically.
* Published in T-SQL UDF of the Week Newsletter Vol 1 #21 4/8/03
https://novicksoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS BEGIN
Declare @Result int
DECLARE @RAND float — the random number
SELECT @RAND = [RAND] From Function_Assist_Rand
RETURN ROUND(@MinNum + (@RAND * (@MaxNum – @MinNum)), 0)
END
GO
GRANT EXEC on [dbo].[udf_Num_RandInt] to [PUBLIC]
GO
Before we go to further, try out udf_Num_RandInt:
go
Results:
Random Num 15-47 | |
---|---|
19 |
Now that we have a way to get a random number from within a UDF the next script creates a UDF that returns a random row from the Northwind..Orders table and uses it to find the first OrderID that’s greater than or equal to the number selected. To make that work we have to supply the range of numbers that OrderID falls within.
Take a look at this CREATE FUNCTION script. I’ll explain what it’s doing in the text that follows.
GO
SET ANSI_NULLS ON
GOCREATE FUNCTION dbo.udf_Order_RandomRow () RETURNS TABLE
— WITH SCHEMABINDING omitted due to 3 part table reference
/*
* Returns a random row from the Northwind Orders table
*
* Example:
select * FROM dbo.udf_Orders_RandomRow ()
*
* © Copyright 2003 Andrew Novick https://novicksoftware.com
* You may use this function in any of your SQL Server databases
* including databases that you sell, so long as they contain
* other unrelated database objects. You may not publish this
* UDF either in print or electronically.
* Published in T-SQL UDF of the Week Newsletter Vol 1 #21 4/8/03
https://novicksoftware.com/UDFofWeek/UDFofWeek.htm
****************************************************************/
AS RETURN
SELECT TOP 1 *
FROM Northwind..Orders
WHERE OrderId > = (
SELECT dbo.udf_Num_RandInt ((SELECT MIN(OrderID)
FROM Northwind..orders)
, (SELECT MAX(orderId)
FROM Northwind..orders))
)
ORDER BY OrderID
GO
GRANT SELECT on [dbo].[udf_Orders_RandomRow] to [PUBLIC]
GO
udf_Orders_RandomRow picks out a random row based on the OrderID. OrderID doesn’t start from one or any predetermined origin so the UDF has to select the minimum and maximum values. That’s actually pretty cool because we’re using sub-queries for the parameters of the function.
Lets take a look at the range of numbers that are in the OrderID column:
, Max(OrderID) as [Max OrderID]
FROM Northwind..Orders
GO
Results:
Min OrderID | Max OrderID |
---|---|
10248 | 11077 |
If you pull out the SELECT on udf_Orders_RandomRow you can retrieve random integers in the range of orderIDs:
FROM Northwind..orders)
, (SELECT MAX(orderId)
FROM Northwind..orders))
GO
Try it a few times and you should always get numbers between 10248 and 11077.
Finally, we can retrieve the random row that we’ve been looking for:
GO
Results(truncated on the right):
OrderID | CustomerID | EmployeeID | OrderDate | RequiredD | |
---|---|---|---|---|---|
10717 | FRANK | 1 | 1997-10-24 | 00:00:00.000 | 1997-11-2 |
Of course, you’ll get a different row each time you use udf_Orders_RandomRow.
The UDF isn’t general purpose. Instead it’s very specific to the Orders table. Due to the limitation on using Dynamic SQL in a UDF and to other limitations it’s not possible to create a general purpose UDF that does the job that udf_Orders_RandomRow takes care of. We have to live with creating a new UDF for each table that we want to use with this technique.