T-SQL Archives - Page 6 of 6

Tag Archive for: T-SQL

Cleaning Data with Recursive CTE

May 15, 2007/1 Comment/in SQL Server, XML/by Plamen Ratchev

SQL Server 2005 added a great new feature: Common Table Expressions (CTE). And even better than that – recursive CTEs. That provides a new powerful tool to solve many SQL problems. One of the areas where recursive CTEs shine is the hierarchical data management.

Here is another side of the recursive CTEs – utilizing them for some common tasks like cleaning data. The problem: a table has a column with values that have invalid characters. The task is to replace all those invalid characters with a space. Unfortunately the REPLACE function does not support pattern matching and each character in the column has to be verified individually and replaced if it falls in the invalid range. The solution below utilizes a recursive CTE to walk though the ACSII table of characters and to replace the invalid characters in the column values.

-- Create test table.
 
CREATE TABLE Foobar (
  key_col INT PRIMARY KEY,
  text_col NVARCHAR(100));
 
-- Populate sample data.
 
INSERT INTO Foobar VALUES (1, N'ABC!@#%DEFgh');
INSERT INTO Foobar VALUES (2, N'~!102WXY&*()_Z');
 
-- Perform the cleanup with recursive CTE.
 
WITH Clean (key_col, text_col, ch) 
AS
(SELECT key_col,
        REPLACE(text_col, CHAR(255), ' '),
        255
 FROM Foobar
 UNION ALL
 SELECT key_col,
        CASE WHEN
             CHAR(ch - 1) NOT LIKE '[A-Z]'
             THEN REPLACE(text_col, CHAR(ch - 1), ' ')
             ELSE text_col END,
        ch - 1
 FROM Clean
 WHERE ch > 1)
SELECT key_col, text_col 
FROM Clean
WHERE ch = 1
OPTION (MAXRECURSION 255);

On a side note – the recursive CTEs are not the best performers. Also, by default a CTE allows only 100 levels of recursion. The MAXRECURSION hint can be used to set higher level (a value between 0 and 32767; setting to 0 will remove the limit). Be aware that settings MAXRECURSION to 0 may create an infinite loop.

Here is a different method using utility table with numbers and FOR XML PATH, which is more effective:

WITH Clean (key_col, text_col)
AS
(SELECT key_col, REPLACE(CAST(
        (SELECT CASE 
                  WHEN SUBSTRING(text_col, n, 1) LIKE '[A-Z]' 
                  THEN SUBSTRING(text_col, n, 1) 
                  ELSE '.' 
                END
         FROM (SELECT number
               FROM master..spt_values
               WHERE type = 'P'
                 AND number BETWEEN 1 AND 100) AS Nums(n)
         WHERE n <= LEN(text_col)
         FOR XML PATH('')) AS NVARCHAR(100)), '.', ' ')
 FROM Foobar)
SELECT key_col, text_col
FROM Clean;

Dates and Date Ranges in SQL Server

April 27, 2007/0 Comments/in SQL Server/by Plamen Ratchev

One of the most common tasks when working with data is to select data for a specific date range or a date. There are two issues that arise: calculating the date range and trimming the time portion in order to select the full days. Below are a few techniques to show how this can be done.

First, a look at the internals of the DATETIME data type in SQL Server. Its internal representation is as two 4-byte values. The first value represents the number of days since 01/01/1900. The second number is the number of milliseconds since midnight. Here is how those two values can be converted to VARBINARY and then to INT to extract the days and milliseconds portions.

-- The internal representation of a datetime is two 4-byte values
SELECT CURRENT_TIMESTAMP AS 'Today',
       CAST(CURRENT_TIMESTAMP AS VARBINARY(8)) AS 'Two 4-byte Internal',
       SUBSTRING(CAST(CURRENT_TIMESTAMP AS VARBINARY(8)), 1 , 4)
           AS 'Days Since 1900-01-01',
       SUBSTRING(CAST(CURRENT_TIMESTAMP AS VARBINARY(8)), 5 , 4)
           AS 'Milliseconds Since Midnight',
       CAST(SUBSTRING(CAST(CURRENT_TIMESTAMP AS VARBINARY(8)), 1 , 4) AS INT)
           AS 'Days Represented as INT',
       CAST(SUBSTRING(CAST(CURRENT_TIMESTAMP AS VARBINARY(8)), 5 , 4) AS INT)
           AS 'Milliseconds Represented as INT';

Below are a few basic date calculations that show the power of the DATETIME functions and how to calculate key dates that can be used to extract data:

-- Basic dates and date calculations
SELECT CAST('19000101 00:00' AS SMALLDATETIME) AS 'Min SMALLDATETIME',
       CAST('20790606 23:59' AS SMALLDATETIME) AS 'Max SMALLDATETIME',
       CAST('1753-01-01T00:00:00.000' AS DATETIME) AS 'Min DATETIME',
       CAST('9999-12-31T23:59:59.997' AS DATETIME) AS 'Max DATETIME',
       CAST(0 AS DATETIME) AS 'Base SQL Server DATETIME',
       CURRENT_TIMESTAMP AS 'Current Date/Time',
       DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)
           AS 'Current Date/Time as Number (days since 1900-01-01)',
       CAST(DATEDIFF(DAY, 0, CURRENT_TIMESTAMP) AS DATETIME)
           AS 'Today as Date',
       CAST(DATEDIFF(DAY, -1, CURRENT_TIMESTAMP) AS DATETIME)
           AS 'Tomorrow as Date',
       CAST(DATEDIFF(DAY, 1, CURRENT_TIMESTAMP) AS DATETIME)
           AS 'Yesterday as Date',
       DATEADD(YEAR, DATEDIFF(YEAR, 0, CURRENT_TIMESTAMP)-1, 0)
           AS 'First Day of Last Year',
       DATEADD(YEAR, DATEDIFF(YEAR, 0, CURRENT_TIMESTAMP), 0)
           AS 'First Day of This Year',
       DATEDIFF(DAY, 0, DATEADD(YEAR, -1, CURRENT_TIMESTAMP))
           AS 'Today One Year Ago as Number',
       CAST(DATEDIFF(DAY, 0, DATEADD(YEAR, -1, CURRENT_TIMESTAMP)) AS DATETIME)
           AS 'Today One Year Ago as Date',
       DATEADD(MONTH, DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP), 0)
           AS 'First Day of Current Month',
       DATEADD(MONTH, DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP) + 1, 0)
           AS 'First Day of Next Month',
       DATEADD(MONTH, DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP) - 1, 0)
           AS 'First Day of Prior Month',
       DATEADD(MONTH, DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP) + 1, 0) - 1
           AS 'Last Day of Current Month',
       DATEADD(DAY, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP), '01:30:00')
           AS '1:30 am today',
       CONVERT(VARCHAR(10), CURRENT_TIMESTAMP, 101) + ' 11:30:00.000'
           AS '11:00 am today',
       CONVERT(VARCHAR(10), CURRENT_TIMESTAMP, 101) + ' 17:00:00.000'
           AS '5:00 pm today',
       DATEADD(WEEK, DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP), 0)
           AS 'First Day of This Week',
       DATEADD(WEEK, DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP) - 1, 0)
           AS 'First Day of Last Week',
       DATEADD(WEEK, DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP) + 1, 0)
           AS 'First Day of Next Week';

Let’s implement some date range searches to see this in action. Given table Orders with DATETIME data type column order_date, here are a few range selections:

-- Get Orders for Last Month
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(MONTH, 
                      DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP) - 1, 0)
  AND order_date < DATEADD(MONTH, 
                      DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP), 0);
 
-- Get Orders for Current Month
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(MONTH, 
                      DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP), 0)
  AND order_date < DATEADD(MONTH, 
                      DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP) + 1, 0);
 
-- Get Year to Date Orders
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(YEAR, 
                      DATEDIFF(YEAR, 0, CURRENT_TIMESTAMP), 0)
  AND order_date < DATEDIFF(DAY, -1, CURRENT_TIMESTAMP);
 
-- Get Month to Date Orders
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(MONTH, 
                      DATEDIFF(MONTH, 0, CURRENT_TIMESTAMP), 0)
  AND order_date < DATEDIFF(DAY, -1, CURRENT_TIMESTAMP);
 
-- Get Last Year's Orders
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(YEAR, 
                      DATEDIFF(YEAR, 0, CURRENT_TIMESTAMP)-1, 0)
  AND order_date < DATEADD(YEAR, 
                     DATEDIFF(YEAR, 0, CURRENT_TIMESTAMP), 0);
 
-- Get Today's Orders
SELECT order_date
FROM Orders
WHERE order_date >= DATEDIFF(DAY, 0, CURRENT_TIMESTAMP)
  AND order_date < DATEDIFF(DAY, -1, CURRENT_TIMESTAMP);
 
-- Get Yesterday's Orders
SELECT order_date
FROM Orders
WHERE order_date >= DATEDIFF(DAY, 1, CURRENT_TIMESTAMP)
  AND order_date < DATEDIFF(DAY, 0, CURRENT_TIMESTAMP);
 
-- Get Today's Orders Between 9:00 am And 11:00 am
SELECT order_date
FROM Orders
WHERE order_date BETWEEN 
           CONVERT(VARCHAR(10), CURRENT_TIMESTAMP, 101) + ' 09:00:00.000'
       AND CONVERT(VARCHAR(10), CURRENT_TIMESTAMP, 101) + ' 11:00:00.000';
 
-- Or
SELECT order_date
FROM Orders
WHERE order_date BETWEEN 
           DATEADD(DAY, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP), '09:00:00')
       AND DATEADD(DAY, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP), '11:00:00');
 
-- Get Orders for Last Week
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(WEEK, 
                      DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP) - 1, 0)
  AND order_date < DATEADD(WEEK, 
                     DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP), 0);
 
-- Get Orders for Current Week
SELECT order_date
FROM Orders
WHERE order_date >= DATEADD(WEEK, 
                      DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP), 0)
  AND order_date < DATEADD(WEEK, 
                     DATEDIFF(WEEK, 0, CURRENT_TIMESTAMP) + 1, 0);
 
-- Get Orders for One Year Back From Current Date
SELECT order_date
FROM Orders
WHERE order_date >= DATEDIFF(DAY, 0, 
                      DATEADD(YEAR, -1, CURRENT_TIMESTAMP))
  AND order_date < DATEDIFF(DAY, -1, CURRENT_TIMESTAMP);

There could be many more variations based on the same techniques to calculate dates. All those techniques use the datetime functions DATEADD and DATEDIFF to trim the time portion of the date in order to get accurate results. The key to this is the behavior of DATEDIFF to calculate difference between boundaries (that is when crossing between years, months, days, minutes, etc.). For example, difference in days between May 3, 2006 at 11:59 PM and May 4, 2006 at 00:01 AM returns one day, even it is only a couple minutes. Similar, calculating difference in years between Dec 31, 2006 and Jan 1, 2007 returns one year, even it is only a day or less. With this in mind it is very easy to accomplish the timing effect. Calculating difference one level higher trims the lower level. For example, to trim second calculate difference in minutes, to trim hours (or time in general) calculate difference in days, difference between months will trim days, and so on.

The same trimming can be done by converting to a string and then back to a date. But that is a slower and lengthy way of doing it. Here is a brief example of both methods:

-- Trim time portion
DECLARE @date DATETIME;
SET @date = CURRENT_TIMESTAMP;
 
-- Convertmg to string and back to date
SELECT CONVERT(DATETIME, CONVERT(CHAR(10), @date, 101)) ;
 
-- Using datetime functions
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, @date),0);

As a bonus, here is how to convert time stored as a decimal number (like 5.25, 5.50, or 5.75) to a time format (like 5:15, 5:30, or 5:45):

-- Format Time From 5.25 to 5:15
-- Decimal Format to Time Format
DECLARE @time DECIMAL(5, 2);
SET @time = 5.25;
 
SELECT CONVERT(CHAR(5), DATEADD(ss, @time * 3600, 0), 108);

Just for clarification: the CURRENT_TIMESTAMP function returns the current date and time. This function is the ANSI SQL equivalent to the GETDATE function.

Pivoting data in SQL Server

April 23, 2007/8 Comments/in SQL Server/by Plamen Ratchev

Very often there is a need to pivot (cross-tab) normalized data for some reporting purposes. While this is best done with reporting tools (Excel is one example with powerful pivoting capabilities), sometimes it needs to be done on the database side.

The discussion here is limited to static pivoting (that is when the values to pivot are known). Dynamic pivoting is accomplished by using dynamic SQL and is based on the same techniques, just dynamically building the pivoting query.

Here are a few methods and techniques to implement that in SQL Server.

Given the following data:

 OrderId    OrderDate  Amount
----------- ---------- ------
1           2007-01-01 10.50
2           2007-01-26 12.50
3           2007-01-30 12.00
4           2007-02-14 13.75
5           2007-02-20 10.00
6           2007-03-06 15.00
7           2007-03-10 17.50
8           2007-03-29 20.00

We would like to return this result set:

 OrderYear   Jan   Feb   Mar
----------- ----- ----- -----
2007        35.00 23.75 52.50

Creating the sample Orders table:

CREATE TABLE Orders (
 order_id INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
 order_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
 amount DECIMAL(8, 2) NOT NULL DEFAULT 0
        CHECK (amount >= 0));
 
INSERT INTO Orders
  (order_date, amount)
SELECT '20070101', 10.50
UNION ALL
SELECT '20070126', 12.50
UNION ALL
SELECT '20070130', 12.00
UNION ALL
SELECT '20070214', 13.75
UNION ALL
SELECT '20070220', 10.00
UNION ALL
SELECT '20070306', 15.00
UNION ALL
SELECT '20070310', 17.50
UNION ALL
SELECT '20070329', 20.00;

Using CASE

CASE can be very handy for pivoting. It allows writing flexible logic and to transform the data as needed. Also, it is a very efficient method as it requires a one pass scan of the data.

SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(CASE WHEN DATEPART(m, order_date) = 1
                THEN amount ELSE 0 END) AS 'Jan',
       SUM(CASE WHEN DATEPART(m, order_date) = 2
                THEN amount ELSE 0 END) AS 'Feb',
       SUM(CASE WHEN DATEPART(m, order_date) = 3
                THEN amount ELSE 0 END) AS 'Mar'
FROM Orders
GROUP BY DATEPART(yyyy, order_date);

Using Matrix table

This method requires creating a matrix table with 1 for the significant columns and 0 for columns to be ignored. Then joining the matrix table to the data table based on the pivoting column will produce the results:

CREATE TABLE MonthMatrix (
  month_nbr INT NOT NULL PRIMARY KEY
            CHECK (month_nbr BETWEEN 1 AND 12),
  jan INT NOT NULL DEFAULT 0
      CHECK (jan IN (0, 1)),
  feb INT NOT NULL DEFAULT 0
      CHECK (feb IN (0, 1)),
  mar INT NOT NULL DEFAULT 0
      CHECK (mar IN (0, 1)),
  apr INT NOT NULL DEFAULT 0
      CHECK (apr IN (0, 1)),
  may INT NOT NULL DEFAULT 0
      CHECK (may IN (0, 1)),
  jun INT NOT NULL DEFAULT 0
      CHECK (jun IN (0, 1)),
  jul INT NOT NULL DEFAULT 0
      CHECK (jul IN (0, 1)),
  aug INT NOT NULL DEFAULT 0
      CHECK (aug IN (0, 1)),
  sep INT NOT NULL DEFAULT 0
      CHECK (sep IN (0, 1)),
  oct INT NOT NULL DEFAULT 0
      CHECK (oct IN (0, 1)),
  nov INT NOT NULL DEFAULT 0
      CHECK (nov IN (0, 1)),
  dec INT NOT NULL DEFAULT 0
      CHECK (dec IN (0, 1)));
 
-- Populate the matrix table
INSERT INTO MonthMatrix (month_nbr, jan) VALUES (1, 1);
INSERT INTO MonthMatrix (month_nbr, feb) VALUES (2, 1);
INSERT INTO MonthMatrix (month_nbr, mar) VALUES (3, 1);
INSERT INTO MonthMatrix (month_nbr, apr) VALUES (4, 1);
INSERT INTO MonthMatrix (month_nbr, may) VALUES (5, 1);
INSERT INTO MonthMatrix (month_nbr, jun) VALUES (6, 1);
INSERT INTO MonthMatrix (month_nbr, jul) VALUES (7, 1);
INSERT INTO MonthMatrix (month_nbr, aug) VALUES (8, 1);
INSERT INTO MonthMatrix (month_nbr, sep) VALUES (9, 1);
INSERT INTO MonthMatrix (month_nbr, oct) VALUES (10, 1);
INSERT INTO MonthMatrix (month_nbr, nov) VALUES (11, 1);
INSERT INTO MonthMatrix (month_nbr, dec) VALUES (12, 1);
 
-- Use the matrix table to pivot
SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(amount * jan) AS 'Jan',
       SUM(amount * feb) AS 'Feb',
       SUM(amount * mar) AS 'Mar'
FROM Orders AS O
JOIN MonthMatrix AS M
  ON DATEPART(m, O.order_date) = M.month_nbr
GROUP BY DATEPART(yyyy, order_date);

The David Rozenshtein method

This method is based on the formula: 1 – ABS(SIGN(x – y)). This formula returns 1 when x = y, and 0 otherwise. In a way it mimics the matrix table approach.

SELECT DATEPART(yyyy, order_date) AS OrderYear,
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 1)))) AS 'Jan',
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 2)))) AS 'Feb',
       SUM(amount * (1 - ABS(SIGN(DATEPART(m, order_date) - 3)))) AS 'Mar'
FROM Orders
GROUP BY DATEPART(yyyy, order_date);

The PIVOT operator in SQL Server 2005

SQL Server 2005 provides built-in pivoting mechanism via the PIVOT operator.

SELECT OrderYear,
       [1] AS 'Jan',
       [2] AS 'Feb',
       [3] AS 'Mar'
FROM (SELECT DATEPART(yyyy, order_date),
             DATEPART(m, order_date),
             amount
      FROM Orders) AS O (OrderYear, month_nbr, amount) 
PIVOT
(SUM(amount) FOR month_nbr IN ([1], [2], [3])) AS P;

There are some other methods to implement pivoting (like using subqueries, multiple joins, or the APPLY operator in SQL Server 2005) but I am not showing examples of those as I do not find them practical, especially with large number of values to pivot.

Additional Resources:

Using PIVOT and UNPIVOT
http://msdn2.microsoft.com/en-us/library/ms177410.aspx

Passing a Variable to an IN List

February 28, 2007/7 Comments/in SQL Server/by Plamen Ratchev

Every once in a while there is a need to do something like this:

SELECT person_id, person_name
FROM MyUsers
WHERE person_id IN (@search_list);

And @search_list contains some form of a delimited list. However, this is not a supported syntax and will fail. Here is one solution to this problem:

-- Create the test table
CREATE TABLE MyUsers (
 person_id INT PRIMARY KEY,
 person_name VARCHAR(35));
 
-- Insert sample data
INSERT INTO MyUsers VALUES (1327, 'Joe');
INSERT INTO MyUsers VALUES (1342, 'John F.');
INSERT INTO MyUsers VALUES (1411, 'Mary');
INSERT INTO MyUsers VALUES (1345, 'Nancy');
INSERT INTO MyUsers VALUES (1366, 'Greg');
INSERT INTO MyUsers VALUES (1367, 'Jeff');
INSERT INTO MyUsers VALUES (1368, 'Chris');
INSERT INTO MyUsers VALUES (1369, 'John M.');
INSERT INTO MyUsers VALUES (1370, 'Peggy');
INSERT INTO MyUsers VALUES (1371, 'Samuel');
INSERT INTO MyUsers VALUES (1372, 'Tony');
INSERT INTO MyUsers VALUES (1373, 'Lisa');
INSERT INTO MyUsers VALUES (1374, 'Tom');
INSERT INTO MyUsers VALUES (1375, 'Dave');
INSERT INTO MyUsers VALUES (1376, 'Peter');
INSERT INTO MyUsers VALUES (1377, 'Jason');
INSERT INTO MyUsers VALUES (1378, 'Justin');
INSERT INTO MyUsers VALUES (1379, 'Oscar');
 
DECLARE @search_list VARCHAR(100);
 
DECLARE @delimiter CHAR(1);
 
SELECT @search_list = '1327,1342,1411',
       @delimiter = ',';
 
-- Get the users based on the delimited variable list
SELECT person_id, person_name 
FROM MyUsers
WHERE person_id IN 
     (SELECT SUBSTRING(string, 2, CHARINDEX(@delimiter, string, 2) - 2)
      FROM (SELECT SUBSTRING(list, n, LEN(list))
            FROM (SELECT @delimiter + @search_list + @delimiter) AS L(list),
                 (SELECT ROW_NUMBER() OVER (ORDER BY person_id)
                  FROM MyUsers) AS Nums(n)
            WHERE n <= LEN(list)) AS D(string) 
      WHERE LEN(string) > 1
        AND SUBSTRING(string, 1, 1) = @delimiter)
ORDER BY person_id;

-- Results

person_id person_name
-------- ------------------------------
1327     Joe
1342     John F.
1411     Mary

Notes:

– This algorithm is based on walking the delimited string (could be in a table column or a variable) and parsing it into a table format that can feed the IN list.

– The use of ROW_NUMBER assumes you are using SQL Server 2005. However, you can accomplish the same if you have SQL Server 2000. What this subquery does (SELECT ROW_NUMBER() OVER (ORDER BY person_id) FROM MyUsers) is simply generating a table with numbers. In SQL Server 2000 you can create a temp table and generate sequential numbers. The only requirement is that you have more numbers than the length of the string to parse (in this case the length of ‘1327,1342,1411’). These numbers are used as an index in the walk process.

Convert Tree Structure From Nested Set Into Adjacency List

February 15, 2007/5 Comments/in SQL Server/by Plamen Ratchev

Tree structures are often represented in nested set model or adjacency list model. In the nested set model each node has a left and right, where the root will always have a 1 in its left column and twice the number of nodes in its right column. On the other side the adjacency list model uses a linking column (child/parent) to handle hierarchies.

Sometimes there is a need to convert a nested set model into an adjacency list model. Here is one example of doing that:

CREATE TABLE NestedSet (
 node CHAR(1) NOT NULL PRIMARY KEY,
 lf INT NOT NULL,
 rg INT NOT NULL);
 
INSERT INTO NestedSet VALUES ('A', 1, 8);
INSERT INTO NestedSet VALUES ('B', 2, 3);
INSERT INTO NestedSet VALUES ('C', 4, 7);
INSERT INTO NestedSet VALUES ('D', 5, 6);
 
CREATE TABLE AdjacencyList (
 node CHAR(1) NOT NULL PRIMARY KEY,
 parent CHAR(1) NULL);
 
INSERT INTO AdjacencyList
SELECT A.node,
       B.node AS parent
FROM NestedSet AS A
LEFT OUTER JOIN NestedSet AS B
  ON B.lf = (SELECT MAX(C.lf)
             FROM NestedSet AS C
             WHERE A.lf > C.lf
               AND A.lf < C.rg);

-- Results
node    parent
------  --------
A       NULL
B       A
C       A
D       C

Additional resources:

Book: “Trees and Hierarchies in SQL for Smarties” by Joe Celko

Adjacency List Model
http://www.sqlsummit.com/AdjacencyList.htm

Trees in SQL
http://www.intelligententerprise.com/001020/celko.jhtml?_requestid=1266295

Row Number in SQL Server

January 23, 2007/7 Comments/in SQL Server/by Plamen Ratchev

Every once in a while there is a need to pull the rows with a row number. Here are a few solutions that I have seen to work well.

Below is a sample table with employees that has employee name and employee address columns:

CREATE TABLE Employees (
 employee_name VARCHAR(50) PRIMARY KEY,
 employee_address VARCHAR(100));
 
INSERT INTO Employees (employee_name, employee_address)
VALUES ('Blake Anderson', '2048 River View Rd.');
INSERT INTO Employees (employee_name, employee_address)
VALUES ('Ana Williams', '9055 East Blvd.');
INSERT INTO Employees (employee_name, employee_address)
VALUES ('Robert Schmidt', '3400 Windsor Street');
INSERT INTO Employees (employee_name, employee_address)
VALUES ('Sarah Reese', '1045 Coral Rd.');

SQL Server 2000 and SQL Server 2005

Using an IDENTITY column and a temporary table

This solution is based on creating a temporary table with IDENTITY column used to provide a row number. This approach provides very good performance. Here are the steps:

-- Create the temp table
CREATE TABLE #EmployeeRowNumber (
 rn INT IDENTITY (1, 1),
 employee_name VARCHAR(50),
 employee_address VARCHAR(100));
 
-- Generate the row number
-- To achieve an ordered list the names are sorted
INSERT #EmployeeRowNumber (employee_name, employee_address)
SELECT employee_name, employee_address
FROM Employees
ORDER BY employee_name;
 
-- Select the row number
SELECT employee_name, employee_address, rn
FROM #EmployeeRowNumber
ORDER BY rn;

  
-- Results

employee_name    employee_address         rn
---------------- ------------------------ -----------
Ana Williams     9055 East Blvd.          1
Blake Anderson   2048 River View Rd.      2
Robert Schmidt   3400 Windsor Street      3
Sarah Reese      1045 Coral Rd.           4

Using a subquery to count the number of rows

This solution is based on using a subquery on a unique column (or combination of columns) to count the number of rows. Here is how it looks with the sample data provided above:

SELECT employee_name, employee_address,
      (SELECT COUNT(*) 
       FROM Employees AS E2
       WHERE E2.employee_name <= E1.employee_name) AS rn  
FROM Employees AS E1  
ORDER BY employee_name;

If the values in the column are not unique then duplicate row numbers will be generated. That can be resolved by adding a tiebreaker column that will guarantee the uniqueness. This approach is a slower method than using an IDENTITY column and a temporary table. Since it will incur (n + n²) /2 row scans it may not be practical to use on a large table.

SQL Server 2005

Using the ROW_NUMBER function

In SQL Server 2005 the new function ROW_NUMBER provides the fastest approach to solve the problem:

SELECT employee_name, employee_address,
       ROW_NUMBER() OVER(ORDER BY employee_name) AS rn
FROM Employees
ORDER BY employee_name;

Additional resources:

How to dynamically number rows in a SELECT Transact-SQL statement: http://support.microsoft.com/kb/186133

SQL Server 2005 Ranking Functions: http://msdn2.microsoft.com/en-us/library/ms189798.aspx

Book: “Inside Microsoft SQL Server 2005: T-SQL Querying” by Itzik Ben-Gan, Lubor Kollar and Dejan Sarka